Raph Levien <raph@acm.org>
10 Jul 1999
See also: Gnome-Text API documentation.
This document describes the Gnome-Font API. Depending on the application, you may be interested in implementing either side of this API. Obviously, implementing the client side is important for using fonts in the normal Gnome imaging environment. However, implementing the server side is useful in certain applications, such as using Gnome-Text separated from the rest of Gnome.
The basic GnomeFont functionality is in the GnomeFontBase Gtk+ class. This is an abstract class; all actual implementations of fonts derive from it. The class contains methods for accessing font metrics, but not for getting glyph shapes. Instead, the GnomeFontBase class contains a ::query_interface() method, which returns an object specialized for providing glyph shapes using a particular interface. A given font implementation may support any number of interfaces through this query_interface technique.
GnomeFontBase API
The GnomeFontBase API is as follows:
::query_interface
    GtkObject *gnome_font_base_query_interface (GnomeFontBase *font, const char *interface_name);
Given the name of an interface, return a Gtk+ object supporting that interface, or NULL if the interface is not supported. The namespace for interface names is managed informally. A number of interfaces are supported directly by GnomePrint, and a few more will be supported in the future, either by GnomePrint directly, or other font-intensive modules (such as Gill).
By convention, the interface name is the same as the type of the corresponding Gtk+ object. Thus, for Type 1 fonts, the interface name is "GnomeFontType1" and a variable pointing to this object may be declared: GnomeFontType1 *font_type1;
::get_glyph
    int gnome_font_base_get_glyph (GnomeFontBase *font, guint32 unicode);
Given a Unicode value, return a glyph number, or -1 if the font does not cover the character. The management of glyph numbers is entirely up to the font; a glyph number has no meaning outside that font. Thus, one reasonable strategy is to allocate from zero to one less than the number of glyphs in the font. Another is to use the Unicode value of the glyph.
Glyph numbers need not be consistent from invocation to invocation. Thus, they may be dynamically allocated as glyphs are needed. Users of GnomeFontBase should be careful to use glyph numbers only with the font they were obtained from.
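To make the dynamic allocation strategy concrete, here is a minimal sketch of a font that hands out glyph numbers on first use. It is not part of the API: DemoFont, demo_font_covers and demo_font_get_glyph are hypothetical names, and the coverage test (printable ASCII only) is a stand-in for real font data.

```c
#include <stdint.h>

#define DEMO_MAX_GLYPHS 256

/* Hypothetical font state: glyph numbers are handed out on first use. */
typedef struct {
    uint32_t unicode[DEMO_MAX_GLYPHS]; /* unicode[i] is the character for glyph i */
    int n_glyphs;                      /* glyph numbers allocated so far */
} DemoFont;

/* Stand-in for real coverage data: pretend the font covers printable ASCII. */
static int demo_font_covers(uint32_t unicode)
{
    return unicode >= 0x20 && unicode <= 0x7E;
}

/* Follows the ::get_glyph contract: a glyph number, or -1 if not covered. */
int demo_font_get_glyph(DemoFont *font, uint32_t unicode)
{
    int i;

    if (!demo_font_covers(unicode))
        return -1;
    for (i = 0; i < font->n_glyphs; i++)
        if (font->unicode[i] == unicode)
            return i;               /* already allocated earlier */
    if (font->n_glyphs == DEMO_MAX_GLYPHS)
        return -1;                  /* table full in this toy version */
    font->unicode[font->n_glyphs] = unicode;
    return font->n_glyphs++;        /* allocate the next free number */
}
```

With this scheme the number returned for a given character depends on the order of requests, which is exactly why glyph numbers must only be used with the font they came from.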
Note: I'm really tempted to add a glyph alternate argument to this method, if for no other reason than to get access to small cap glyphs.
::glyph_width
    int gnome_font_base_glyph_width (GnomeFontBase *font, int glyph);
Given a glyph number, return its width in 0.001 em units. An em is equal to the size of the font. Thus, for a 12 point font, a value of 1000 corresponds to 12 points, while a value of 500 corresponds to 6 points.
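The unit conversion is simple enough to write down. The helper below is only an illustration (em_units_to_points is a hypothetical name, not part of the API); it converts a value in 0.001 em units to points for a given font size.

```c
/* Convert a value in 0.001 em units to points, for a font of the
 * given size in points. Hypothetical helper, not part of GnomeFontBase. */
double em_units_to_points(int units, double font_size_pt)
{
    return units * font_size_pt / 1000.0;
}
```

For a 12 point font, em_units_to_points (500, 12.0) gives 6 points, matching the example above.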
::kern
    int gnome_font_base_kern (GnomeFontBase *font, int glyph1, int glyph2);
Given two glyph numbers, return the kerning adjustment, in units of 0.001 em. A negative value moves the glyphs closer, a positive value farther apart. Kerning makes the spacing between glyphs more regular, and greatly improves the quality of text composition. For example, in the word "Tomorrow" set in Adobe Utopia, the "To" kern pair is -100, and the "ow" kern pair is -30. Without kerning, the gaps between these two pairs of glyphs would be noticeably larger than the others, resulting in the visual appearance of inconsistent spacing.
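Here is a rough sketch of how a client might combine glyph widths and kern pairs to measure a run of text. Everything in it is a stand-in: demo_glyph_width and demo_kern play the roles of ::glyph_width and ::kern, glyph numbers are taken to be ASCII codes, and the uniform width of 500 is made up. Only the two kern values come from the Utopia example.

```c
#include <string.h>

/* Made-up metric: every glyph is 500 units (0.001 em) wide. */
static int demo_glyph_width(int glyph)
{
    (void) glyph;
    return 500;
}

/* Stand-in for ::kern, holding just the two pairs from the example. */
static int demo_kern(int glyph1, int glyph2)
{
    if (glyph1 == 'T' && glyph2 == 'o')
        return -100;
    if (glyph1 == 'o' && glyph2 == 'w')
        return -30;
    return 0;
}

/* Total advance of a string in 0.001 em units: the sum of the glyph
 * widths plus the kerning adjustment between each adjacent pair. */
int demo_text_width(const char *text)
{
    size_t i, n = strlen(text);
    int width = 0;

    for (i = 0; i < n; i++) {
        width += demo_glyph_width((unsigned char) text[i]);
        if (i + 1 < n)
            width += demo_kern((unsigned char) text[i],
                               (unsigned char) text[i + 1]);
    }
    return width;
}
```

With these made-up widths, "Tomorrow" measures 3870 units instead of the unkerned 4000: the "To" and "ow" pairs pull in by 100 and 30 units respectively.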
::ligate
    int gnome_font_base_ligate (GnomeFontBase *font, int glyph1, int glyph2);
Given two glyph numbers, return the glyph number for the corresponding ligature, or -1 if the two glyphs do not ligate. For example, if glyph numbers correspond one-to-one with Unicode values, then ligating 0x0066 (LATIN SMALL LETTER F) and 0x0069 (LATIN SMALL LETTER I) may result in 0xFB01 (LATIN SMALL LIGATURE FI). For glyph numbers corresponding to Adobe Standard Encoding, the same example is: 0x66 (f) ligated with 0x69 (i) yields 0xAE (fi).
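A text processor would typically run a ligature pass over each glyph run before measuring it. The sketch below shows one way to do that, assuming Unicode-valued glyph numbers as in the first example. demo_ligate is a stand-in for ::ligate that knows only the fi and fl ligatures, and demo_apply_ligatures is a hypothetical helper.

```c
/* Stand-in for ::ligate, assuming glyph numbers equal Unicode values. */
static int demo_ligate(int glyph1, int glyph2)
{
    if (glyph1 == 0x0066 && glyph2 == 0x0069)
        return 0xFB01;              /* f + i -> LATIN SMALL LIGATURE FI */
    if (glyph1 == 0x0066 && glyph2 == 0x006C)
        return 0xFB02;              /* f + l -> LATIN SMALL LIGATURE FL */
    return -1;                      /* the pair does not ligate */
}

/* Replace ligating pairs in place and return the new glyph count.
 * The result of a ligature is tried again against the next glyph,
 * so chains of ligatures also work if the font supports them. */
int demo_apply_ligatures(int *glyphs, int n)
{
    int out = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (out > 0) {
            int lig = demo_ligate(glyphs[out - 1], glyphs[i]);
            if (lig >= 0) {
                glyphs[out - 1] = lig;
                continue;
            }
        }
        glyphs[out++] = glyphs[i];
    }
    return out;
}
```

Applied to the glyphs for "fit" (0x66, 0x69, 0x74), this leaves two glyphs: 0xFB01 followed by 0x74.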
::glyph_bbox
We need a method that returns the bounding box of a glyph. I haven't worked out the exact signature yet.
Glyph shape interfaces
The glyph shape interfaces obtained from ::query_interface are used to obtain glyph shape data, needed by the renderer.
GnomeFontType1
One of the first interfaces to be implemented is GnomeFontType1, used for Adobe Type 1 format fonts. This interface is particularly useful for output in the PostScript page description language.
The following code sequence illustrates how to obtain an object implementing the GnomeFontType1 interface from an arbitrary GnomeFontBase object:
    GnomeFontBase *font_base;   /* assumed to point at an existing font */
    GtkObject *obj;
    GnomeFontType1 *font_type1;

    obj = gnome_font_base_query_interface (font_base, "GnomeFontType1");
    if (obj == NULL) {
        g_warning ("Font does not support GnomeFontType1 interface");
    } else {
        font_type1 = GNOME_FONT_TYPE1 (obj);
    }

The first cut at the type1 interface simply supplies a ::get_pfa() method, which returns a PFA representation of the font, suitable for embedding in a PostScript output stream. However, it is known that this interface may not be sufficient for dealing with all character encoding issues (the Adobe Standard Encoding does not give access to all glyphs in the font, thus the font will need to be re-encoded). Adobe managed, quite brilliantly, to stuff 260 glyphs in their Courier font, which means that to access all the glyphs, you have to go through some contortions.
The type1 interface also supplies a ::get_fontname() method, which returns the PostScript fontname of the font.
The results of both these methods are simple null-terminated C strings, to be freed with g_free().
GnomeFontBpath
This interface returns ArtBpath Bézier path outlines. It is useful for displaying glyphs on the screen, and also for importing into vector applications. The primary method is ArtBpath * ::get_bpath (int glyph), which simply returns the glyph, scaled to 1000 units per em, and with the left baseline point as the coordinate origin, and positive y going upward (i.e. standard Adobe coordinates). In general, you'll want to scale the y coordinate negatively to bring the glyph into the libart standard coordinate space, which has positive y going downward.
The Font catalog interfaces
The GnomeFontBase interface and the glyph rendering interfaces derived from it enable layout and rendering once a font is selected. The font catalog interfaces describe how to get a font, particularly how to select one from the set of fonts installed. Additionally, fonts from the same family but different weights and italic-ness should be grouped together. This is the function of the FontCatalog, FontFamily, and FontList interfaces, described in this section.
GnomeFontCatalog
The main method of GnomeFontCatalog is GnomeFontFamily * ::get_font_family (const char *family_name). For example, if the catalog contains a Times font family, then ::get_font_family ("Times") will return it. A NULL return value means that the font family does not exist in the catalog.
GSList * ::list_fonts (gboolean *complete) returns a list of font family names in the catalog. The list may be incomplete if the catalog supports dynamically created or downloaded fonts. In this case, *complete is set to FALSE.
GnomeFontFamily
The main method of GnomeFontFamily is GnomeFontBase * ::get_font (GnomeFontWeight *weight, gboolean italic). This searches for the font with the same italicness and closest weight to the *weight specified. On return, *weight is set to the actual weight chosen. In case of a tie, the lighter weight is chosen.
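The selection rule can be sketched as follows. This assumes that weights can be compared as plain integers (the values 400 and 700 used in the note below are only illustrative) and that the caller has already narrowed the family down to fonts with the requested italicness. demo_pick_weight is a hypothetical helper, not part of the API.

```c
#include <stdlib.h>

/* Pick the entry of available[] closest to *weight, preferring the
 * lighter weight on a tie. Returns the chosen index, or -1 if the
 * list is empty. On success, *weight is set to the weight chosen. */
int demo_pick_weight(const int *available, int n, int *weight)
{
    int best = -1;
    int best_dist = 0;
    int i;

    for (i = 0; i < n; i++) {
        int dist = abs(available[i] - *weight);

        if (best < 0 || dist < best_dist ||
            (dist == best_dist && available[i] < available[best])) {
            best = i;
            best_dist = dist;
        }
    }
    if (best >= 0)
        *weight = available[best];
    return best;
}
```

For a family offering weights 400 and 700, a request for 550 is a tie and resolves to 400, while a request for 600 resolves to 700.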
GnomeFontList
I think this will just be a GSList of GnomeFontFamily objects. The idea is that it's a sequence of font families with different Unicode coverage. Thus, your typical text processor (e.g. GnomeText) will run down the FontList until it finds a font that matches the glyph. A typical font list might contain Times and Mincho. Thus, characters in Latin scripts get rendered with Times glyphs, and CJK characters get rendered with Mincho.
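Running down the font list might look something like the sketch below. It uses a plain array rather than a GSList, and it reduces coverage to a single made-up Unicode range per family. A real implementation would ask each family's font through ::get_glyph instead. DemoFamily and demo_font_list_find are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified family record. */
typedef struct {
    const char *name;
    uint32_t first;     /* first covered Unicode value (made up) */
    uint32_t last;      /* last covered Unicode value, inclusive */
} DemoFamily;

/* Return the index of the first family in the list that covers the
 * character, or -1 if no family in the list covers it. */
int demo_font_list_find(const DemoFamily *list, int n, uint32_t unicode)
{
    int i;

    for (i = 0; i < n; i++)
        if (unicode >= list[i].first && unicode <= list[i].last)
            return i;
    return -1;
}
```

Because the first match wins, the order of the list matters: putting Times ahead of Mincho means Latin characters that both fonts cover are rendered with Times glyphs.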
Tricky bits
The interface presented here is deceptively simple. Nonetheless, it should be possible to handle a large subset of Unicode scripts using nothing more, hiding some of the complexity behind the interface.
One of the major tricky bits is dealing with accented characters (i.e. characters with diacritical marks). For accented characters that fall within the Unicode encoding, there is a relatively easy solution: the text processing module passes that Unicode value to the ::get_glyph() method, and gets a glyph number. It is then the responsibility of the renderer to render that glyph number correctly.
When there is no single Unicode value for the character combined with its diacritical, a more subtle solution is necessary. The Unicode encoding is the character followed by the diacritical. The font implementation may return glyphs for these two characters, then ligate them together into a new glyph number. Because there is a combinatorial explosion in the number of glyphs that can be composed in this way, glyph numbers for these composite glyphs should in general be dynamically allocated.
A few examples should help illustrate this point:
Unicode U+00E4 is ä (a umlaut). The text processor may pass 0x00E4 to the ::get_glyph method. There is no corresponding glyph number in the Adobe StandardEncoding, but assume that the font has been reencoded to the ISOLatin1Encoding. Then, the glyph number is 0xe4. This can be passed directly to the PostScript show operator, without any further need for cooperation between the font implementation and the rendering back-end.
The canonical decomposition of U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS) is U+0061 (LATIN SMALL LETTER A) + U+0308 (COMBINING DIAERESIS). Adobe Type1 fonts do not have glyphs for combining diacritical marks (such as combining diaeresis), but do contain spacing versions. The Adobe StandardEncoding for U+00A8 (DIAERESIS) is 0xc8 (dieresis). A reasonable solution would be to allocate a new glyph number for combining diaeresis (this can be statically allocated, as there are only a small number of them). Let's say this new glyph number is 0x308.
Now, ligating 0x61 (a) and 0x308 (our combining diaeresis) can yield the familiar glyph number for ä, 0xe4 in ISOLatin1Encoding. It should be possible to render this glyph without any special processing by the rendering back-end.
Here is another example. Suppose we want an o with a cedilla (like ç, but with an o instead of a c). The Unicode for this is U+006F (LATIN SMALL LETTER O) + U+0327 (COMBINING CEDILLA). There is no single Unicode for the combination. Let's say that we are using a Unicode-based glyph encoding, so we get back 0x6f and 0x327 as our glyph codes. One way to render this is for the ligate function to allocate a new glyph code for the combination, say 0x10000. The rendering back-end must have access to this dynamic allocation, but this is no big problem; that's why we have ::query_interface. The rendering back-end thus gets the information it needs to output the o, back up to center the cedilla, output the spacing cedilla mark (0xcb in StandardEncoding), and fix up the spacing for the next character.
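The two examples can be combined into one sketch of a ligate function: it returns the existing precomposed glyph where there is one, and otherwise allocates a composite glyph number dynamically, starting at 0x10000. This assumes a Unicode-based glyph encoding. The table layout and all the demo_ names are hypothetical, and the lookup function stands in for whatever interface the rendering back-end would actually obtain through ::query_interface.

```c
#define DEMO_FIRST_DYNAMIC  0x10000
#define DEMO_MAX_COMPOSITES 64

/* Hypothetical table of dynamically allocated composite glyphs. */
typedef struct {
    int base[DEMO_MAX_COMPOSITES];  /* base glyph, e.g. 0x6f (o) */
    int mark[DEMO_MAX_COMPOSITES];  /* combining mark, e.g. 0x327 */
    int n;                          /* composites allocated so far */
} DemoComposites;

/* U+0300..U+036F is the Combining Diacritical Marks block. */
static int demo_is_combining_mark(int glyph)
{
    return glyph >= 0x0300 && glyph <= 0x036F;
}

/* Ligate a base glyph with a combining mark. Returns a glyph number,
 * or -1 if the pair does not ligate (or the toy table is full). */
int demo_ligate_composite(DemoComposites *table, int glyph1, int glyph2)
{
    int i;

    if (!demo_is_combining_mark(glyph2))
        return -1;
    if (glyph1 == 0x0061 && glyph2 == 0x0308)
        return 0x00E4;              /* precomposed a umlaut exists */
    for (i = 0; i < table->n; i++)
        if (table->base[i] == glyph1 && table->mark[i] == glyph2)
            return DEMO_FIRST_DYNAMIC + i;   /* reuse earlier allocation */
    if (table->n == DEMO_MAX_COMPOSITES)
        return -1;
    table->base[table->n] = glyph1;
    table->mark[table->n] = glyph2;
    return DEMO_FIRST_DYNAMIC + table->n++;
}

/* What the rendering back-end needs: the parts of a composite glyph.
 * Returns 1 and fills *base and *mark, or 0 if glyph is not composite. */
int demo_composite_lookup(const DemoComposites *table, int glyph,
                          int *base, int *mark)
{
    int i = glyph - DEMO_FIRST_DYNAMIC;

    if (i < 0 || i >= table->n)
        return 0;
    *base = table->base[i];
    *mark = table->mark[i];
    return 1;
}
```

Ligating 0x6f with 0x327 on an empty table yields 0x10000, and the back-end can later recover the o and the cedilla from that number in order to position the mark itself.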
I haven't decided whether GnomeText is going to do a canonical decomposition, try to combine as much as possible, or just pass through what the application provides. In any case, the IETF adage, "be conservative in what you generate, liberal in what you accept" would seem to apply. If it's not too much trouble, accept accented characters in both canonical and non-canonical format.
The Microsoft Uniscribe system uses an interesting approach for handling diacritical marks and other combining characters: the font interface converts strings of characters to strings of glyphs, and the returned glyphs are allowed to have X and Y offsets, for positioning relative to the base character. This might be better than dynamically allocating glyphs, but would require a fairly significant change to the APIs.