Raph Levien <raph@acm.org>
20 Sep 1999
This document is a retrospective look at the 0.1.7 release of Gzilla, a web browser project I started, and which is now handed over to a group led by Christopher Reid Palmer. I learned a lot from working on Gzilla, and had a lot of fun in the process. I'm hoping that this document will help the new team get a grip on the code, and possibly offer some general insights.
I started Gzilla some time in early 1997. The first release (0.01) was May 15, 1997, and contained about 4500 lines of code. This version had very basic HTML and GIF capabilities, no caching, and a crude UI. But I could and did use it to surf the web!
I continued the development of Gzilla quite actively until the end of 1997, at which time Netscape announced that they would be releasing Mozilla as an open source project. This was a stunning enough announcement that I felt there was no longer a pressing need for a free web browser like Gzilla. Over a year and a half later, Mozilla is still not quite at beta, and, though it looks like a very nice browser, it's big, complex, and slow enough that I now feel that there is a good ecological niche for a small, simple, and fast browser such as Gzilla.
By the 0.1.7 release, Gzilla had grown up quite a bit. Image support included both GIF and JPEG, with hooks for PNG support. There was an excellent RAM caching subsystem. HTML rendering was still fairly crude, but there was good support for forms. By this time, the Gzw (Gzilla Widget) framework was in place, providing a much more powerful infrastructure for future development such as tables. The UI had gotten quite a bit nicer, with conveniences such as bookmarks. A few other people and I were using it for the bulk of our web browsing.
Throughout 1998 and the first half of 1999, there was virtually no development on the browser. I tinkered a bit with the DNS module and folded in a few patches, but the core didn't really move. My interests had been captured by other projects, including working on the Gnome core.
On July 18, 1999, Christopher Reid Palmer wrote me inquiring whether I might be willing to hand over development to him. I was eager to do this, as I really didn't have the time to put into it, but still believed that the project had potential. Since then, I've been more than usually overwhelmed, so I haven't had as much chance to participate as I would have liked, but have made myself known on the mailing list and answered a few questions. I'm hoping that this document will fill in a much more complete picture of the history of Gzilla, the details of the 0.1.7 release that I handed over, and my suggestions for future development.
The architecture
As mentioned above, the main design goals were that Gzilla be small, simple, and fast. I had a few other goals. For one, I wanted Gzilla to be really good at incremental rendering. When bytes came in from the network, I wanted them to be displayed on the screen as soon as possible afterward. In addition, I wanted the UI to be responsive at all times (compared with Netscape's tendency to lock for brief periods of time when, say, reloading a very large document from cache). To this day, I feel that all these goals are valid. Gzilla's adeptness at incremental rendering contributes a lot, I think, to its perception as being really fast.
I also had a chip on my shoulder about abstraction. I had been browsing through the Mnemonic design documents, and was very much put off by the abstract, head-in-the-clouds nature of the design. Thus, Gzilla was perhaps an overreaction in the direction of concreteness. I wanted every function to do something real, like speaking the HTTP protocol, decoding image data, displaying HTML text, etc. I wanted to avoid code that had "middle management" functions like object request brokering. I believe that this decision contributed to the simplicity of Gzilla, which helped a lot. Neither I nor the Mnemonic people knew what we were doing when we started our respective projects, but if you're ignorant, you have much better chances with a simple project than with a complex one. These days, I've mellowed a bit, and in fact spend a good deal of my time on such "middle management" projects. But I still feel that the experience of Gzilla helped me refine my craft, to at least learn what was possible without all the layers of abstraction.
Finally, I was struck by the power and beauty of the new Gtk+ widget toolkit. I saw Gtk+'s input mechanism as a wonderful way to handle asynchronous network connections, and also saw its widget system as a powerful base for document display. Thus, in the first releases of Gzilla, the document structure (page with embedded form widgets and images) was reflected directly in a Gtk+ containment hierarchy. I later discovered that the Gtk+ widget system had numerous technical shortcomings as the core structure for a document, so I developed Gzw. Nonetheless, this plan made for good modularity in the design from the early stages, and helped me prototype the early versions quickly.
From the first days, I wanted to have a clean separation between the network interface and the renderers. Thus, I developed the ByteSink object to function as a sort of "corpus callosum" between the two subsystems. Later, I realized that I was putting too much weight on a single interface, having it responsible for both asynchronous transfer of bytes and a variety of UI tasks. So this was an important lesson: when measuring complexity of interfaces, look at the complexity of each individual interface, rather than the total number of interfaces. If I were redoing this design from scratch, I'd have one interface for just transferring bytes, and other interfaces for the other UI functions. Nonetheless, the ByteSink interface did meet its goal of providing a clean separation between the network and render functions.
One of the major design decisions in Gzilla was the idea that the HTML parser would not actually store a parse tree anywhere. Instead, it incrementally parsed the HTML document, directly mutating a widget to display page contents. The parser's most important internal state was a stack of attributes. For example, when the byte sequence "<b>" came in from the network, the parser would push the stack and set the "bold" attribute on the top-of-stack. Subsequent text would come in and get sent to the page widget with the "bold" attribute set. When the corresponding "</b>" end-tag came in, it would pop the stack, so that following text would be rendered without the bold attribute.
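The attribute-stack idea can be sketched in a few dozen lines of C. This is a minimal illustration, not the actual Gzilla code; the names (Attr, Parser, start_tag, and so on) are hypothetical, and a real parser would of course also tokenize the byte stream and hand text runs to the page widget.

```c
/* Sketch of the tree-less parsing idea: a stack of attribute records
 * instead of a parse tree. Hypothetical names, not the Gzilla API. */
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64

typedef struct {
    int bold;
    int italic;
} Attr;

typedef struct {
    Attr stack[MAX_DEPTH];
    int top;
} Parser;

static void parser_init(Parser *p) {
    memset(&p->stack[0], 0, sizeof(Attr));
    p->top = 0;
}

/* A start-tag like "<b>" pushes a copy of the current attributes
 * with the corresponding flag set. */
static void start_tag(Parser *p, const char *tag) {
    assert(p->top + 1 < MAX_DEPTH);
    p->stack[p->top + 1] = p->stack[p->top];
    p->top++;
    if (strcmp(tag, "b") == 0)
        p->stack[p->top].bold = 1;
    else if (strcmp(tag, "i") == 0)
        p->stack[p->top].italic = 1;
}

/* An end-tag like "</b>" simply pops; following text reverts to the
 * enclosing attributes. */
static void end_tag(Parser *p) {
    assert(p->top > 0);
    p->top--;
}

/* Text runs would be handed straight to the page widget with the
 * top-of-stack attributes; no tree is ever built. */
static Attr current_attr(const Parser *p) {
    return p->stack[p->top];
}
```

Note that nesting falls out for free: "<b><i>text</i></b>" just pushes twice, and each pop restores exactly the enclosing state.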
Looking back at this decision, it's hard to say whether it was the right one. Going directly from HTML tags to the page widget eliminated another layer. In addition, working with trees incrementally can be tricky - the simplest implementation of trees is not incremental at all. On the flip side, this design gives up some flexibility. For example, it's hard to imagine how DOM access to the document contents (using JavaScript, for example) would be done with the current Gzilla architecture. Also, while it's a lot harder to quantify, this approach added more constraints to the design. In particular, it's necessary to build all of the paragraph layout logic into the page widget; it's not really practical to put that logic in the parser, because that would frustrate incremental rendering. Once the parser hands the attributed text to the page widget, it's out of the parser's hands.
So on balance, I'd say that the tree-less approach did simplify the overall design of Gzilla, but I'm not sure that's the way I would do it now. One thing, though, that speaks well of the decision I made is that I'm now working on Gdome, a DOM implementation for handling XML trees, with support for event listeners for incremental changes to the tree, and I'm finding it a lot harder than I thought.
There are a number of other architectural issues, but I'll discuss those in the individual sections below.
Networking
I'm very proud of the networking code. It's reasonably simple, seems to work well in practice, and I feel that the caching in particular is especially fine. For example, if a URL is in the process of being loaded, and another "get" of the same URL is initiated, the cache will feed the second requestor the bytes retrieved so far, then route further incoming bytes from the network to both requestors. This is much slicker than the way Netscape 4.0, for example, would handle this situation.
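The "replay then tee" behavior is simple enough to sketch. The code below is an illustration of the idea, not the real Gzilla cache; CacheEntry, cache_attach, and the sink callback signature are all invented for the example.

```c
/* Sketch of the in-flight cache behavior: a second "get" of a URL
 * that is still loading first replays the bytes buffered so far,
 * then joins the fan-out list for further incoming bytes.
 * Illustrative names, not the Gzilla cache API. */
#include <assert.h>
#include <string.h>

#define MAX_CLIENTS 8

typedef void (*SinkFn)(void *ctx, const char *buf, size_t len);

typedef struct {
    char data[4096];          /* bytes received from the network so far */
    size_t len;
    SinkFn sinks[MAX_CLIENTS];
    void *ctxs[MAX_CLIENTS];
    int n_sinks;
} CacheEntry;

/* A new requestor is fed the buffered prefix immediately, then added
 * to the fan-out list for everything that arrives later. */
static void cache_attach(CacheEntry *e, SinkFn fn, void *ctx) {
    if (e->len > 0)
        fn(ctx, e->data, e->len);
    assert(e->n_sinks < MAX_CLIENTS);
    e->sinks[e->n_sinks] = fn;
    e->ctxs[e->n_sinks] = ctx;
    e->n_sinks++;
}

/* Bytes from the network are appended to the buffer and routed to
 * every attached requestor. */
static void cache_receive(CacheEntry *e, const char *buf, size_t len) {
    assert(e->len + len <= sizeof(e->data));
    memcpy(e->data + e->len, buf, len);
    e->len += len;
    for (int i = 0; i < e->n_sinks; i++)
        e->sinks[i](e->ctxs[i], buf, len);
}

/* A trivial sink for demonstration: appends into a string buffer. */
typedef struct { char buf[4096]; size_t len; } StrSink;
static void str_sink(void *ctx, const char *buf, size_t len) {
    StrSink *s = ctx;
    memcpy(s->buf + s->len, buf, len);
    s->len += len;
    s->buf[s->len] = '\0';
}
```

The key property is that a late requestor ends up with exactly the same byte sequence as an early one, without a second network fetch.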
The main problems in the networking area are simple lack of features. For example, virtually none of the HTTP/1.1 extensions are supported, of which KeepAlive is especially important for performance. Also, the network module didn't have any support for the POST method. Nonetheless, I feel that the network code is a good foundation.
DNS
The DNS handler is still a little sticky. The issue here is that Unix has no portable way to do asynchronous DNS resolution - the most portable function is gethostbyname(), and it generally blocks. Most Unices have thread-safe variants, but there is no truly portable standard.
Consequently, Gzilla took a very portable approach, which is to fork several processes on startup that are dedicated to DNS lookups. During browser operation, the main process communicates with these child processes with (nonblocking) pipes.
This actually works pretty well, but there are some unpleasantries. First, Gtk+ has a fundamental problem with forking children - it tends to leave lots of file descriptors open (including the X connection). This bungs things up, thus Gzilla has some ugly and heavy-handed code to close almost all open file descriptors. Second, although it would be nice to fork new DNS processes dynamically, doing so would eat virtual memory fairly rapidly, because the child process would contain all the state of the parent.
The version in 0.1.7 has some bugs. I posted an updated version (g_dns) to the Gzilla mailing list recently which fixes these bugs. In particular, g_dns should really do the Right Thing when there are lots of requests queued up. A special feature of g_dns is that it falls back to blocking lookups when G_DNS_NUM_INIT is 0, which is very helpful for debugging (gdb somehow has trouble with the child processes).
A variant of Gzilla's DNS code exists in Gnome under gnome-dns, and also has some bugs, most notably violent death when the queue of outstanding DNS requests becomes too large.
I think over the long term, the best solution to DNS is to move the child processes into a separate helper executable. This would foster the efficient dynamic spawning of new servers, because the exec() call blows away the virtual memory consumed by cloning the parent's state. In addition, moving Gtk+ to generally set the FD_CLOEXEC flag on open file descriptors would get rid of the ugly code to manually close these. The Gtk+ people agree with this change (forking in Gtk+ has been identified as a general problem), and we are likely to see some changes in future versions.
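Setting FD_CLOEXEC is a one-liner per descriptor, which is why it is so much cleaner than walking and closing every fd after fork(). A minimal helper (the function name is my own, but fcntl() with F_GETFD/F_SETFD is the standard POSIX mechanism):

```c
/* Mark a descriptor close-on-exec: the kernel closes it automatically
 * in any child that calls exec(), so a spawned helper never inherits
 * it. Contrast with manually closing every open fd after fork(). */
#include <fcntl.h>

static int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```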
One shortcoming of the Gzilla DNS subsystem is that the DNS cache never expires. This derives from the fact that gethostbyname() drops cache timing info on the floor. It's unlikely to be a very large problem in practice, though.
Gzw
Early experience with using Gtk+ widgets in a containment hierarchy to reflect the document structure pointed up a number of shortcomings in the Gtk+ widget system:
- Scrolling using GtkScrolledWindow was limited to 32kpixels.
- The size negotiation mechanisms in Gtk+ were inadequate for wrapped text.
- Gtk+ forces the creation of an X window for each child widget if that widget is to receive mouse and keyboard events. This was a major performance problem.
My original plan was to fix these problems within Gtk+. However, I soon gave up on this, as the needed changes simply ran too deep. Instead, I came up with my own widget hierarchy, Gzw (Gzilla widgets).
The Gzw widget system resembles Gtk+ fairly strongly. The main differences reflect the shortcomings listed above: Gzw has a large-window scrolling capability, a three-phase size negotiation system optimized for wrapped text (including tables), and an event propagation system that propagates mouse and keyboard events to child widgets. In addition, Gzw was implemented using its own simple objects-in-C mechanism, rather than Gtk+ objects (I was concerned that the latter might be too heavyweight, considering that there might be lots of them in a typical document; looking back, I think I took this concern too seriously).
All three of these main changes represent significant technical advantages over plain Gtk+, and have relevance for quite a few applications other than web browsers. It is my hope that Gtk+ 2.0 can incorporate them directly into the core.
Scrolling
The original Gzilla simply used the GtkScrolledWindow widget for scrolling. At first, I was very happy with this, as it seemed like a perfect example of leveraging the existing work that had been put into Gtk+ to avoid having to do the scrolling myself. However, I soon ran into the 32kpixel scrolling limit of GtkScrolledWindow, which in turn derives from the fact that it's implemented directly using X scrolling, and X windows are limited to 32kpixels in size.
Thus, when it came time to design Gzw, getting scrolling right was a major consideration. I considered a number of alternatives, and settled on what's in Gzilla today.
Basically, the current Gzw scrolling code uses an X window for scrolling, just as GtkScrolledWindow does. However, it also detects when the scrolling coordinates exceed 32k, and at that point "jumps" the window, redrawing all the contents. Thus, the X window coordinate always stays within the 32k bounds. It's still not a perfect solution, as the jumping creates a just-noticeable flash occasionally when scrolling large documents. In practice, I found this to be quite acceptable. I usually didn't see the flash unless I was really watching for it.
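The coordinate bookkeeping behind the jump can be captured in pure logic. This is an illustrative sketch under my own naming (ScrollState, scroll_to), not the Gzw code: the X window position is kept as the logical position minus a base offset, and when that difference would leave the safe range, the base is re-anchored and the caller knows a full redraw is needed.

```c
/* Pure-logic sketch of the coordinate "jump": keep the X window
 * coordinate as (logical - base), and re-anchor the base whenever the
 * offset would leave the signed 16-bit coordinate range X imposes. */
#include <stdlib.h>

#define X_COORD_LIMIT 32000  /* stay safely inside the 32767 X limit */

typedef struct {
    long base;      /* logical coordinate mapped to window y = 0 */
    long logical;   /* logical scroll position (can be millions) */
} ScrollState;

/* Returns 1 if the window had to "jump" (full redraw needed),
 * 0 for an ordinary scroll. */
static int scroll_to(ScrollState *s, long logical) {
    s->logical = logical;
    if (labs(logical - s->base) > X_COORD_LIMIT) {
        s->base = logical;   /* re-anchor; all contents get redrawn */
        return 1;
    }
    return 0;
}

/* The coordinate actually handed to X, always within bounds. */
static long window_coord(const ScrollState *s) {
    return s->logical - s->base;
}
```

Documents of arbitrary logical height thus scroll through a window whose X coordinates never overflow, at the cost of an occasional full redraw.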
A second potential criticism of the Gzw scrolling mechanism is that it waits for the X server to generate an expose event on the part that was uncovered by the scroll. Thus, it adds an additional server roundtrip latency between the scroll and the fill-in of the uncovered area (compared with XCopyArea-based scrolling, which is for example what's used in the Gimp). I worried a lot about this, but in practice I'm not sure it's that important. On the P100 I used for virtually all Gzilla development, server roundtrip was on the order of a millisecond. On my current development machine (a 400 MHz dual Celeron), it's more like 125 microseconds. That absolutely should not make a visible difference.
However, the scrolling mechanism had a bad interaction with expose event compression (see below). Thus, many users noticed that you could scroll quite a bit, and then visibly watch the fill-in. This is quite broken, in my opinion, and needs fixing.
GzwImage
GzwImage was basically an adaptation of the GtkPreview widget to the Gzw framework. The original Gzilla simply embedded a GtkPreview, which was nice and simple, but as mentioned above there were serious performance problems.
The function of GzwImage was pretty simple - it stored an RGB image, and displayed it as a Gzw widget. This is the mechanism Gzilla used to display <img> tags.
GzwImage was one of the major inspirations for my later GdkRgb work. When I was working on GdkRgb, I found a major performance flaw in the older code, affecting GzwImage, as well as GtkPreview and the Gimp image display code. Essentially, the old code performed an XSync() call at the end of each image write, to make sure that there were no race conditions with the use of the shared memory buffer. For large images, this is no big deal, but for lots of small images (not at all atypical in a web page), it was a major source of slowdown.