What's a bytesink?

One of the main structuring mechanisms in gzilla is the bytesink. It serves as the interface between the application that wants to display Web data and the functions that parse that data and display it in gtk+ widgets. As a result, modules implemented as bytesinks can be used in many different applications without modification.

A bytesink is a gtk+ object that implements about a dozen methods. Some of these methods (including write, close, reset, and set_base_url) are implemented by the bytesink module and are invoked by the application. The remaining methods (request_url, link, status, redirect, title, and request_url_img) are not implemented by the bytesink; they are typically used as gtk_signals that are connected by the calling application, in much the same way that the clicked method is defined by gtk_button.

If an application wishes to display a Web page, it typically executes the following sequence of steps:

1. Create a new bytesink using, say, gzilla_web_new (). This also creates a new gtk+ widget, accessible as bytesink->widget.
2. Pack the widget into the display window.
3. Connect the appropriate signals.
4. Start writing data into the bytesink using gzilla_bytesink_write ().
5. When the end of the file is reached, call gzilla_bytesink_close ().
6. Keep the bytesink around so that the signals can be handled. If no signals were connected (for example, in an embedded widget), the bytesink can be destroyed immediately using gtk_object_destroy ().
7. The bytesink can always be destroyed after its widget is destroyed.

gzilla_web_new () takes a generic HTTP or MIME object as input, and dispatches a new bytesink as soon as it parses the content type of the input object. You can also call gzilla_gif_new () or gzilla_html_new () on GIF files and HTML files, respectively. (More will be added soon, especially JPEG and text/plain.)

Implementation of a bytesink is not especially tricky.
GzillaByteSink is a virtual class, implementing none of the methods. The actual bytesinks inherit from GzillaByteSink and instantiate the methods. In general, the implementation of the write method parses the input data, processes it somehow, and feeds it to its gtk+ widget. Often, the write method is implemented as a state machine, because it is at the application's mercy as to when it gets data and how much: it can't just do a read call on its input.

Notes added 20 June 1997

The role of the bytesink abstraction has been expanded somewhat: it is now the interface for the RAM cache as well. Specifically, the cache feeds data into a bytesink, and if it misses, it creates a new bytesink and requests that a network connection be established to feed the cache line.

  Network                                              gtk_page
     |                                                     ^
     v                                                     |
                                                           |
 gzilla_http   1                  2               3
 gzilla_file ----> gzilla_cache ----> gzilla_web ---> gzilla_html
     .                                    |
     .                                    |
     .                                interface

Here's a brief description of how a Web page gets displayed. The user's request for a URL eventually reaches open_doc_url in interface.c. Bytesink #2 is already in existence from the creation of the browser window (the gzilla_web_new () call). open_doc_url then calls open_url with bytesink #2 as an argument. open_doc calls gzilla_cache_get_url, requesting that the Web page for the given URL be fed into bytesink #2. If it hits in the cache, then the cache feeds bytesink #2 directly, and bytesink #1 never even gets created. If, on the other hand, it misses in the cache, then gzilla_cache_miss calls gzilla_get_url to set up bytesink #1. gzilla_get_url simply looks at whether it's an http:, file:, or other URL and calls the appropriate get routine (e.g. gzilla_file_get, gzilla_html_get).

When bytes start flowing through bytesink #2, gzilla_web starts examining the header for Content-Type information. As soon as the header is complete, it uses the content type information to create a new bytesink (#3) specific to that content type.
For example, if the content type is text/html, then gzilla_web creates a new gzilla_html bytesink (see gzilla_web_dispatch). Bytes then flow from the network, through the sequence of three bytesinks, and finally to the gtk_page widget, displaying the Web page on the screen.

Shutdown and aborting demand a little special care. Normal shutdown begins in the network, and results in a close call on bytesink #1. This gets propagated down the chain to bytesinks #2 and #3. The interface hooks the close call at bytesink #2 for two reasons: first, to keep track of the number of active bytesinks (defined as connected to the cache or network but not yet closed); and second, so that it can delete all the bytesinks other than the main document as soon as they're closed. The interface keeps the main document bytesink around so that it can generate link, status, request_url_img, and other signals.

Aborting is initiated in the interface (e.g. by pressing the stop button). It causes an abort signal to be generated on bytesink #2. This propagates backwards through gzilla_cache to bytesink #1, and again to the network. Actually, pressing the stop button causes all active bytesinks in the window to be aborted, which is another reason for the interface to keep track of the active bytesinks.

The cache is smart about multiple requests for the same URL. For example, if a URL is requested in two windows and one is aborted, it will keep the network connection alive and continue to feed the other one. Similarly, if another window requests the same URL, the cache will feed it the data it has already received from the network rather than starting up a new connection.

I should point out that images are handled a little differently: there is an imgsink that is somewhat analogous to the bytesink, and a cache for decompressed images that is somewhat analogous to the bytesink cache described above. However, the interface is not quite finished (aborting on multiple windows doesn't work).
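
As a rough illustration of the state-machine style of write method and the Content-Type dispatch described above, here is a minimal, self-contained C sketch. It does not use gtk+ or the real gzilla code: Sink, CollectSink, WebSink, and every function name in it are hypothetical stand-ins invented for this example. It also assumes that the header ends with CRLF CRLF and that the Content-Type header name is spelled with exactly that capitalization, which a real parser could not rely on.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a bytesink; NOT the real GzillaByteSink. */
typedef struct Sink Sink;
struct Sink {
    void (*write)(Sink *self, const char *buf, size_t len);
    void (*close)(Sink *self);
};

/* Content-specific sink (plays the role of bytesink #3): collects the body. */
typedef struct {
    Sink base;
    char body[256];
    size_t used;
    int closed;
} CollectSink;

static void collect_write(Sink *self, const char *buf, size_t len)
{
    CollectSink *c = (CollectSink *)self;
    size_t room = sizeof c->body - 1 - c->used;
    if (len > room)
        len = room;               /* truncate rather than overflow */
    memcpy(c->body + c->used, buf, len);
    c->used += len;
    c->body[c->used] = '\0';
}

static void collect_close(Sink *self)
{
    ((CollectSink *)self)->closed = 1;
}

/* Dispatching sink (plays the role of bytesink #2). */
typedef enum { IN_HEADER, IN_BODY } WebState;

typedef struct {
    Sink base;
    WebState state;
    int match;              /* bytes of "\r\n\r\n" matched so far */
    char header[512];       /* header text, truncated if too long */
    size_t hlen;
    char content_type[64];
    Sink *child;            /* NULL until the header is complete */
    CollectSink html;       /* the only handler in this sketch */
} WebSink;

/* Header is complete: read Content-Type and pick a child sink. */
static void web_dispatch(WebSink *w)
{
    const char *p = strstr(w->header, "Content-Type:");
    if (p != NULL)
        sscanf(p + strlen("Content-Type:"), " %63[^;\r\n]", w->content_type);
    if (strcmp(w->content_type, "text/html") == 0) {
        w->html.base.write = collect_write;
        w->html.base.close = collect_close;
        w->child = &w->html.base;
    }
    w->state = IN_BODY;
}

static void web_write(Sink *self, const char *buf, size_t len)
{
    static const char end[] = "\r\n\r\n";
    WebSink *w = (WebSink *)self;
    size_t i = 0;

    /* The state lives in *w, so a header split across calls still parses. */
    while (w->state == IN_HEADER && i < len) {
        char c = buf[i++];
        if (w->hlen < sizeof w->header - 1) {
            w->header[w->hlen++] = c;
            w->header[w->hlen] = '\0';
        }
        if (c == end[w->match])
            w->match++;
        else
            w->match = (c == '\r') ? 1 : 0;
        if (w->match == 4)
            web_dispatch(w);
    }
    /* Whatever follows the header is body data for the child sink. */
    if (w->state == IN_BODY && w->child != NULL && i < len)
        w->child->write(w->child, buf + i, len - i);
}

static void web_close(Sink *self)
{
    WebSink *w = (WebSink *)self;
    if (w->child != NULL)
        w->child->close(w->child);   /* propagate close down the chain */
}

void web_init(WebSink *w)
{
    memset(w, 0, sizeof *w);         /* state = IN_HEADER, empty buffers */
    w->child = NULL;
    w->base.write = web_write;
    w->base.close = web_close;
}
```

A caller would initialize a WebSink with web_init, pass each chunk of incoming data to base.write in whatever sizes it arrives, and call base.close at end of input. Because the parser state is kept in the struct rather than on the stack, the header may be split at any byte boundary. In this sketch an unrecognized content type simply leaves child as NULL and the body is dropped.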