What's a bytesink?

One of the main structuring mechanisms in gzilla is a bytesink. It
serves as the interface between the application which wants to display
Web data and the functions which parse the Web data and display it in
gtk+ widgets. Thus, modules implemented as bytesinks could be used in
many different applications without modification.

A bytesink is a gtk+ object that implements about a dozen methods.
Some of these methods (including write, close, reset, and
set_base_url) are implemented by the bytesink module and are invoked
by the application. The remaining methods (request_url, link, status,
redirect, title, and request_url_img) are not implemented by the
bytesink; they are typically gtk signals that the calling application
connects, much as it would connect to the clicked signal of
gtk_button.
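
The split between methods the module implements and signals the
application connects can be sketched in plain C, without gtk+. Every
name below (ToySink, toy_sink_init, record_status) is hypothetical and
not part of gzilla; a real bytesink uses the gtk+ object and signal
system rather than raw function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bytesink; not the real GzillaByteSink. */
typedef struct ToySink ToySink;

struct ToySink {
  /* Implemented by the bytesink module, invoked by the application. */
  void (*write) (ToySink *sink, const char *buf, size_t len);
  void (*close) (ToySink *sink);

  /* Left empty by the module; the application connects a handler. */
  void (*status) (ToySink *sink, const char *msg, void *user_data);
  void *status_data;

  size_t bytes_seen;
  int closed;
};

static void
toy_write (ToySink *sink, const char *buf, size_t len)
{
  (void) buf;                   /* a real sink would parse the data */
  sink->bytes_seen += len;
}

static void
toy_close (ToySink *sink)
{
  sink->closed = 1;
  /* Emit the status "signal" only if a handler was connected. */
  if (sink->status != NULL)
    sink->status (sink, "done", sink->status_data);
}

void
toy_sink_init (ToySink *sink)
{
  assert (sink != NULL);
  sink->write = toy_write;
  sink->close = toy_close;
  sink->status = NULL;
  sink->status_data = NULL;
  sink->bytes_seen = 0;
  sink->closed = 0;
}

/* Example application-side handler: remembers the last message. */
void
record_status (ToySink *sink, const char *msg, void *user_data)
{
  (void) sink;
  *(const char **) user_data = msg;
}
```

An application would call toy_sink_init (), store record_status in the
status slot, and then drive the sink through its write and close
pointers.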

If an application wishes to display a Web page, it typically executes
the following sequence of steps:

1. Create a new bytesink using, say, gzilla_web_new (). This creates a
new gtk+ widget too, accessible as bytesink->widget.

2. Pack the widget into the display window.

3. Connect the appropriate signals.

4. Start writing data into the bytesink using gzilla_bytesink_write
().

5. When the end of the file is reached, call gzilla_bytesink_close ().

6. Keep the bytesink around so that the signals can be handled. If no
signals were connected (for example, in an embedded widget), the
bytesink can be immediately destroyed using gtk_object_destroy ().

7. The bytesink can always be destroyed after its widget is destroyed.

gzilla_web_new () takes a generic HTTP or MIME object as input, and
dispatches a new bytesink as soon as it parses the content type of the
input object. You can also call gzilla_gif_new () or gzilla_html_new
() on GIF files and HTML files, respectively. (More will be added
soon, especially JPEG and plain text.)

Implementation of a bytesink is not especially tricky. GzillaByteSink
is an abstract base class that implements none of the methods. The
actual bytesinks inherit from GzillaByteSink and implement them. In
general, the implementation of the write method parses the input data,
processes it somehow, and feeds it to its gtk+ widget. Often, the
write method is implemented as a state machine because it's at the
application's mercy as to when it gets data and how much--it can't
just do a read call on its input.
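
As an illustration of why a state machine fits here, the sketch below
finds the blank line that ends a header no matter how the input is
split across write calls. HeaderSink and its functions are
hypothetical names chosen for this example, not gzilla code, and the
parsing is deliberately minimal.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  AT_LINE_START,                /* in the header, at the start of a line */
  IN_LINE,                      /* in the header, inside a line */
  IN_BODY                       /* past the blank line ending the header */
} ParseState;

typedef struct {
  ParseState state;
  size_t header_lines;
  size_t body_bytes;
} HeaderSink;

void
header_sink_init (HeaderSink *sink)
{
  assert (sink != NULL);
  sink->state = AT_LINE_START;
  sink->header_lines = 0;
  sink->body_bytes = 0;
}

/* May be called with any chunk size; all progress lives in *sink. */
void
header_sink_write (HeaderSink *sink, const char *buf, size_t len)
{
  size_t i;

  assert (sink != NULL);
  for (i = 0; i < len; i++)
    {
      char c = buf[i];

      switch (sink->state)
        {
        case AT_LINE_START:
          if (c == '\n')
            sink->state = IN_BODY;      /* blank line: header is over */
          else if (c != '\r')
            sink->state = IN_LINE;
          break;
        case IN_LINE:
          if (c == '\n')
            {
              sink->header_lines++;
              sink->state = AT_LINE_START;
            }
          break;
        case IN_BODY:
          sink->body_bytes++;           /* a real sink would render this */
          break;
        }
    }
}
```

Because the only memory between calls is the state field and the two
counters, feeding the data one byte at a time gives the same result as
feeding it in a single call.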

Notes added 20 June 1997

The role of the bytesink abstraction has been expanded somewhat: it is
now the interface for the RAM cache as well. Specifically, the cache
feeds data into a bytesink. If the request misses, the cache creates a
new bytesink and asks for a network connection to be established; that
connection feeds the cache line.

Network                                               gtk_page
   |                                                     ^
   v                                                     |
                                                         |
gzilla_http
               1                  2               3
gzilla_file  ----> gzilla_cache ----> gzilla_web ---> gzilla_html
     .                            |
     .                            |
     .                         interface

Here's a brief description of how a Web page gets displayed. The
user's request for a URL eventually reaches open_doc_url in
interface.c. Bytesink #2 is already in existence from the creation of
the browser window (the gzilla_web_new () call). open_doc_url then
calls open_url with bytesink #2 as an argument.

open_url calls gzilla_cache_get_url, requesting that the Web page for
the given URL be fed into bytesink #2. If it hits in the cache, then
the cache feeds bytesink #2 directly, and bytesink #1 never even gets
created.
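
The hit path can be sketched as follows. This is a toy model under
stated assumptions: ToyCache, CacheLine, CountSink, and
toy_cache_get_url are hypothetical names, the cache is a fixed table,
and a "sink" only counts the bytes it is fed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink that just counts what it is fed. */
typedef struct {
  size_t bytes_seen;
  int closed;
} CountSink;

typedef struct {
  const char *url;
  const char *data;
} CacheLine;

typedef struct {
  const CacheLine *lines;
  size_t n_lines;
  size_t misses;        /* how often a fetch would have been started */
} ToyCache;

/* Returns 1 on a hit, 0 on a miss. */
int
toy_cache_get_url (ToyCache *cache, const char *url, CountSink *sink)
{
  size_t i;

  assert (cache != NULL && url != NULL && sink != NULL);
  for (i = 0; i < cache->n_lines; i++)
    if (strcmp (cache->lines[i].url, url) == 0)
      {
        /* Hit: feed the sink directly; no upstream sink is created. */
        sink->bytes_seen += strlen (cache->lines[i].data);
        sink->closed = 1;
        return 1;
      }

  /* Miss: this is where the real cache would set up bytesink #1. */
  cache->misses++;
  return 0;
}
```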

If, on the other hand, it misses in the cache, then gzilla_cache_miss
calls gzilla_get_url to set up bytesink #1. gzilla_get_url simply
looks at whether it's an http:, file:, or other URL and calls the
appropriate get routine (e.g. gzilla_file_get, gzilla_http_get).
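
Choosing a get routine from the URL prefix might look roughly like
this. The enum and toy_pick_get_routine are hypothetical; the sketch
only classifies the URL and does not call any real gzilla routine. It
also compares the scheme case-sensitively, which is a simplification.

```c
#include <assert.h>
#include <string.h>

typedef enum { GET_HTTP, GET_FILE, GET_UNKNOWN } GetRoutine;

/* Hypothetical helper: pick a get routine from the URL prefix. */
GetRoutine
toy_pick_get_routine (const char *url)
{
  assert (url != NULL);
  if (strncmp (url, "http:", 5) == 0)
    return GET_HTTP;
  if (strncmp (url, "file:", 5) == 0)
    return GET_FILE;
  return GET_UNKNOWN;
}
```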

When bytes start flowing through bytesink #2, gzilla_web starts
examining the header for Content-Type information. As soon as the
header is complete, it uses the content type information to create a
new bytesink (#3) specific to that content type. For example, if the
content type is text/html, then gzilla_web creates a new gzilla_html
bytesink (see gzilla_web_dispatch).
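
A minimal sketch of dispatching on the content type, assuming the
whole header is already available as one string. SinkKind and
toy_dispatch are hypothetical and are not the actual
gzilla_web_dispatch; only text/html and image/gif are recognized, and
the value is matched by prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { SINK_NONE, SINK_HTML, SINK_GIF } SinkKind;

/* Case-insensitive test that s begins with prefix. */
static int
has_prefix_nocase (const char *s, const char *prefix)
{
  for (; *prefix != '\0'; s++, prefix++)
    if (tolower ((unsigned char) *s) != tolower ((unsigned char) *prefix))
      return 0;
  return 1;
}

/* Scan the header line by line for Content-Type and classify it. */
SinkKind
toy_dispatch (const char *header)
{
  const char *line = header;

  assert (header != NULL);
  while (*line != '\0')
    {
      if (has_prefix_nocase (line, "content-type:"))
        {
          const char *value = line + strlen ("content-type:");

          while (*value == ' ' || *value == '\t')
            value++;
          if (has_prefix_nocase (value, "text/html"))
            return SINK_HTML;
          if (has_prefix_nocase (value, "image/gif"))
            return SINK_GIF;
          return SINK_NONE;
        }
      /* Advance to the start of the next line, if there is one. */
      line = strchr (line, '\n');
      if (line == NULL)
        break;
      line++;
    }
  return SINK_NONE;
}
```

In the real code the result would select which bytesink constructor to
call; here it is only an enum value.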

Bytes then flow from the network, through the sequence of three
bytesinks, and finally to the gtk_page widget, displaying the Web page
on the screen.

Shutdown and aborting demand a little special care. Normal shutdown
begins in the network, and results in a close call on bytesink #1.
This gets propagated down the chain to bytesinks #2 and #3. The
interface hooks the close call at bytesink #2 for two reasons: first,
to keep track of the number of active bytesinks (defined as connected
to the cache or network but not yet closed), and second, to delete all
the bytesinks other than the main document as soon as they're
closed. The interface keeps the main document bytesink around so that
it can generate link, status, request_url_img signals, etc.
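
The close propagation and the interface's hook can be modeled like
this. ChainSink, chain_sink_close, and count_close are hypothetical
names; the real code goes through gtk signals, and this sketch only
tracks flags and a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ChainSink ChainSink;

struct ChainSink {
  ChainSink *next;              /* downstream sink, or NULL */
  void (*close_hook) (ChainSink *sink, void *data);     /* optional */
  void *hook_data;
  int closed;
};

/* Close this sink and every sink downstream of it, running hooks. */
void
chain_sink_close (ChainSink *sink)
{
  for (; sink != NULL; sink = sink->next)
    {
      if (sink->closed)
        continue;               /* closing twice must not recount */
      sink->closed = 1;
      if (sink->close_hook != NULL)
        sink->close_hook (sink, sink->hook_data);
    }
}

/* Interface-side hook: one fewer active bytesink in this window. */
void
count_close (ChainSink *sink, void *data)
{
  (void) sink;
  --*(int *) data;
}
```

Hooking only the middle sink mirrors the text: the interface learns
about the close at bytesink #2 without touching #1 or #3.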

Aborting is initiated in the interface (e.g. by pressing the stop
button). It causes an abort signal to be generated on bytesink #2.
This propagates backwards through gzilla_cache to bytesink #1, and
again to the network. Actually, pressing the stop button causes all
active bytesinks in the window to be aborted, another reason for the
interface to keep track of the active bytesinks.
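
A rough model of the stop button, assuming the window keeps a small
table of its active sinks. AbortSink, ToyWindow, and the functions
below are hypothetical. The sketch walks straight upstream and ignores
the cache's ability to keep a connection alive for other requesters.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE 8

typedef struct AbortSink AbortSink;

struct AbortSink {
  AbortSink *source;            /* upstream sink feeding this one */
  int aborted;
};

typedef struct {
  AbortSink *active[MAX_ACTIVE];
  size_t n_active;
} ToyWindow;

/* Abort a sink and everything upstream of it. */
void
abort_sink_abort (AbortSink *sink)
{
  for (; sink != NULL; sink = sink->source)
    sink->aborted = 1;
}

/* Returns 1 on success, 0 if the table is full. */
int
toy_window_track (ToyWindow *win, AbortSink *sink)
{
  assert (win != NULL && sink != NULL);
  if (win->n_active == MAX_ACTIVE)
    return 0;
  win->active[win->n_active++] = sink;
  return 1;
}

/* What the stop button does: abort every active sink in the window. */
void
toy_window_stop (ToyWindow *win)
{
  size_t i;

  assert (win != NULL);
  for (i = 0; i < win->n_active; i++)
    abort_sink_abort (win->active[i]);
  win->n_active = 0;
}
```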

The cache is smart about multiple requests for the same URL. For
example, if a URL is requested in two windows and one is aborted, it
will keep the network connection alive and continue to feed the other
one. Similarly, if another window requests the same URL, the cache
will feed it the data it's already received from the network rather
than starting up a new connection.
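
The sharing behavior can be sketched with a single cache line that
remembers what has arrived and who is listening. SharedLine, Client,
and the shared_line_* functions are hypothetical, the buffer is a
fixed size, and clients only count bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLIENTS 4
#define LINE_SIZE 256

typedef struct {
  size_t bytes_seen;
} Client;

typedef struct {
  char data[LINE_SIZE];
  size_t len;
  Client *clients[MAX_CLIENTS];
  size_t n_clients;
  int connection_open;
} SharedLine;

void
shared_line_init (SharedLine *line)
{
  assert (line != NULL);
  line->len = 0;
  line->n_clients = 0;
  line->connection_open = 1;
}

/* Attach a requester; it first gets everything received so far. */
int
shared_line_attach (SharedLine *line, Client *client)
{
  if (line->n_clients == MAX_CLIENTS)
    return 0;
  client->bytes_seen += line->len;
  line->clients[line->n_clients++] = client;
  return 1;
}

/* Data from the network: store it and feed every attached client. */
int
shared_line_receive (SharedLine *line, const char *buf, size_t len)
{
  size_t i;

  if (!line->connection_open || len > LINE_SIZE - line->len)
    return 0;
  memcpy (line->data + line->len, buf, len);
  line->len += len;
  for (i = 0; i < line->n_clients; i++)
    line->clients[i]->bytes_seen += len;
  return 1;
}

/* Detach one requester; drop the connection only when none remain. */
void
shared_line_abort (SharedLine *line, Client *client)
{
  size_t i;

  for (i = 0; i < line->n_clients; i++)
    if (line->clients[i] == client)
      {
        line->n_clients--;
        line->clients[i] = line->clients[line->n_clients];
        break;
      }
  if (line->n_clients == 0)
    line->connection_open = 0;
}
```

Aborting one of two clients leaves connection_open set, and a client
attached later is credited with the bytes already stored in the line.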

I should point out that images are handled a little differently:
there is an imgsink that is somewhat analogous to the bytesink, and a
cache for decompressed images that is somewhat analogous to the
bytesink cache described above. However, the interface is not quite
finished (aborting on multiple windows doesn't work).