gzip without gzip

I hit a few problems today while trying to deflate this blog a little. Server bandwidth is a critical factor, and 4 megabytes per page load is simply too much. Most of that is due to the many pretty pictures, which I thought I had already downsized.

When it comes to PHP-based pages and text content, though, gzip is number one on the list of promising candidates. Simple thing: install one of those generic gzip plugins from the plugin archive and you're done, or so I thought. It turns out my server does have the zlib library, but it does not respond. After some experimenting with the plugin code, I realized that gzip-buffered output would not work, whichever way I tried to enable it. Unfortunately, this seems to be quite a common problem.

The solution can be found in the archives of php.net. In the user notes for the 'ob_gzhandler' function, someone posted this smart workaround. If you break it down, it just buffers all output until the very end, then takes it, compresses it manually using zlib and returns it wrapped in the necessary gzip header and trailer. Even better, this one works even if zlib refuses to take on the simpler jobs!
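To make the "necessary markings" less mysterious, here is a minimal standalone sketch of the framing step on its own, outside of WordPress. The function name sketch_gzip_wrap and the default level 6 are made up for illustration; the byte layout follows the workaround described above, and the sketch assumes PHP 7 or later with the zlib extension loaded.

```php
<?php
// Hypothetical helper name; it only mirrors the framing trick described above.
function sketch_gzip_wrap(string $data, int $level = 6): string
{
    // gzcompress() returns a zlib stream:
    // 2-byte header, raw deflate data, 4-byte Adler-32 checksum.
    $zlib = gzcompress($data, $level);

    // A gzip header is 10 bytes long. Only 8 are written here; the 2-byte
    // zlib header that follows stays in place and lands in the last two
    // header fields (XFL and OS), which are informational only.
    $header = "\x1f\x8b\x08\x00\x00\x00\x00\x00";

    // Cut off the Adler-32 and append the gzip trailer instead:
    // CRC-32 and uncompressed length, both as little-endian 32-bit values.
    return $header
        . substr($zlib, 0, -4)
        . pack('V', crc32($data))
        . pack('V', strlen($data));
}

// Round trip: gzdecode() should hand back the original string.
$packed = sketch_gzip_wrap('<p>Hello, gzip!</p>');
var_dump(gzdecode($packed) === '<p>Hello, gzip!</p>');
```

If gzencode() works on your server, it produces the same kind of stream in one call; the manual framing is only interesting when, as in my case, the simpler routes fail.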

Combined with a howto on whole-page filtering published on w-shadow.com, I ended up with a compact piece of code that can be inserted into a generic WordPress plugin:

function ws_set_up_buffer () {
    ob_start('ws_filter_page');
}
add_action('wp', 'ws_set_up_buffer', 10, 0);
 
function ws_filter_page ($html) {
    header("Content-Encoding: gzip");
    $gz_data = $html;
    $gz_size = strlen($html);
    // 8 gzip header bytes, the zlib stream without its 4-byte Adler-32, then CRC-32 and length
    return "\x1f\x8b\x08\x00\x00\x00\x00\x00".substr(gzcompress($gz_data, 4), 0, -4).pack('V', crc32($gz_data)).pack('V', $gz_size);
}

The first of the two functions is hooked to the 'wp' action, so it is called before WordPress starts to print the page. All it does is redirect all output into a buffer and register ws_filter_page as the callback that will receive the buffer's contents later on.

Next up is the callback function. It is executed after the blog framework has finished its output and the buffer is flushed (emptied); PHP passes the buffered page to it in the $html variable. From there, all data is taken and shoved into the gzip compressor. The rest is just makeup so that browsers will recognize the response as genuine gzipped HTTP.
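The buffer-plus-callback mechanism can be tried out without WordPress. The following is a small sketch with a hypothetical callback, sketch_filter, that stands in for ws_filter_page and simply upper-cases the page instead of compressing it:

```php
<?php
// Hypothetical callback standing in for ws_filter_page.
function sketch_filter(string $buffer): string
{
    return strtoupper($buffer);
}

ob_start('sketch_filter'); // from here on, output is collected instead of sent
echo "hello ";
echo "world\n";
ob_end_flush();            // callback runs once on the whole buffer; prints "HELLO WORLD"
```

In the plugin there is no explicit ob_end_flush() call; any buffer still open when the script ends is flushed by PHP, and that is when the callback gets its turn.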

What is still missing is a check whether the current visitor's browser is capable of receiving gzip. This can be achieved with a conditional like this one in the callback function:

if(isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strstr($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip'))
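As a sketch of how such a check could be isolated and tried without a real request, the helper below takes the server array as a parameter. The name sketch_accepts_gzip is hypothetical, and the code assumes PHP 7 or later for the ?? operator.

```php
<?php
// Hypothetical helper; pass $_SERVER in real use.
function sketch_accepts_gzip(array $server): bool
{
    // A missing header is treated as "no gzip".
    $accept = $server['HTTP_ACCEPT_ENCODING'] ?? '';
    // Plain substring match, case-insensitive.
    return stripos($accept, 'gzip') !== false;
}

var_dump(sketch_accepts_gzip(['HTTP_ACCEPT_ENCODING' => 'gzip, deflate'])); // bool(true)
var_dump(sketch_accepts_gzip([]));                                          // bool(false)
```

Like the one-liner above, this is only a substring match: a client that explicitly refuses gzip with "gzip;q=0" would still be counted as accepting it.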

The final code would look like this:

function ws_set_up_buffer () {
    ob_start('ws_filter_page');
}
add_action('wp', 'ws_set_up_buffer', 10, 0);
 
function ws_filter_page ($html) {
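    if(isset($_SERVER['HTTP_ACCEPT_ENCODING']) && strstr($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) {
        header("Content-Encoding: gzip");
        $gz_data = $html;
        $gz_size = strlen($html);
        // 8 gzip header bytes, the zlib stream without its 4-byte Adler-32, then CRC-32 and length
        return "\x1f\x8b\x08\x00\x00\x00\x00\x00".substr(gzcompress($gz_data, 7), 0, -4).pack('V', crc32($gz_data)).pack('V', $gz_size);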
    }
    else
    {
        return $html;
    }
}

I have just tested this implementation, works fine.

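* I had problems with HTTP_ACCEPT_ENCODING not being properly readable. If encoding does not happen, first check how the header is read: $HTTP_SERVER_VARS is not a superglobal, so it is empty inside a function unless it is declared global, and it was removed in PHP 5.4, whereas $_SERVER is available everywhere. Failing that, just comment out the outer if construction, at the cost of sending gzip to every client whether it asked for it or not.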

A final word of warning: things can become difficult if your WordPress installation includes other plugins that try to do whole-page filtering. See the w-shadow.com blog post for details. You need to pay attention to the nesting of the plugins, as the gzip plugin must be the last one to tamper with the data. There will be no readable HTML left after it has finished.
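The nesting rule follows from how PHP stacks output buffers: the buffer that was started last is flushed first, and its result lands in the buffer below it. A small sketch with two hypothetical callbacks, one standing in for another plugin's HTML filter and one for the gzip step:

```php
<?php
// Hypothetical stand-in for another plugin's whole-page filter.
function sketch_html_filter(string $html): string
{
    return str_replace('cat', 'dog', $html);
}

// Hypothetical stand-in for the gzip step: after it, the page is
// no longer meant to be edited as text.
function sketch_final_step(string $html): string
{
    return '[' . strlen($html) . ' bytes] ' . $html;
}

ob_start('sketch_final_step');  // started first: outermost buffer, runs last
ob_start('sketch_html_filter'); // started later: inner buffer, runs first
echo 'cat';
ob_end_flush(); // inner callback turns 'cat' into 'dog' and hands it to the outer buffer
ob_end_flush(); // outer callback runs; prints "[3 bytes] dog"
```

So for the gzip callback to run last, its ob_start() has to be called before those of the other filtering plugins. In WordPress terms that presumably means hooking it earlier, for example with a lower priority number on the same action, but check which hooks the other plugins actually use.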
