Python Paste



WSGIFilter is an extraction of a WSGI pattern that I've implemented in several other projects, though always with small differences and lacking features in some contexts. This is an attempt to get it Just Right.

See the to-do list for some of what needs to be done. Discussion and feedback can take place on the Paste mailing list.


So what is WSGIFilter?

Implementing output filtering in WSGI is a bit tricky. Output can come through the app_iter or the start_response writer, and sometimes arrives out of order. Typically only some content is intended to be filtered (often text/html). Lastly, filtering shares many of the needs that HTTPEncode handles through its format system, which lets you work on higher-level objects like parsed XML. Used in concert with a stack of WSGIFilter filters or with HTTPEncode, the format system can avoid unnecessary encoding and decoding by leaving the content as native Python objects.
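To make the difficulty concrete, here is a minimal hand-rolled sketch of the pattern that WSGIFilter extracts. It is not WSGIFilter's implementation; the name upper_html_middleware and the demo application are invented for illustration, and the sketch assumes that anything sent through the write() callable precedes the iterated output.

```python
def upper_html_middleware(app):
    """Wrap a WSGI app and upper-case text/html bodies of 200 responses."""

    def wrapped(environ, start_response):
        captured = {}
        written = []

        def fake_start_response(status, headers, exc_info=None):
            # Hold status and headers back until the body has been filtered.
            captured["status"] = status
            captured["headers"] = list(headers)
            # Output sent through the legacy write() callable lands here.
            return written.append

        app_iter = app(environ, fake_start_response)
        try:
            # The app may delay start_response until iteration begins, so
            # exhaust the iterable before reading the captured headers.
            body = b"".join(app_iter)
        finally:
            if hasattr(app_iter, "close"):
                app_iter.close()
        # Assumption: written chunks come before iterated chunks.
        body = b"".join(written) + body

        status = captured["status"]
        headers = captured["headers"]
        content_type = ""
        for name, value in headers:
            if name.lower() == "content-type":
                content_type = value.split(";", 1)[0].strip().lower()

        if status.startswith("200") and content_type == "text/html":
            body = body.upper()
            # The body changed, so any old Content-Length is stale.
            headers = [(n, v) for n, v in headers
                       if n.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(status, headers)
        return [body]

    return wrapped


def demo_app(environ, start_response):
    # Hypothetical app that uses both output channels.
    write = start_response("200 OK", [("Content-Type", "text/html")])
    write(b"<p>hello ")
    return [b"world</p>"]
```

Even this small version has to juggle two output channels, deferred headers, and a stale Content-Length, which is the bookkeeping a shared base class can take over.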

An example of an application that I've written that could have used WSGIFilter (if it had existed) is Commentary. Another is Deliverance. Some more modest examples that could use WSGIFilter are paste.debug.profile and paste.debug.prints. So if you are thinking "is WSGIFilter for me?" you might want to think about the similarity of what you are doing to some of these styles of work.

The specific thing that got me thinking about WSGIFilter was the use of server-side processing of microformats, potentially stacking up multiple transforms without introducing too much overhead (either code or performance).

Using It

You will subclass from wsgifilter.Filter. For example:

from wsgifilter import Filter

class UpperFilter(Filter):

    def filter(self, environ, headers, data):
        # data is the response body; return the transformed body
        return data.upper()

This upper-cases all the content going through the filter. You use it like:

from therestofmyapp import MyApp
# MyApp is a WSGI app factory
app = UpperFilter(MyApp(...))
# now app is a WSGI app

If you want to use it with Paste Deploy, you should put something like this in your setup.py:

from setuptools import setup
setup(
    name="MyPackage", ...
    entry_points="""
    [paste.filter_app_factory]
    myfilter = mypackage.myfilter:MyFilter.paste_deploy_middleware
    """)

Now you can use it as egg:MyPackage#myfilter

The filter method

The key method is the filter method. It gets the environment of the request, a list of headers, and some data. The environment is just the WSGI environment.

The headers can be modified in place -- they won't be sent until you return from the function.

The data will be some... data. There are three basic options:

  1. It's a plain string (str). You return the same.

  2. You want unicode, and set decode_unicode = True in your class. You will get a unicode string and should return the same.

  3. You want something else, like parsed XML. You should either set format = format_object or format_output = 'object_type'. For instance, format_output = 'lxml.etree' will try to parse whatever we get with lxml. If you give just format_output, the filter will try to find the format that produces that output for the mimetype we've received. If you give a format, it will use that exact format.

    (To give an idea of how this differs, there are actually two formats that produce lxml.etree: one is the XML parser that accepts application/xml, and one is an HTML parser that accepts text/html.)

    As in all the other cases, you return what you get; the format will handle serialization for you.
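As a rough illustration of what the unicode option saves you from writing, the sketch below decodes a body using the charset declared in Content-Type, applies a text-level filter, and re-encodes the result. It is an assumption about the general shape of the work, not WSGIFilter's code; apply_text_filter and default_charset are hypothetical names. In Python 3 terms, the "plain string" case corresponds to bytes and the unicode case to str.

```python
from email.message import Message


def apply_text_filter(headers, body, text_filter, default_charset="utf-8"):
    """Decode body per the Content-Type charset, filter it, re-encode it."""
    msg = Message()
    for name, value in headers:
        if name.lower() == "content-type":
            # Let the stdlib parse the header parameters for us.
            msg["Content-Type"] = value
            break
    # get_content_charset() returns None when no charset is declared.
    charset = msg.get_content_charset() or default_charset
    text = body.decode(charset)
    return text_filter(text).encode(charset)


# Usage: upper-case a Latin-1 encoded body without touching bytes directly.
headers = [("Content-Type", "text/html; charset=latin-1")]
filtered = apply_text_filter(headers, "café".encode("latin-1"), str.upper)
```

The filter function only ever sees decoded text, which is the same division of labour the three options describe: you return the kind of object you were given, and serialization happens outside your code.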

Conditional filtering

If you are filtering HTML (which is the default), you probably don't want to look at JavaScript or CSS. You can select what content types you want with filter_content_types = (list of types). The list is ('text/html', ) by default.

By default only 200 OK responses are filtered; error responses are not. If you want to filter everything use filter_all_status = True.
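The two settings combine into a simple decision, sketched here as a standalone function. This mirrors the behavior described above rather than reproducing WSGIFilter's internals; should_filter is a hypothetical helper, its keyword arguments borrow the class attribute names, and the default tuple assumes text/html.

```python
def should_filter(status, headers, filter_content_types=("text/html",),
                  filter_all_status=False):
    """Return True if a response with this status and headers gets filtered."""
    # Unless told otherwise, only successful (200) responses are filtered.
    if not filter_all_status and not status.startswith("200"):
        return False
    for name, value in headers:
        if name.lower() == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            mimetype = value.split(";", 1)[0].strip().lower()
            return mimetype in filter_content_types
    # No Content-Type header: nothing to match against.
    return False


html = [("Content-Type", "text/html; charset=utf-8")]
script = [("Content-Type", "application/javascript")]

print(should_filter("200 OK", html))
print(should_filter("200 OK", script))
print(should_filter("500 Internal Server Error", html))
print(should_filter("500 Internal Server Error", html, filter_all_status=True))
```

Under these assumptions only the first and last calls report True: the script response fails the content-type check, and the error response is skipped unless filter_all_status is set.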