[Paste] Paste's HTTP server thread pool (reliability)

Author: Ian Bicking
Date:  
To: Python Paste Project
Subject: [Paste] Paste's HTTP server thread pool (reliability)
Hi quiet Paste list.

If you are anything like me, you hate it when your HTTP server freezes
up because all the threads are wedged for some reason. If you aren't
using pooling I guess this won't happen, but instead you'll eventually
have tons of wedged threads sitting around and that's not great either.

Anyway, in the trunk I made some additions to the thread pool and a
small app (egg:Paste#watch_threads) that lets you monitor the pool
through the web and even kill threads. (How reliable the thread killing
is, I'm not sure -- it worked for a couple cases I tried, like reading
past the end of a socket and an infinite loop in Python.)
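
For readers wondering how a thread can be killed at all: a rough sketch,
assuming CPython, is to ask the interpreter to raise an exception inside
the target thread with the C API call PyThreadState_SetAsyncExc, reached
through ctypes. The names ThreadKilled, async_raise and
kill_spinning_thread are made up for this illustration. The exception is
only delivered when the thread next runs Python bytecode, so a thread
blocked inside a C call will not notice until that call returns, which
is one reason reliability is hard to promise.

```python
import ctypes
import threading
import time


class ThreadKilled(Exception):
    """Hypothetical exception raised inside a thread we want to stop."""


def async_raise(thread_id, exc_type):
    """Ask CPython to raise exc_type in the thread with the given ident."""
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread_id), ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("no thread with id %r" % (thread_id,))
    if res > 1:
        # More than one thread state was touched: undo and complain.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread_id), None)
        raise SystemError("PyThreadState_SetAsyncExc hit several threads")


def kill_spinning_thread():
    """Start a thread stuck in a Python-level loop, kill it, report success."""
    started = threading.Event()

    def spin():
        try:
            started.set()
            while True:          # stands in for a wedged request
                time.sleep(0.01)
        except ThreadKilled:
            pass                 # let the thread exit quietly

    worker = threading.Thread(target=spin, daemon=True)
    worker.start()
    started.wait()
    async_raise(worker.ident, ThreadKilled)
    worker.join(timeout=5)
    return not worker.is_alive()
```

Calling kill_spinning_thread() exercises the infinite-loop case; it says
nothing about threads stuck in a blocking system call.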

Of course it is flawed, since if your thread pool is exhausted you can't
access the app. Plus, while seeing wedged threads is nice for
debugging, it's not really something best managed manually.

So I'm thinking about how the thread pool could be improved. Here's my
idea; I'm interested in opinions:

When a request comes in and there are no free threads to handle it, a
new thread should be created up to max_threads (configurable). Maybe
the thread should only live for one request, or maybe it should be added
to the pool and the pool periodically reduced in size if possible.
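
A minimal sketch of the grow-on-demand part, not Paste's actual pool:
the class GrowingPool and its attributes are invented here, and the
sketch takes the "add it to the pool" option without ever shrinking.
A new worker is started only when there is more outstanding work than
workers and the max_threads limit has not been reached.

```python
import queue
import threading


class GrowingPool:
    """Toy pool that adds a worker when all existing workers are busy."""

    def __init__(self, max_threads=10):
        self.max_threads = max_threads
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.workers = 0       # threads created so far
        self.outstanding = 0   # tasks submitted but not yet finished

    def add_task(self, func):
        with self.lock:
            self.outstanding += 1
            # Grow only if every worker already has work, up to the limit.
            if (self.outstanding > self.workers
                    and self.workers < self.max_threads):
                self.workers += 1
                threading.Thread(target=self._worker, daemon=True).start()
        # If the pool is already at max_threads the task simply waits here.
        self.tasks.put(func)

    def _worker(self):
        while True:
            func = self.tasks.get()
            try:
                func()
            finally:
                with self.lock:
                    self.outstanding -= 1
                self.tasks.task_done()
```

With max_threads=3, submitting five blocking tasks creates three workers
and leaves two tasks queued until a worker frees up.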

When a request comes in and the maximum number of threads has already
been created, the thread most likely to be wedged (the one that's been
working the longest) should be killed and a new one added. If none of
the threads has been working very long (less than
wedged_thread_threshold) then we assume we just have a lot of requests
coming in, and we simply queue the request. That means if, say, 10
threads all get wedged at once, and another request comes in before any
of them has passed the threshold, it could end up queued until yet
another request comes in. And then that other request will kill a
thread, the old request gets off the queue, and the new request is back
on the queue. I'm not sure how to deal with that problem, except maybe
to try to empty the queue with multiple kills once a wedged situation is
detected.
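
The decision described above can be sketched as a pure function. This
is only an illustration under assumptions: busy_since (a mapping from
thread id to the time it started its current request), the default
threshold of 30 seconds, and both function names are made up. It also
folds in the "multiple kills" idea by picking up to one wedged thread
per waiting request.

```python
def find_wedged(busy_since, now, wedged_thread_threshold=30.0):
    """Return ids of threads busy longer than the threshold, longest first."""
    wedged = [(now - started, tid)
              for tid, started in busy_since.items()
              if now - started > wedged_thread_threshold]
    return [tid for _, tid in sorted(wedged, reverse=True)]


def threads_to_kill(busy_since, waiting, now, wedged_thread_threshold=30.0):
    """Pick threads to kill when the pool is full.

    waiting is the number of requests that need a thread, including the
    one that just arrived. An empty result means "just queue it".
    """
    wedged = find_wedged(busy_since, now, wedged_thread_threshold)
    return wedged[:waiting]
```

For example, with threads 1, 2 and 3 busy for 100, 10 and 60 seconds, a
single waiting request selects thread 1; three waiting requests select
threads 1 and 3, and thread 2 is left alone.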

Maybe we should add an API to the request environment to tell the server
that a long-running request is expected. This way a conscientious
programmer could still do long-running requests without being afraid of
being killed, but you have to express real intention to do so.
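
One way such an API might look, purely as a sketch: the server puts a
callable into the WSGI environ, and the application calls it to opt out
of being killed. The key name "x-example.long_running", the WorkerState
class and make_environ are all hypothetical; nothing like this is
defined by WSGI or, as far as this message says, by Paste.

```python
LONG_RUNNING_KEY = "x-example.long_running"  # hypothetical environ key


class WorkerState:
    """Per-request bookkeeping the server would keep for each worker."""

    def __init__(self):
        self.kill_exempt = False


def make_environ(base_environ, worker_state):
    """Server side: expose a callable that marks this request long-running."""
    environ = dict(base_environ)

    def expect_long_request():
        worker_state.kill_exempt = True

    environ[LONG_RUNNING_KEY] = expect_long_request
    return environ


def slow_app(environ, start_response):
    """Application side: declare intent, but still work on other servers."""
    declare = environ.get(LONG_RUNNING_KEY)
    if declare is not None:
        declare()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done\n"]
```

The wedged-thread check would then skip any worker whose kill_exempt
flag is set, and the server would reset the flag when the request ends.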

We can check if threads that we killed are actually dead (if they
aren't, they'll still be listed in threading._active). If we see an
excess of these we can
kill the whole process (assuming that a supervisor process is going to
restart the server). Configurable, zombie_thread_threshold or
something, obviously not on by default.
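
A sketch of that zombie check, using the public threading.enumerate()
rather than the private threading._active. The function names are
invented, treating "an excess" as "more than zombie_thread_threshold"
is an assumption, and None stands for the off-by-default setting. Note
that thread ids can be recycled after a thread exits, so a real version
would have to drop ids from its killed list once they disappear.

```python
import threading


def count_zombies(killed_thread_ids):
    """Count killed threads that are still alive according to threading."""
    alive = {t.ident for t in threading.enumerate()}
    return sum(1 for tid in killed_thread_ids if tid in alive)


def should_restart(killed_thread_ids, zombie_thread_threshold=None):
    """Return True if the process should exit so a supervisor restarts it."""
    if zombie_thread_threshold is None:   # feature is off by default
        return False
    return count_zombies(killed_thread_ids) > zombie_thread_threshold
```

A server loop could call should_restart() periodically and exit the
process when it returns True, leaving the restart to the supervisor.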

Anyway, any thoughts anyone has would be appreciated. Clearly I'll have
to write up a document explaining all this, as it's going to be too long
to go in a docstring.

I guess I'll also have to clean up some of the lingering issues in
Paste's HTTP server too (I think just the wsgi.input blocking problem
and the limited request methods), as once I start relying on this stuff
it'll be harder to move to another server. So I can no longer vacillate
on what server people should use -- ours!

-- 
Ian Bicking | ianb@??? | http://blog.ianbicking.org
