2009-12-19

Experimental HTTP server using Stackless Python

This blog post documents my experiment to write a non-blocking HTTP server based on the coroutines (tasklets) of Stackless Python. My goal was to write a minimalistic web server that handles concurrent requests using non-blocking system calls, multiplexing with select(2) or epoll(2), and the coroutines of Stackless Python, returning a simple Hello, World page for each request. I've done this, measured its speed using ApacheBench, and compared it to the Hello, World server of Node.js.
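The architecture described above can be sketched in plain Python 3. Stackless is not available in stock CPython, so this minimal sketch substitutes generators for tasklets. Each connection handler is a generator that yields `("read", sock)` or `("write", sock)` whenever it would block, and a small scheduler multiplexes all waiting handlers with `select.select`. The names `handle_connection`, `step`, `serve` and the `max_requests` stop condition are hypothetical, not taken from the original code, and error handling (e.g. connection resets) is omitted.

```python
import select

# Fixed reply; the request header is read but never parsed.
RESPONSE = (b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: 14\r\n"
            b"\r\n"
            b"Hello, World\r\n")


def handle_connection(conn):
    """Per-connection coroutine; a generator standing in for a tasklet."""
    buf = b""
    # Skip the request header: read until the blank line that ends it.
    while b"\r\n\r\n" not in buf:
        yield ("read", conn)          # suspend until select() says readable
        data = conn.recv(4096)
        if not data:                  # client closed before a full header
            conn.close()
            return
        buf += data
    out = RESPONSE
    while out:
        yield ("write", conn)         # suspend until select() says writable
        out = out[conn.send(out):]    # a non-blocking send may be partial
    conn.close()


def step(coro, waiting):
    """Resume a coroutine until it blocks again, then park it on its socket."""
    try:
        kind, sock = next(coro)
    except StopIteration:
        return                        # handler finished
    waiting[sock] = (kind, coro)


def serve(listener, max_requests=None):
    """Accept and serve connections, multiplexing all I/O with select(2)."""
    listener.setblocking(False)
    waiting = {}                      # socket -> ("read" | "write", coroutine)
    accepted = 0
    while True:
        accepting = max_requests is None or accepted < max_requests
        if not accepting and not waiting:
            return                    # hypothetical stop condition for testing
        readers = [s for s, (kind, _) in waiting.items() if kind == "read"]
        writers = [s for s, (kind, _) in waiting.items() if kind == "write"]
        if accepting:
            readers.append(listener)
        readable, writable, _ = select.select(readers, writers, [])
        for sock in readable + writable:
            if sock is listener:
                conn, _ = listener.accept()
                conn.setblocking(False)
                accepted += 1
                step(handle_connection(conn), waiting)
            else:
                _, coro = waiting.pop(sock)
                step(coro, waiting)

# Usage (the port is an arbitrary choice):
#   import socket
#   serve(socket.create_server(("127.0.0.1", 8080), backlog=100))
```

Unlike Stackless tasklets, a generator can only suspend at a `yield` in its own body, which is why every potentially blocking call sits right after a `yield` here.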

The code is here: http://code.google.com/p/pts-mini-gpl/source/browse/#svn/trunk/pts-stackless-httpd and http://syncless.googlecode.com/svn/trunk/benchmark.old/

Relevant ApacheBench speed results (for ab -n 100000 -c 50 http://127.0.0.1:.../):

Notes about the speed measurements:
  • I was using a recently compiled Stackless Python 2.6 and a recently compiled Psyco for JIT compilation.
  • I was surprised that my experimental code using select(2) and Stackless Python is faster than Node.js (by a factor of 1.925 on average, and the worst-case times are better as well).
  • The speed comparison is not fair, since Node.js has a real HTTP protocol implementation, with its overhead, while my code just skips the HTTP header without parsing it.
  • Setting the TCP socket listen queue size to 100 (using listen(2)) made a huge difference on the worst case connection time. Compared to the setting of 5, it reduced the worst-case connection time from 9200 ms to 23 ms (!) in the measurement.
  • The source code of both servers can be found in the repository above.
  • My conclusion from the speed measurements is that an HTTP server based on Stackless Python and epoll(2) can be a viable alternative to Node.js. It would be worthwhile to implement one and then run proper benchmarks.
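The listen queue change noted above is a single argument to listen(2). Here is a minimal Python 3 sketch of creating such a listening socket. The helper name `make_listener` and the `SO_REUSEADDR` option are my additions, not taken from the original code. A plausible (unverified) explanation for the large worst-case difference is that when the queue is full, new connection attempts are dropped and the client only retries after a TCP retransmission timeout.

```python
import socket

# The post compared a backlog of 100 against the traditional value of 5.
LISTEN_BACKLOG = 100


def make_listener(host, port, backlog=LISTEN_BACKLOG):
    """Create a non-blocking TCP listening socket with a large accept queue."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old connections in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    # backlog limits how many pending connections the kernel queues before
    # accept() picks them up; the kernel may cap it (on Linux, at somaxconn).
    sock.listen(backlog)
    sock.setblocking(False)           # accept() is driven by select/epoll
    return sock
```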

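The advantage of using Stackless Python over callback-based solutions (such as Node.js in JavaScript, Twisted and Tornado) is that one can implement a non-blocking TCP server without being forced to use callbacks: each connection handler is written as straight-line code, and its tasklet is suspended whenever an I/O operation would block.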
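The difference shows up even on a small task such as reading a request header chunk by chunk. In this sketch, Python generators stand in for Stackless tasklets, and all names (`CallbackHeaderReader`, `read_header`, `feed`) are hypothetical. The callback version must keep its state in object fields between calls; the coroutine version keeps it in local variables and reads top to bottom.

```python
class CallbackHeaderReader:
    """Callback style: the event loop calls on_data() for every chunk, so the
    parsing state must be kept in instance fields between calls."""

    def __init__(self, on_header):
        self.buf = b""
        self.done = False
        self.on_header = on_header     # continuation invoked with the header

    def on_data(self, chunk):
        if self.done:
            return
        self.buf += chunk
        end = self.buf.find(b"\r\n\r\n")
        if end >= 0:
            self.done = True
            self.on_header(self.buf[:end + 4])


def read_header():
    """Coroutine style: straight-line code; each yield waits for a chunk."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        buf += yield                   # suspended until the loop sends data
    end = buf.index(b"\r\n\r\n")
    return buf[:end + 4]


def feed(coro, chunks):
    """Toy event loop: deliver chunks to a coroutine, return its result."""
    next(coro)                         # run up to the first yield
    for chunk in chunks:
        try:
            coro.send(chunk)
        except StopIteration as stop:
            return stop.value
    return None                        # input ended before a full header
```

One caveat of the generator stand-in: a generator can suspend only in its own frame, whereas a Stackless tasklet can be switched out from deep inside nested calls, so even library code that looks blocking can be written in this straight-line style.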

The unique advantage of Node.js over other solutions is that in Node.js not only socket communication is non-blocking, but DNS lookups, local filesystem access and other system calls as well. Node.js is non-blocking by design, whereas with other frameworks the programmer has to be careful not to accidentally call a blocking function. Avoiding a blocking function is especially cumbersome if a library in use provides only a blocking interface.
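One common workaround for a library with only a blocking interface (a general technique, not something described in this post) is to run the blocking call in a worker thread and let the coroutine wait for the result without stalling the others. A minimal sketch with generators as coroutines; `call_blocking`, `run_all`, `slow_lookup` and `ticker` are hypothetical names, and the busy-polling scheduler is a simplification.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Worker threads that absorb calls which have only a blocking interface.
_pool = ThreadPoolExecutor(max_workers=4)


def call_blocking(func, *args):
    """Coroutine: run func(*args) in a worker thread, yield until it is done."""
    future = _pool.submit(func, *args)
    while not future.done():
        yield                          # let other coroutines run meanwhile
    return future.result()


def run_all(coros):
    """Round-robin scheduler: step each coroutine until all have finished."""
    results = {}
    active = dict(enumerate(coros))
    while active:
        for key, coro in list(active.items()):
            try:
                next(coro)
            except StopIteration as stop:
                results[key] = stop.value
                del active[key]
        # A real event loop would sleep in select/epoll on a wakeup descriptor
        # instead of polling like this.
        time.sleep(0.001)
    return [results[key] for key in sorted(results)]


def slow_lookup(name):
    """Hypothetical stand-in for a blocking library call (e.g. a DNS query)."""
    time.sleep(0.2)
    return name.upper()


def ticker(n):
    """A coroutine that keeps running while the blocking call is in progress."""
    for _ in range(n):
        yield
    return n
```

For example, `run_all([call_blocking(slow_lookup, "example.com"), ticker(10)])` returns `["EXAMPLE.COM", 10]`; the ticker typically finishes its ten steps while the worker thread is still sleeping.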

Update: I've created an HTTP server capable of running WSGI applications. I've also integrated dnspython as an asynchronous DNS resolver. It is available as the project Syncless.
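For reference, a WSGI application is just a callable with the signature defined in PEP 3333; any WSGI-capable server calls it once per request. The following is a generic Hello, World sketch in Python 3, not code from Syncless; the usage comment uses the standard library's blocking `wsgiref` server only as a point of comparison.

```python
def hello_app(environ, start_response):
    """WSGI application (PEP 3333) that returns a Hello, World page."""
    body = b"Hello, World\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# Serving it with the standard library's blocking reference server:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8080, hello_app).serve_forever()
```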

Update: Added web.py and CherryPy support.

Update: I've realized that the functionality of Syncless has already been implemented many times in Python. Examples: Concurrence, eventlet, gevent. See the comparison.

1 comment:

Richard said...

This is the kind of thing I think should be no work for users of Stackless.

Ideally, there would be a support library that monkey-patches all the existing IO with versions that are Stackless-compatible. Like the stacklesssocket module does, but also covering file IO, DNS, etc.