Thursday, 31 March 2011

Pipelining in Redis via Python


Redis is an awesome key-value store with client support for multiple languages, but being a Python fan I will be discussing its Python client, redis-py. A great introductory article on Redis and redis-py has been provided by Adam at PlayNice.ly. The idea is simple: Redis allows you to store a key and its value, where the value can be a string, list, hash, set or sorted set. Redis operates on a client-server model. It provides a basic TCP server, and requests for key-value pairs to be stored are forwarded to the server via a client (in my case redis-py). So each request made to the server is associated with a round trip time (RTT): the time taken for the command to travel from the client to the server plus the time for the reply to travel back. Hence, cutting down the RTT overhead can be a huge performance improvement.
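To make this concrete, here is a minimal sketch of talking to Redis through redis-py, assuming a Redis server is already running locally on the default port (6379); the key and value here are made up for illustration:

 import redis

 r = redis.Redis()            # connects to localhost:6379 by default
 r.set("greeting", "hello")   # one command, one round trip to the server
 print r.get("greeting")      # another round trip; prints "hello"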
So instead of executing commands one by one, if a number of commands were executed in a batch, the RTT cost could be significantly reduced. This is where the Redis pipeline comes into play. The pipeline is based on the concept of a queue (FIFO): the set of commands to be executed are all queued up in the pipe, but executed later together.
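As a quick illustration (a sketch reusing the connection r from above, with made-up keys), the commands queue up on the pipe and their replies come back in the same FIFO order once execute() is called:

 pipe = r.pipeline()
 pipe.set("a", 1)       # queued locally, nothing sent yet
 pipe.set("b", 2)       # queued
 pipe.get("a")          # queued
 print pipe.execute()   # one batch; replies in order: [True, True, '1']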
In Python we make an instance of the Redis class available in the redis module, which represents the connection to the Redis server. The pipeline object is obtained by calling the pipeline method on the Redis instance. The pipeline class inherits from Redis, so the pipeline object provides all the commands for inserting key-value pairs, along with the additional facility of queueing these commands to be executed later by calling the execute method on the pipeline object. The code comparing performance with and without a pipeline is provided on the Redis website here, but it is in Ruby. So I decided to implement it in Python to understand it better, as I am a strong believer in learning by doing. Now time for some code:

Importing the relevant modules:

 import redis
 import time

Now defining the function without_pipeline():

 def without_pipeline():
      r = redis.Redis()
      for i in range(10000):
           r.ping()   # each ping is a separate request/response round trip
      return

ping() is a simple method used to check whether the server is running. If the server is running, it returns "PONG" in response. The ping command can also be tested from the Redis command line (redis-cli). In the above code we sequentially ping the server 10,000 times.
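For example, from another terminal (assuming redis-cli is on your PATH):

 $ redis-cli ping
 PONG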

Now with_pipeline():

 def with_pipeline():
      r = redis.Redis()
      pipeline = r.pipeline()
      for i in range(10000):
           pipeline.ping()   # queued in the pipe, nothing sent yet
      pipeline.execute()     # all 10,000 pings sent in one batch
      return

The queued commands are all sent to the server in a single batch by the pipeline's execute() method, which returns their replies in the same order.

The bench function below is used for benchmarking, i.e. estimating the RTT cost, in both cases:

 def bench(desc):
      start = time.clock()
      desc()   # call the function that was passed in
      stop = time.clock()
      diff = stop - start
      print "%s has taken %s" % (desc.func_name, str(diff))

The bench function takes a function as an argument, as functions in Python are callable objects (giving testament to Python's awesomeness). A simple timer is used for estimating the RTT cost. So the final code is:

 import redis
 import time

 def bench(desc):
      start = time.clock()
      desc()
      stop = time.clock()
      diff = stop - start
      print "%s has taken %s" % (desc.func_name, str(diff))

 def with_pipeline():
      r = redis.Redis()
      pipeline = r.pipeline()
      for i in range(10000):
           pipeline.ping()
      pipeline.execute()
      return

 def without_pipeline():
      r = redis.Redis()
      for i in range(10000):
           r.ping()
      return

 if __name__ == "__main__":
      bench(without_pipeline)
      bench(with_pipeline)

Output obtained:

without_pipeline has taken 0.39
with_pipeline has taken 0.19

The results speak for themselves. Although the measurement is not very accurate, it gives an idea of the pipeline's advantage. Feel free to leave your comments below or suggest any other methods for trying this out.
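One such alternative, as a sketch: time.clock() measures CPU time on Linux, so for timing network round trips a wall-clock timer like time.time() may be a fairer measure (bench_wall is just a hypothetical name for this variant):

 def bench_wall(desc):
      start = time.time()   # wall-clock time instead of CPU time
      desc()
      diff = time.time() - start
      print "%s has taken %s" % (desc.func_name, str(diff))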

Wednesday, 30 March 2011

Performance Analysis of Gevent, Eventlet and Node.js

       Meet my three new friends: Eve (eventlet), his younger brother Geve (gevent) and Node (node.js). All three promise extreme scalability over multiple web client requests. Eve was the first of its kind, a lightweight non-blocking I/O Python networking library. Eve was inspired by a beast named Twisted, which also promised non-blocking I/O but was heavyweight and difficult to tame. So Eve paved the way for the next generation of lightweight, scalable networking libraries. Geve, although he started out as a younger and smaller version of Eve with a few major changes, has now grown into one of the most powerful Python networking libraries around. It can effortlessly handle multiple concurrent requests from web clients and is really easy to understand and use. All the Python scripters out there who design web crawlers, web bots or Python/WSGI servers are now really excited to use these libraries to take their applications to the next level.
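For instance, a crawler-style script can fan requests out concurrently with gevent. This is a minimal sketch, assuming gevent is installed; the URL list is just a placeholder:

 from gevent import monkey
 monkey.patch_all()   # make the standard socket module cooperative
 import gevent
 import urllib2

 def fetch(url):
      # each fetch runs in its own greenlet; blocking I/O yields to the others
      return urllib2.urlopen(url).read()

 urls = ["http://example.com/"] * 5   # placeholder URLs
 jobs = [gevent.spawn(fetch, u) for u in urls]
 gevent.joinall(jobs)   # all five downloads run concurrently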
        A Python web application has two sides: server programming (done in Python) and client programming (done using HTML, CSS and JavaScript). Node.js has bridged this gap between server and client programming efficiently by providing a JavaScript web development framework with a built-in HTTP server. This server is also scalable, lightweight and provides asynchronous, non-blocking I/O. In fact, in my benchmark below the Node.js server beats its Python counterparts. Node.js allows server-side as well as client-side scripting using JavaScript. This came as good news to many JavaScript programmers out there, who used to be a little frustrated at not having control over the server side. The server provided by Node.js is powerful, but the framework is very primitive and requires many components to be programmed from scratch. But as Node.js gains popularity, more and more people are pitching in to provide efficient frameworks for it, Express being one of them. At the npm repository one can find really useful JavaScript modules developed by others who faced the same problems. All these modules are available for free, kind of like the Python cheese shop (PyPI).
        Anyway, I thought of taking these three for a test drive using Apache Benchmark (ab). The idea is to create a simple server using each one of my three friends and then throw multiple concurrent requests at them to see how they perform. For each request the server will respond with the same old "Hello World" string. Let us code:

 First Gevent:

 from gevent import wsgi, pool

 # the application that handles each request
 def app(environ, start_response):
      start_response("200 OK", [("Content-Type", "text/plain")])
      return ["Hello World\n"]   # WSGI expects an iterable of strings

 if __name__ == "__main__":
      print "The sweet thing is running on http://localhost:8912/"
      greenlet_pool = pool.Pool()   # a pool of greenlets; each one runs app for a client request
      server = wsgi.WSGIServer(("localhost", 8912), app, spawn=greenlet_pool)   # the server runs multiple greenlets concurrently
      server.serve_forever()   # run the server in a loop

Save the above script as geve.py and execute it in a terminal as "$ python geve.py". This will fire up gevent's server on the loopback interface at port 8912. Now comes ab (Apache Benchmark). In another terminal window I type this:

ab -n 1000 -c 100 http://localhost:8912/ (and hit enter)

The above line runs ab with 1000 requests in total (-n), throwing 100 requests concurrently (-c) onto the server. You can vary the numbers depending upon your operating system's capability. Check the row saying "Time taken for tests:"; mine says 0.201s.

Now Eventlet:

 import eventlet
 from eventlet import wsgi

 def app(environ, start_response):
      start_response("200 OK", [("Content-Type", "text/plain")])
      return ["Hello World\n"]

 if __name__ == "__main__":
      # green-threaded WSGI server listening on port 6785
      wsgi.server(eventlet.listen(("localhost", 6785)), app)

Saved as eve.py and benchmarked the same way (ab -n 1000 -c 100 http://localhost:6785/), the result was 0.450s (note the difference).

Last but not least, Node.js:

 var http = require("http");
 http.createServer(function (req, res) {
      res.writeHead(200, {"Content-Type": "text/plain"});
      res.end("Hello World\n");
 }).listen(9124, "localhost");
 console.log("The sweet thing is running on http://localhost:9124");

Benchmarked the same way (ab -n 1000 -c 100 http://localhost:9124/), the result was 0.172s (okay, so this is the definition of awesome).

The analysis was carried out on Ubuntu (10.10). Feel free to leave comments and share your experience.