
Filed Under (Multiprocessing, Python) by Marcin Kuźmiński on July-17-2009

I’ve started playing with the multiprocessing module that shipped with Python 2.6. It is very similar to threading: it offers almost the same functions and classes.

In my opinion it has three advantages over threading:

  • Multiprocessing runs on processes, not threads.
  • It sidesteps the GIL (global interpreter lock) that constrains threading, because the work runs in subprocesses, each with its own interpreter.
  • Processes can be synchronized even remotely, so we could write concurrent calculations that run over the network.
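
To illustrate how closely the two modules mirror each other, here is a minimal sketch that runs the same worker once in a thread and once in a process. The names `report` and `run_both` are made up for this illustration, and the code is written so that it also runs on Python 3.

```python
import os
import threading
import multiprocessing


def report(label, results):
    # Record which OS process executed this call.
    results.put((label, os.getpid()))


def run_both():
    # multiprocessing.Queue can be shared by threads and processes alike.
    results = multiprocessing.Queue()

    p = multiprocessing.Process(target=report, args=("process", results))
    t = threading.Thread(target=report, args=("thread", results))

    # Both classes expose the same start()/join() interface.
    for worker in (p, t):
        worker.start()

    # Read both results before joining so the queue is drained first.
    pids = dict(results.get() for _ in range(2))

    for worker in (p, t):
        worker.join()
    return pids


if __name__ == "__main__":
    pids = run_both()
    print("parent pid:  %s" % os.getpid())
    print("thread pid:  %s" % pids["thread"])   # the thread lives in the parent
    print("process pid: %s" % pids["process"])  # the process has its own pid
```

The thread reports the parent's pid while the process reports its own, which is why CPU-bound work in processes is not serialized by a single interpreter's GIL.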

I made a simple example class that uses the multiprocessing module to scan ports; just for testing, I made the same thing that Lukasz made using threading. This example is not more efficient than the one presented in http://www.python-blog.com/2009/07/01/python-threaded/, because port scanning spends its time waiting on the network rather than on the CPU. But if we replaced the function check_port with a more CPU-consuming one, we could end up with a speedup of up to roughly the number of CPUs/cores we have. For example, in one of my recent projects I made calculations for MultiMulti, a popular gambling game in Poland: finding the most frequently repeated number in the last 50 games. Calculating the combinations of 9 out of 20, 50 times, took around 1200 s with threading; with multiprocessing I was able to do it in around 700 s on my Core2Duo CPU. I wish I had a Core2 Quad to check the performance :) So if you need to do some heavy calculations, multiprocessing can give you that performance.
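
As a rough sketch of what such a CPU-bound job could look like (this is not the original project code), the example below spreads the draws over a `multiprocessing.Pool` and merges the per-draw counts. The function names, the randomly generated draws, and the assumption of 20 numbers drawn from 1..80 are all invented for illustration; it uses `collections.Counter`, so it needs Python 2.7 or newer and also runs on Python 3.

```python
import random
from collections import Counter
from itertools import combinations
from multiprocessing import Pool, cpu_count


def count_combinations(args):
    # CPU-bound worker: enumerate every k-number combination of one draw.
    draw, k = args
    return Counter(combinations(sorted(draw), k))


def most_common_combination(draws, k, processes=None):
    # Hand one draw at a time to the worker processes.
    pool = Pool(processes or cpu_count())
    try:
        partial_counts = pool.map(count_combinations, [(d, k) for d in draws])
    finally:
        pool.close()
        pool.join()

    # Merge the per-draw counts in the parent process.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total.most_common(1)[0]


if __name__ == "__main__":
    # Made-up data: 50 draws of 20 numbers each, NOT real game results.
    rng = random.Random(0)
    draws = [rng.sample(range(1, 81), 20) for _ in range(50)]

    # k=3 keeps the demo quick; k=9 as in the post is far heavier.
    combo, hits = most_common_combination(draws, k=3)
    print("most frequent combination: %s (in %d draws)" % (combo, hits))
```

With a larger `k` each worker also returns a much bigger Counter, so in a real job it may be worth reducing the result inside the worker before sending it back.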

Here’s the code. You can also download the port descriptions file, port_list, to match each port with its description.

from multiprocessing import Process, Queue, cpu_count, Lock
import socket, sys

class PortScanner(object):
    ''' multiprocessing port scanner with port description'''

    def __init__(self, host = '' , port_range = (1, 100), nr_processes = cpu_count(), port_list_file = ''):
        '''
        port_range = (start, stop), default (1, 100); stop itself is not scanned
        nr_processes = int, default cpu_count()
        '''
        q = Queue()
        l = Lock()
        port_list = []

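        try:
            # each line is tab-separated: port number first, then its description fields
            with open(port_list_file) as f:
                for line in f:
                    port_list.append([x.strip() for x in line.split('\t')])

        except IOError:
            print('no port list file specified or file not readable')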

        # one (host, port) job per port to check
        for port in range(port_range[0], port_range[1]):
            q.put((host, port))

        #to stop all processes we have to put STOP to queue and break the loop for each process
        for _ in range(nr_processes):
            q.put('STOP')

        for _ in range(nr_processes):
            p = Process(target = self.check_port, args = (q, l, port_list))
            p.start()

    def check_port(self, q, l, port_list):
        ''' worker method invoked by each process '''
        while True:
            queue_ret = q.get()

            # sentinel: no more ports to check
            if queue_ret == 'STOP':
                break

            s = socket.socket()

            try:
                # queue_ret is a (host, port) tuple
                s.connect(queue_ret)

                #lock for uncorrupted printing to console
                l.acquire()
                try:
                    print("[INFO] %s on port %s is open" % queue_ret)

                    # print the description if the port is in the loaded list
                    for i in port_list:
                        if int(i[0]) == int(queue_ret[1]):
                            sys.stdout.write(" ".join(i) + " ")
                            print("\n\n")
                finally:
                    # always release, even if printing fails
                    l.release()
            except socket.error:
                #print("[WARNING] %s on port %s is closed" % queue_ret)
                pass
            finally:
                s.close()

if __name__ == "__main__":

    PortScanner(host = 'example.com', port_range = (1, 60), nr_processes = 40, port_list_file = 'port_list.data')
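
The scanner reads the port list as tab-separated lines with the port number in the first column (that is what `split('\t')` and `int(i[0])` in the code imply). Assuming that format, a dictionary keyed by port number could replace the linear search in `check_port`. The helper name `load_port_descriptions` and the sample lines below are hypothetical, not taken from the real port_list file.

```python
def load_port_descriptions(lines):
    # Map port number -> list of the remaining tab-separated fields.
    descriptions = {}
    for line in lines:
        fields = [x.strip() for x in line.split('\t')]
        # Skip blank or malformed lines instead of failing in int().
        if not fields[0].isdigit():
            continue
        descriptions[int(fields[0])] = fields[1:]
    return descriptions


if __name__ == "__main__":
    # Sample lines invented for illustration only.
    sample = ["22\tssh\tSecure Shell", "", "80\thttp\tWorld Wide Web HTTP"]
    ports = load_port_descriptions(sample)
    print(" ".join(ports.get(22, [])))
```

A lookup such as `ports.get(port, [])` then costs the same no matter how long the list is, and a blank line in the file no longer raises an error.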