I've started playing with the multiprocessing module that shipped with Python 2.6. The multiprocessing module is very similar to threading (it offers almost the same functions and classes as threading).
In my opinion it has three advantages over threading:
- Multiprocessing runs on processes, not threads.
- It sidesteps the GIL (global interpreter lock) that limits threading, because each subprocess has its own interpreter.
- Processes can be synchronized even remotely, so we can run concurrent calculations over the network.
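To make the first two points concrete, here is a minimal sketch that runs the same CPU-bound function once with threading.Thread and once with multiprocessing.Process. It uses Python 3 syntax, which is an assumption (the scanner later in this post targets Python 2.6), and the names cpu_task and run_with are made up for illustration. Both classes accept the same target/args arguments and expose the same start()/join() methods, so one helper can drive either.

```python
import threading
import time
from multiprocessing import Process


def cpu_task(n):
    """CPU-bound busy work: sum of squares below n (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_with(worker_cls, n, jobs):
    """Start `jobs` workers of the given class and return the elapsed seconds.

    worker_cls can be threading.Thread or multiprocessing.Process,
    because both take target/args and provide start() and join().
    """
    workers = [worker_cls(target=cpu_task, args=(n,)) for _ in range(jobs)]
    started = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - started


if __name__ == "__main__":
    n, jobs = 1000000, 2
    print("threads:   %.2f s" % run_with(threading.Thread, n, jobs))
    print("processes: %.2f s" % run_with(Process, n, jobs))
```

On a multi-core machine the process version would be expected to finish sooner, since the threads take turns holding the GIL, but the actual timings depend on the hardware and on process start-up cost.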
I made a simple example class that uses the multiprocessing module to scan ports; just for testing, I built the same thing that Lukasz made using threading. This example is not more efficient than the one presented in http://www.python-blog.com/2009/07/01/python-threaded/, because port scanning mostly waits on the network rather than on the CPU. If we replaced the check_port function with a more CPU-intensive one, though, performance could scale with the number of CPUs/cores we have.
For example, in one of my recent projects I ran calculations for MultiMulti, a popular gambling game in Poland. I calculated the most frequently repeated numbers in the last 50 games. Calculating the 9-out-of-20 combinations 50 times took around 1200 s with threading; with multiprocessing I was able to do it in around 700 s on my Core2Duo CPU. I wish I had a Core 2 Quad to check the performance :) So if you need to do some heavy calculations, multiprocessing can give you that performance.
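As a rough illustration of that kind of CPU-heavy job, the sketch below spreads combination counting over a multiprocessing.Pool. It is not the original MultiMulti code: the draws are randomly generated stand-ins, the function names are hypothetical, it assumes each game draws 20 numbers from 1..80, and it uses Python 3 syntax and a small combination size (3 rather than 9) so that it finishes quickly.

```python
import random
from collections import Counter
from itertools import combinations
from multiprocessing import Pool


def count_combinations(job):
    """Count every k-number combination in one draw; job is (draw, k)."""
    draw, k = job
    return Counter(combinations(sorted(draw), k))


def merge_counts(counters):
    """Add per-draw counters into one overall tally."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total


def most_repeated(draws, k, processes=None):
    """Return (combination, count) for the most repeated k-combination.

    Each draw is counted in a worker process; processes=None lets
    Pool start one worker per CPU core.
    """
    with Pool(processes) as pool:
        counters = pool.map(count_combinations, [(d, k) for d in draws])
    return merge_counts(counters).most_common(1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed: synthetic data, not real results
    # 50 made-up games, each drawing 20 of the numbers 1..80 (assumed rules)
    draws = [rng.sample(range(1, 81), 20) for _ in range(50)]
    print(most_repeated(draws, 3))
```

Splitting the work per draw keeps each task independent, so the only shared step is merging the counters in the parent process.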
Here's the port scanner code. You can also download the port descriptions file, port_list, to match each port with its description.
from multiprocessing import Process, Queue, cpu_count, Lock
import socket, sys


class PortScanner(object):
    ''' multiprocessing port scanner with port description'''

    def __init__(self, host = '', port_range = (1, 100), nr_processes = cpu_count(), port_list_file = ''):
        '''
        port_range=(start,stop) default 1,100 (stop itself is not scanned)
        nr_processes = int default cpu_count() '''
        q = Queue()
        l = Lock()
        port_list = []
        try:
            # each line is tab-separated and starts with the port number
            for line in open(port_list_file).readlines():
                port_list.append([x.strip() for x in line.split('\t')])
        except IOError:
            print 'no port list file specified'
        for port in xrange(port_range[0], port_range[1]):
            q.put((host, port))
        #to stop all processes we have to put STOP to queue and break the loop for each process
        for _ in xrange(nr_processes):
            q.put('STOP')
        for _ in xrange(nr_processes):
            p = Process(target = self.check_port, args = (q, l, port_list))
            p.start()

    def check_port(self, q, l, port_list):
        ''' worker method run by each process '''
        while True:
            queue_ret = q.get()
            if queue_ret == 'STOP':
                break
            s = socket.socket()
            try:
                # queue_ret is a (host, port) tuple
                s.connect(queue_ret)
                #lock for uncorrupted printing to console
                l.acquire()
                try:
                    print "[INFO] %s on port %s is open" % queue_ret
                    if len(port_list) > 1:
                        for entry in port_list:
                            if int(entry[0]) == int(queue_ret[1]):
                                for field in entry:
                                    sys.stdout.write(field + " ")
                                print "\n\n"
                finally:
                    # always release, even if printing fails
                    l.release()
            except socket.error:
                #print "[WARNING] %s on port %s is closed" % queue_ret
                pass
            s.close()


if __name__ == "__main__":
    PortScanner(host = 'example.com', port_range = (1, 60), nr_processes = 40, port_list_file = 'port_list.data')
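
The STOP marker used above is a common way to shut down queue consumers. Here is a minimal sketch of just that pattern, in Python 3 syntax and with hypothetical names (square, worker, run_workers). Unlike the scanner, it also sends results back through a second queue and joins the worker processes; those additions are my assumptions about how one might extend it, not part of the original class.

```python
from multiprocessing import Process, Queue

STOP = 'STOP'  # sentinel: one copy per worker tells it to exit


def square(x):
    """Stand-in for real work such as checking a port."""
    return x * x


def worker(tasks, results):
    """Consume tasks until the sentinel arrives, then report completion."""
    while True:
        item = tasks.get()
        if item == STOP:
            results.put(STOP)  # tell the parent this worker is done
            break
        results.put((item, square(item)))


def run_workers(items, nr_processes=2):
    """Process items in nr_processes workers and return {item: result}."""
    tasks, results = Queue(), Queue()
    for item in items:
        tasks.put(item)
    for _ in range(nr_processes):
        tasks.put(STOP)
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(nr_processes)]
    for p in procs:
        p.start()
    # Drain results before joining, so no worker blocks on a full pipe.
    output, finished = {}, 0
    while finished < nr_processes:
        got = results.get()
        if got == STOP:
            finished += 1
        else:
            output[got[0]] = got[1]
    for p in procs:
        p.join()
    return output


if __name__ == "__main__":
    print(run_workers(range(5)))
```

Putting one STOP per worker after all the real tasks means every worker eventually reads exactly one sentinel and exits, which is the same idea the scanner relies on.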