Recently I ran into an optimization problem that I had once hoped I would never have to face. After some research on the subject I decided to split the work across threads. I didn’t have much experience in that area apart from some classes at PJIIT (Polish-Japanese Institute of Information Technology), so the start was painful… but not for long (as with everything in Python).
So I decided to share my thoughts with anyone who might ever try to use threads in Python.
Let’s imagine a simple case: you want to check whether some host has any open port in the range 1-1000. We create a simple script (use whatever host you like):
import socket
import time

def is_port_open(host, port):
    """
    Takes host param as string and port param as int.
    Returns True if port is open and False otherwise.
    """
    #print "[DEBUG] Checking host %s:%s" % (host, port)
    s = socket.socket()
    is_open = True
    try:
        s.connect((host, port))
    except socket.error:
        is_open = False
    s.close()
    return is_open

def main():
    start = time.time()
    host = 'some.host'
    # range() excludes its end value, so this covers ports 1..1000
    ports = range(1, 1001)
    print "[INFO] Will now check host %s for ports between %s and %s" % \
        (host, ports[0], ports[-1])
    # Check the ports one after another
    for port in ports:
        if is_port_open(host, port):
            print "[INFO] Port %s on host %s is open." % (port, host)
    print "Elapsed Time: %s" % (time.time() - start)

if __name__ == '__main__':
    main()
How long did it take? Well, try this then:
import threading
import socket
import time
import Queue

THREAD_NUMBER = 20

def is_port_open(host, port):
    """
    Takes host param as string and port param as int.
    Returns True if port is open and False otherwise.
    """
    #print "[DEBUG] Checking host %s:%s" % (host, port)
    s = socket.socket()
    is_open = True
    try:
        s.connect((host, port))
    except socket.error:
        is_open = False
    s.close()
    return is_open

class PortChecker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Blocks until a (host, port) pair is available
            host, port = self.queue.get()
            try:
                if is_port_open(host, port):
                    print "[INFO] Port %s on host %s is open." % (port, host)
            finally:
                # Always mark the item as done, otherwise queue.join() never returns
                self.queue.task_done()

def main():
    start = time.time()
    host = 'some.host'
    # range() excludes its end value, so this covers ports 1..1000
    ports = range(1, 1001)
    queue = Queue.Queue()
    # Start the workers; daemon threads do not keep the script alive at exit
    for i in xrange(THREAD_NUMBER):
        pc = PortChecker(queue)
        pc.setDaemon(True)
        pc.start()
    print "[INFO] Will now check host %s for ports between %s and %s" % \
        (host, ports[0], ports[-1])
    for port in ports:
        queue.put((host, port))
    # Wait until every queued item has been processed
    queue.join()
    print "Elapsed Time: %s" % (time.time() - start)

if __name__ == '__main__':
    main()
Try different port ranges and thread counts. OK, this is just an example, but it should give you an idea of how to handle a process that runs inside a loop. You have to define your “handler” (a thread that will process some object(s)), create an empty queue and fill it with objects. Try using only 1 thread (set THREAD_NUMBER to 1) and look at how long the whole process takes to finish.
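If you don’t have a host to scan at hand, the same handler-plus-queue recipe can be tried out without any network at all. The sketch below is only an illustration and makes a few assumptions: it is written for Python 3 (where the Queue module is called queue and print is a function), and run_with_workers, slow_task and timed are made-up names, with time.sleep standing in for a blocking call such as a connection attempt.

```python
import queue
import threading
import time

def run_with_workers(items, handler, thread_number):
    """Call handler(item) for every item, using thread_number worker threads."""
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            try:
                handler(item)
            finally:
                # Mark the item as processed even if the handler failed
                q.task_done()

    for _ in range(thread_number):
        t = threading.Thread(target=worker)
        t.daemon = True  # workers must not keep the script alive at exit
        t.start()

    for item in items:
        q.put(item)
    q.join()  # wait until every item has been marked as done

def slow_task(item):
    # Hypothetical stand-in for a blocking operation (e.g. a connect attempt)
    time.sleep(0.05)

def timed(thread_number, n_items=20):
    """Return the seconds needed to push n_items through slow_task."""
    start = time.time()
    run_with_workers(range(n_items), slow_task, thread_number)
    return time.time() - start

if __name__ == '__main__':
    for n in (1, 5, 20):
        print("threads=%2d elapsed=%.2fs" % (n, timed(n)))
```

Because the workers spend nearly all of their time waiting, the elapsed time should drop roughly in proportion to the number of threads, up to the number of items. CPU-heavy handlers would not benefit in the same way.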
I know that a programmer’s time is much more expensive than processor time, and I try not to break this rule. But sometimes (e.g. when your script runs for over 12 hours) it is very nice to be able to cut the time it takes to generate results. Just remember to catch exceptions inside threads. This is very important: an exception raised in a thread never reaches the main thread. The worker just dies (at most a traceback lands on stderr), and any queue item for which task_done() was never called will make queue.join() block forever.
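One possible way to catch exceptions inside threads is to wrap the work in try/except and hand the failures back to the main thread through a second queue. This is a minimal sketch rather than the one true way to do it, and it assumes Python 3 (queue instead of Queue); SafeWorker, run_all and risky are hypothetical names invented for the illustration.

```python
import queue
import threading
import traceback

class SafeWorker(threading.Thread):
    """Worker that reports failures instead of dying on the first exception."""

    def __init__(self, tasks, errors):
        threading.Thread.__init__(self)
        self.daemon = True
        self.tasks = tasks
        self.errors = errors

    def run(self):
        while True:
            func, arg = self.tasks.get()
            try:
                func(arg)
            except Exception:
                # Keep the traceback so the main thread can report it later
                self.errors.put((arg, traceback.format_exc()))
            finally:
                # Always called, so tasks.join() cannot hang on a failed item
                self.tasks.task_done()

def run_all(func, args, thread_number=4):
    """Run func(arg) for every arg; return a list of (arg, traceback) failures."""
    tasks = queue.Queue()
    errors = queue.Queue()
    for _ in range(thread_number):
        SafeWorker(tasks, errors).start()
    for arg in args:
        tasks.put((func, arg))
    tasks.join()
    failed = []
    while not errors.empty():
        failed.append(errors.get())
    return failed

def risky(n):
    # Hypothetical task that fails for every multiple of 3
    if n % 3 == 0:
        raise ValueError("cannot handle %s" % n)

if __name__ == '__main__':
    for arg, tb in run_all(risky, range(1, 7)):
        print("[ERROR] task %s failed:\n%s" % (arg, tb))
```

The worker survives a failing task and moves on to the next item, and the main thread decides what to do with the collected tracebacks once the queue has been drained.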
Well, that’s it. I will continue tomorrow with an example using SQLAlchemy. Good night, folks.