Simple Perl Multi Processing

One of my complaints about programming is that it’s often overcomplicated. Sometimes you need a simple solution with minimal dependencies that gets the job done. This is certainly true of utility scripts designed to run on more than one system or in divergent environments.

I have a section of code that must be run over and over, takes several seconds to run, doesn’t put much load on the system and the overall script run time is too long–a perfect case for multi processing or multi threading. It’s written in perl and the server has plenty of memory and cpu so so I chose multi processing.

Perl multi-processing is implemented on top of UNIX multi processing so the how to reads like a computer science class on UNIX programming. Here’s a working prototype that took a lot of tweaking to develop. Details of problems I encountered are below the code.


#!/usr/bin/perl -w
#
# simple multi processes prototype
# Morgan Jones (mjones7@morganjones.org)
# $Id: multi_proc_proto.pl 125 2008-12-29 21:50:41Z morgan $

use strict;
use POSIX ":sys_wait_h";

my $parallelism = 4;  # number of processes to run simultaneously.
my $pids;  # keep track of PIDs as child processes run

for (my $i=0; $i<100; $i++) {
    my $proc_running = 0;  # flag to indicate a process has been started.

    do {
	my $pidcount = keys %$pids;
	print "pidcount: $pidcount, parallelism: $parallelismn";

	if ($pidcount < $parallelism) {
	    print "starting \"job\" $in";
	    my $pid = fork();

	    if (defined $pid && $pid == 0) {
		# do the meat of the work here..
		my $t;
		do { $t = int (rand(3)); } until ($t>0);
		sleep $t;
		exit 0;
	    } elsif (defined $pid && $pid > 0) {
		# keep track of running processes
		$pids->{$pid} = 1;
		$proc_running++;
	    } else {
		# TODO: count number of failures and exit after too many
		print "problem forking..Sleeping and retrying..n";
		sleep 1;
		next;
	    }
	} else {
	    my $proc_reclaimed = 0;

	    for my $p (keys %$pids) {
		# be careful waiting on $p or substituting'0' for WNOHANG
		my $ret = waitpid(-1, WNOHANG);

		if ($ret<0 || $ret>0) {
		    print "process $ret finished..n";
		    delete $pids->{$ret};
		    $proc_reclaimed = 1;
		}
	    }
	    # only sleep if one or more process didn't finish..  this
	    #   allows us to spin off a new process if one's available
	    #   but keeps us from busy waiting if all the processes
	    #   are busy.

	    unless ($proc_reclaimed) {
		print "sleeping: no processes reclaimed...n";
		sleep 1;
	    }
	}
    } until ($proc_running);
}

My initial version did a waitpid($p, WNOHANG), I also tried waitpid(-1, 0). In both cases it caused the load on the system to go up very quickly, eventually locking the system completely. Searching out result sets less of a few hundred doesn't exhibit the problem.

Running it through the debugger showed the child process finishing. strace showed the process completing. ps would show the process still running. The only symptom was a high sys cpu utilization.

This entry was posted in Perl, Programming on by .

About morgan

Morgan is a freelance IT consultant living in Philadelphia. He lives with his girlfriend in an old house in Fishtown that they may never finish renovating. His focus is enterprise Messaging (think email) and Directory. Many of his customers are education, school districts and Universities. He also gets involved with most aspects of enterprise Linux and UNIX (mostly Solaris) administration, Perl, hopefully Ruby, PHP, some Java and C programming. He holds a romantic attachment to software development though he spends most of his time making software work rather than making software. He rides motorcycles both on and off the track, reads literature with vague thoughts of giving up IT to teach English literature.

Leave a Reply

Your email address will not be published. Required fields are marked *