Category Archives: Perl

Transparently handle first argument to Perl Package subroutines

It’s always baffled me that Perl subroutines behave differently when called from inside the Package vs. outside.

The use case may be obvious but I’ll say it: you write a small utility routine that you want to call as $pack->func(1); and from within the Package as func(1);;

The problem of course is that when called as $pack->func(1); the first arg will the object itself and when called as func(1); from within the package the first arg will be “1.”

There’s a very simple solution:
shift if ((ref $_[0]) eq __PACKAGE__);

Here’s a simple example:


#!/usr/bin/perl -w
#
package Pack;

sub new {
    my $c = shift;

    my $self = {};
    bless $self, $c;
    return $self;
}

sub func {
    shift if ((ref $_[0]) eq __PACKAGE__);
    my $a = shift;

    print "passed in: $a\n";
}

sub call_func {
    print "calling func from inside ", __PACKAGE__, ":\n";
    func(2);
}
1;

my $p = new Pack;

print "calling func from main:\n";
$p->func(1);

$p->call_func();

Try commenting the “shift” line in sub func() and see how it behaves differently.

How to locate perl modules in the same directory as the script

Perl of course allows you to identify the location of your perl modules by modifying @INC but is surprisingly rigid when it comes to placing module(s) in the same directory with the script.

This is fine for permanent installations or for scripts that depend on an installer but if you want to distribute a script plus module(s) for general use or just want flexibility to copy it from system to system without modifying the code the solution is relatively simple:

Setup: you write a perl module MyModule.pm that you’d like to include in the same directory as your script myScript.pl and allow the script to be called from anywhere.

in myScript.pl you would:


use MyModule.pm;

To contain your script and its supporting files in an arbitrarily located myScript directory you’d have to change into the myScript directory before executing myScript.pl:


$ mv myScript.pl myModule.pm /path/to/myScript
$ cd /path/to/myScript && ./myScript.pl

suppose you just want to execute it as


$ /path/to/myScript/myScript.pl

You’d get:


Can't locate MyModule.pm in @INC (@INC contains: /usr/lib..

Well the solution is fairly straightforward: parse the content of $0 to identify the location of myScript.pl (ostensibly MyModule.pm) and put it at the beginning of @INC:


BEGIN {
    my $script_dir = $0;
    if ($0 =~ /\/[^\/]+$/) {
        $script_dir =~ s/\/[^\/]+\/*\s*$//;
        unshift @INC, $script_dir;
    }
}

Simple Perl Multi Processing

One of my complaints about programming is that it’s often overcomplicated. Sometimes you need a simple solution with minimal dependencies that gets the job done. This is certainly true of utility scripts designed to run on more than one system or in divergent environments.

I have a section of code that must be run over and over, takes several seconds to run, doesn’t put much load on the system and the overall script run time is too long–a perfect case for multi processing or multi threading. It’s written in perl and the server has plenty of memory and cpu so so I chose multi processing.

Perl multi-processing is implemented on top of UNIX multi processing so the how to reads like a computer science class on UNIX programming. Here’s a working prototype that took a lot of tweaking to develop. Details of problems I encountered are below the code.


#!/usr/bin/perl -w
#
# simple multi processes prototype
# Morgan Jones (mjones7@morganjones.org)
# $Id: multi_proc_proto.pl 125 2008-12-29 21:50:41Z morgan $

use strict;
use POSIX ":sys_wait_h";

my $parallelism = 4;  # number of processes to run simultaneously.
my $pids;  # keep track of PIDs as child processes run

for (my $i=0; $i<100; $i++) {
    my $proc_running = 0;  # flag to indicate a process has been started.

    do {
	my $pidcount = keys %$pids;
	print "pidcount: $pidcount, parallelism: $parallelismn";

	if ($pidcount < $parallelism) {
	    print "starting \"job\" $in";
	    my $pid = fork();

	    if (defined $pid && $pid == 0) {
		# do the meat of the work here..
		my $t;
		do { $t = int (rand(3)); } until ($t>0);
		sleep $t;
		exit 0;
	    } elsif (defined $pid && $pid > 0) {
		# keep track of running processes
		$pids->{$pid} = 1;
		$proc_running++;
	    } else {
		# TODO: count number of failures and exit after too many
		print "problem forking..Sleeping and retrying..n";
		sleep 1;
		next;
	    }
	} else {
	    my $proc_reclaimed = 0;

	    for my $p (keys %$pids) {
		# be careful waiting on $p or substituting'0' for WNOHANG
		my $ret = waitpid(-1, WNOHANG);

		if ($ret<0 || $ret>0) {
		    print "process $ret finished..n";
		    delete $pids->{$ret};
		    $proc_reclaimed = 1;
		}
	    }
	    # only sleep if one or more process didn't finish..  this
	    #   allows us to spin off a new process if one's available
	    #   but keeps us from busy waiting if all the processes
	    #   are busy.

	    unless ($proc_reclaimed) {
		print "sleeping: no processes reclaimed...n";
		sleep 1;
	    }
	}
    } until ($proc_running);
}

My initial version did a waitpid($p, WNOHANG), I also tried waitpid(-1, 0). In both cases it caused the load on the system to go up very quickly, eventually locking the system completely. Searching out result sets less of a few hundred doesn't exhibit the problem.

Running it through the debugger showed the child process finishing. strace showed the process completing. ps would show the process still running. The only symptom was a high sys cpu utilization.

Perl Truth

Every time I write a perl script I have to pause to remember how Perl handles true and false. It seems Nathan Torkington answered the question years ago in this article from the Perl Journal. Truth is relatively simple when put in his terms:

– only scalars can be true/false (ie no lists)
– undef is false
– “” is false
– 0 is false
– 0.0 is false
– “0” is false
– all else is true

This means negative numbers are true.