Perl
From Linux 101, The beginner's guide to all things Linux.
Perl is a programming language invented by Larry Wall. In the Linux world, it is most frequently used as a scripting language. Other uses include providing an architecture for plugins, in applications such as Gaim and IRSSI (two networked communications clients). Also, the page rendering code for Slashdot, otherwise known as Slashcode, is written entirely in Perl.
Perl is made from a combination of elements of other languages. It could best be described as a cross between C and a shell scripting language, with thousands of Perl modules thrown in. These modules provide extra functions that scripts can use to do anything from changing the time to extracting artist/song information from Ogg Vorbis streams.
Learning Perl can be a daunting task due to its complexity, but there are many resources available to assist with this. It is possible to get started by simply examining the Perl scripts of other people, which is how this author learned.
See also: Wikipedia Article: Perl
Text Below Contributed by Robert Long (Easy-Linux.com)
Brief howto on writing good Perl
Why Programmers tend to Love or Hate Perl
There are many reasons for this. My favorite came from a colleague who said:
"I don't like Perl because there is no 'one way' to do anything."
This is very true, and it is the main reason people seem to dislike the language. The dislike is not based on its power, flexibility, ease of maintenance, or syntactic restrictions, but on the fact that they came across some ham-fisted code.
Good style is very important in Perl, and consistency coupled with documentation goes a LONG way towards this end.
Gaining Style Points
This section is intended as a 'heads up' for newer Perl coders. There are style differences everywhere you go. If you join a project with an existing code base, you are expected to adhere to its current practice. I'm not saying I have strictly adhered to the rules below in all of my code, but the mark of any good professional is the ability to change and grow with time and experience. If you are writing code on your own, may I suggest the following simple rules (taken directly from Larry Wall himself; see man perlstyle):
- 4-column indent.
- Opening curly on same line as keyword, if possible, otherwise line up.
- Space before the opening curly of a multi-line BLOCK.
ie.
if ($statement == 1) {
    # there are 4 spaces under the if!
    # the opening brace/curly is on the same line as the if() statement,
    # there is a space between the ) and the { for clarity
    # the bottom curly lines up with the 'i' in if, so you know they match.
}
- One-line BLOCK may be put on one line, including curlies.
ie. foreach my $i (@array) { print "entry: $i\n"; }
- Uncuddled elses.
| /* cuddled "else" */  | /* uncuddled "else" */
| if (x > 0) {          | if (x > 0) {
|     x += y;           |     x += y;
| } else {              | }
|     y += x;           | else {
| }                     |     y += x;
|                       | }
- No space before the semicolon.
| /* space before semicolon */ | /* no space */
| x += y ;                     | x += y;
- Space around most operators.
| /* no space around operators */ | /* with space */
| x+=y;                           | x += y;
- Space after each comma.
ie. function_call($arg1, $arg2, $arg3) is better than function_call($arg1,$arg2,$arg3)
- Line up corresponding items vertically.
my $variable_1      = 'a';
my $variable_name_2 = "String Of Chars";
my $numeric_value   = 4;
- Just because you CAN do something a particular way doesn't mean that you SHOULD do it that way. Perl is designed to give you several ways to do anything, so consider picking the most readable one. For instance:
open(FOO,$foo) || die "Can't open $foo: $!";
is better than
die "Can't open $foo: $!" unless open(FOO,$foo);
because the second way hides the main point of the statement in a modifier.
Summary
Perl is full of these ambiguities since it is a coalescence of different languages. Each of its component languages supported similar datatypes with slightly different syntax, and to a large degree each of them is supported. This is the source of most confusion in Perl.
The point is, whatever style you adopt, be consistent so that you use variables in the same way all the time. This will make life easier when you revisit the code to make changes. A common failing of programmers is the assumption that their code is complete or finished. Code is NEVER complete nor is it EVER finished. You or someone else WILL see the code again and the quality of the code will reflect on you. Take pride in your code because it will be around later to haunt you.
Interesting DataTypes and Variables:
Arrays:
Every language that does anything more complicated than simple job control or input/output redirection handles arrays. An array is a set of values bound together under one variable name. In Perl each element is a scalar, so an array can hold integers, floats, strings, single characters, or references to other data and code.
Scalar Arrays:
Perl has two (2) basic types of arrays. The scalar array is the traditional array most people think of when they consider any language: an ordered list whose elements are addressed by an integer index starting at 0. The classic mental model is base + offset, where the first value lives at the array's base address and each later value lives at base address + sizeof(value) * index. Perl's internal storage differs in detail (elements may be of different sizes), but the indexing behaves the same way.
* Scalar: traditional array, base + offset.
@array = ('value1', 'value2', 'value3');
print "$array[0]"; # prints value1
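A few more everyday operations on a scalar array, as a minimal sketch. The @colors data is made up for the illustration and is not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original article.
my @colors = ('red', 'green', 'blue');

my $count = scalar(@colors);   # number of elements: 3
my $last  = $colors[-1];       # negative indexes count from the end: 'blue'
my $top   = $#colors;          # index of the last element: 2

print "count=$count last=$last top=$top\n";

# Arrays grow on assignment; there is no fixed size to declare.
$colors[3] = 'yellow';
print "now ", scalar(@colors), " elements\n";
```

Note that the sigil changes with what you are asking for: @colors is the whole array, while $colors[0] is one scalar element of it.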
Associative Arrays
The other family of arrays is the associative array. I say family because the associative array is really a front end to any number of data stores. Some people have even written handlers to extend Perl so that you can use ODBC databases behind these mighty labels.
Associative arrays store key / value pairs. You put in a key and it returns a value. So if you declare
my %array;
then you store values based on keys. A key is always stored as a string. Numeric keys are converted to strings first, so 1.0 and "1.0" refer to different entries (the number becomes the key "1"). For that reason it is best to write keys as strings and not to expect numeric values to return consistent results. The details are a bit out of scope for this guide.
Let's say we want to store the color of our cars in an array based on the year and model of the car. The key will be year + model, so my Turquoise 1994 Mustang would be 94mustang for a key and Turquoise for a value, as such:
$array{"94mustang"} = "Turquoise";
$array{"01spider"} = "Red";
$array{"99Silverado"} = "White";
$array{"04BMW330"} = "Silver"; # yes I am making these up :)
Now when I want to access the colors of my cars I ask the array for the color based on the car, ie. feed it the key and expect the value:
print "$array{\"94mustang\"}\n";
Output: Turquoise
Notice I had to escape the quote characters (\") in my print statement. Since we are already inside a quoted string, if we don't escape the internal quotes the string looks like this to the interpreter:
print "$array{"94mustang"}\n";
Interpreter Voice: Someone called the print function with input of "$array{", 94mustang, "}\n"
The first string ends at the second quote, so Perl never sees a complete $array{...} lookup. What follows, 94mustang, is neither a quoted string nor a variable (it does not start with $, %, or @), and the script fails to compile with a syntax error. Writing the key in single quotes, as in print "$array{'94mustang'}\n";, avoids the escaping altogether.
Now that you've seen the usage of the associative array we get to the really cool part. The associative array in Perl uses a hash function to maintain key/value relationships, which is why associative arrays are usually just called hashes. If you've been through any proper data structures course you've had to implement a hash function of some sort. A hash table takes a space in memory and assigns each value to a slot chosen by a function that is run against the key. The placement looks random, but the same key always leads to the same slot. If this confuses you, a hash just takes a key and files the value somewhere in memory under that key's name.
That is the default behavior of the associative array. You can also use any number of other data structures instead of a hash, such as B-Tree, B* Tree, Berkeley DB, etc. Using these data structures generally means you are writing to a file and that you are using the tie function to draw the relationship from the array to the file interaction.
Check out some of the Perl references in the Resources and Learning Material section for more information, as that would be a little too much for this quick guide.
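To round out the basics of the default (in-memory) hash, here is a small sketch of testing for a key, walking all the pairs, and deleting one. The %color_of name and the 72pinto key are made up for the illustration; the cars are in the spirit of the example above.

```perl
use strict;
use warnings;

# Made-up cars, declared in one statement.
my %color_of = (
    '94mustang'   => 'Turquoise',
    '01spider'    => 'Red',
    '99Silverado' => 'White',
);

# exists() asks whether a key is present without creating it.
print "no such car\n" unless exists $color_of{'72pinto'};

# Hash order is not defined, so sort the keys for stable output.
foreach my $car (sort keys %color_of) {
    print "$car is $color_of{$car}\n";
}

# delete() removes a pair and returns the value that was stored.
my $sold = delete $color_of{'01spider'};
print "sold the $sold one\n";
```

Sorting the keys is a choice made here only so the output is repeatable; if order does not matter, keys %color_of alone is enough.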
Handy Built-in / Special Variables
Perl has a number of variables that are always there and have specific uses and intentions. I want to show you a few of the more commonly used ones so that if you see them in someone's code you will be able to read it.
These variables may or may not be named explicitly when they are used. This is a HUGE opportunity to obfuscate your code beyond the debugging skills of the buddies you may wish to ask for help.
The @ARGV Special Array
You see it used all over the place, but what does it do? It is an array containing the arguments passed on the command line to the script.
perl test.pl argument_1 argument_2 argument_3
@ARGV contains ("argument_1", "argument_2", "argument_3"), and its size, scalar(@ARGV), is 3. To test whether or not your script received arguments you can test the size of @ARGV.
if (@ARGV == 0) {
    die "\n Usage: $0 <filename> <io mode> <username>\n";
}
This stops a script that was run with no arguments, informs the user that test.pl needs 3 arguments, and gives them hints as to what those arguments should represent. A usage statement would generally be more generously fleshed out than this anemic example.
The $0 Special Variable
Oh, this is a handy bugger. In C and some other languages, the first element of the argument array (argv[0]) is the name of the program. In Perl, $ARGV[0] is the first argument passed on the command line to the program. If you need the name of the script for, let's say, a usage statement, then you get it from the $0 special variable.
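Putting @ARGV and $0 together, a stricter argument check might look like the sketch below. The usage_error helper is hypothetical (it is not a Perl built-in), and the three argument names are simply the ones from the usage line above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a usage message when the argument count is
# wrong, or nothing (undef) when exactly three arguments were given.
sub usage_error {
    my @args = @_;
    return if @args == 3;
    return "Usage: $0 <filename> <io mode> <username>\n";
}

my $error = usage_error(@ARGV);
print STDERR $error if defined $error;
```

A real script would more likely die with the message instead of printing it; the helper returns a string here only so that it is easy to try out.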
The %ENV / $ENV{expr} Special Hash
Literally, your environment. You have full access to your environment variables through this hash. Change a value here and any child process started afterwards gets the updated environment. For example: change or insert a few environment variables for the child process, start it with fork() and exec() (or with system()), then change the variables back. You have successfully communicated with a child process to set its goal in life.
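Here is a rough sketch of handing a value to a child process through %ENV. It assumes a Unix-like system; GREETING is a made-up variable name and child_sees is a hypothetical helper. The child is just another perl that prints what it finds in its own environment.

```perl
use strict;
use warnings;

# GREETING is a made-up variable name used only for this illustration.
sub child_sees {
    my ($value) = @_;

    # local puts the old state of this one entry back when the sub returns,
    # which takes care of the "change the variables back" step.
    local $ENV{GREETING} = $value;

    # $^X is the path of the running perl; the list form avoids the shell.
    open(my $child, '-|', $^X, '-e', 'print $ENV{GREETING}')
        or die "Cannot start child: $!";
    my $seen = <$child>;
    close($child);
    return $seen;
}

print "child saw: ", child_sees('hello'), "\n";
```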
The $_ Special Variable
The $_ is the "default input and pattern-searching space" (courtesy of man perlvar).
When you have a file open and you are reading from it in a while loop, $_ contains the current line of the file. That is the MOST common use of the $_ special variable.
open(INPUT, "<filename") || die "Could not open filename: $!\n";
while (<INPUT>) { # reads the next line into $_; the loop ends at end-of-file
    print;
}
close(INPUT);
This is the same as running the cat command on filename: you print every line of the file. Even though you did not specify a variable to print, Perl said... hey, print doesn't have an argument, let's give it the DEFAULT argument, $_.
print; is the same as print "$_";
You may also see this used in array and tree traversals. Programmers often need to apply a transform to the values in an array so they create some regular expression(s) to do the job then feed the array through that "filter". Let's say we are generating a form and want to generate a personalized copy for each user in our user_array.
@lines = ("Welcome USER", "Your email address is USER\@somewhere.net", "Thanks again USER");
@users = ("Robert", "Mary", "Chris", "Jenny", "John", "Jill", "Marty", "Ashley");
foreach (@users) {
    my $user = $_;
    foreach (@lines) {
        s/USER/$user/;
        print;
        print "\n";
        s/$user/USER/;
    }
}
In the first foreach, you copy the $_ special variable into $user, and you declare it with my so that it is private to this block. That way you don't overwrite anybody else's $user variable that may be visible in the same scope (ie. a global variable you just don't happen to know about because some other coder on your team isn't using strict).
In the second foreach, the regex operates on $_, which is an alias for the actual entry in the array. It does not make a copy, so once we print the value, we have to reverse the change so that the template still works for the next user.
The second foreach overwrites $_ with the current entry in @lines, instead of the current entry in @users. It is important to note that special variables may get changed in places you don't expect. What if you were reading the users in from a file one line at a time? If you hadn't assigned $_ to a named variable, you would lose access to the username stored in $_ as soon as you stepped into the @lines loop. Once you exit the foreach (@lines) loop, however, the value is restored to its previous state. (It actually works a little differently in the symbol tables, but that covers the behavior; the full story is out of scope here.)
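One way to sidestep both surprises (the clobbered $_ and the need to undo the substitution) is to name every loop variable and edit a copy of each line. This is a sketch of that alternative, not the approach used above; personalise is a hypothetical helper and the data is a shortened version of the lists in the example.

```perl
use strict;
use warnings;

my @lines = ('Welcome USER', 'Thanks again USER');
my @users = ('Robert', 'Mary');

# Hypothetical helper: returns the personalised lines for one user.
sub personalise {
    my ($user, @template) = @_;
    my @out;
    foreach my $line (@template) {
        # Edit a copy so the template line itself is never changed.
        (my $copy = $line) =~ s/USER/$user/g;
        push @out, $copy;
    }
    return @out;
}

foreach my $user (@users) {
    print "$_\n" for personalise($user, @lines);
}
```

Because nothing is modified in place, there is no second substitution to forget, and a user name that happens to appear elsewhere in a line cannot be turned back into USER by mistake.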
The @_ Special Variable
Whenever a subroutine / function is called, the argument list is stored in the @_ special variable.
You will oftentimes see a subroutine that starts out as such:
# another_sub( $firstname, $lastname, $date, $age, $height )
sub another_sub {
    my $firstname = shift @_;
    my $lastname  = shift @_;
    my $date      = shift @_;
    my $age       = shift @_;
    my $height    = shift @_;
    # Do some stuff and be done!
}
The shift function removes the first entry from the array and returns it. Apart from the fact that shift empties @_ as it goes, this is equivalent to stating:
my $firstname = $_[0];
my $lastname  = $_[1];
my $date      = $_[2];
my $age       = $_[3];
my $height    = $_[4];
or even shorter:
my ($firstname, $lastname, $date, $age, $height) = @_;
The shift form is the easiest to maintain because, as you change the signature of the subroutine, you can reorder the variables simply by cutting and pasting lines, or by inserting lines where appropriate.
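As a small runnable illustration of unpacking @_, the sketch below uses the list-assignment form and gives a missing argument a default. The describe_person subroutine and its sample arguments are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical subroutine; the field names follow the example above.
sub describe_person {
    my ($firstname, $lastname, $age) = @_;

    # Arguments the caller left out arrive as undef; give one a default.
    $age = 'unknown' unless defined $age;

    return "$firstname $lastname (age $age)";
}

print describe_person('Ada', 'Lovelace', 36), "\n";
print describe_person('Alan', 'Turing'), "\n";
```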
The $. Special Variable
Courtesy of man perlvar:
Current line number for the last filehandle accessed.
Each filehandle in Perl counts the number of lines that have been read from it. When a line is read from a filehandle (via readline() or "<>"), or when tell() or seek() is called on it, $. becomes an alias to the line counter for that filehandle.
To clear up what the man page is saying, $. is just a view of the line counter belonging to the most recently read (logically closest) filehandle. Changing $. does not move the actual file pointer, nor will localizing it change the behavior of the filehandle. However, as the file pointer moves down, the $. special variable gets updated for you behind the scenes.
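A quick way to watch $. count is to number the lines of a file as you read them. The sketch below reads from an in-memory string instead of a real file so that it is self-contained; the three-line text is made up.

```perl
use strict;
use warnings;

# An in-memory "file" keeps the sketch self-contained; the text is made up.
my $text = "alpha\nbeta\ngamma\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";

my @numbered;
while (my $line = <$in>) {
    chomp $line;
    push @numbered, "$.: $line";   # $. is the number of the line just read
}
close($in);

print "$_\n" for @numbered;
```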
The $! Special Variable
The error status (errno) of the last failed system or library call. If you try to execute a system call and it does not succeed, this variable tells you why: used as a number it gives the error number, and used as a string it gives the matching error message. This is very useful for debugging and logging, ie:
open(INPUT, "<FILENAME") || die "Could not open file: $!\n";
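The two faces of $! can be seen side by side in a short sketch. The path below is assumed not to exist on the machine running it, and the standard Errno module is used only to give the error number a readable name.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# A path that is assumed not to exist on the machine running the sketch.
my $path = '/no/such/directory/no-such-file';

my ($number, $message);
unless (open(my $in, '<', $path)) {
    $number  = $! + 0;    # numeric context: the errno value
    $message = "$!";      # string context: the matching error text
    print "open failed: errno $number ($message)\n";
}

print "that is ENOENT, no such file or directory\n"
    if defined $number && $number == ENOENT;
```

Copy $! right after the call that failed, as done here, because later system calls may overwrite it.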
The $$ Special Variable
The process ID of the perl interpreter that is executing the script. This is very useful if you want to terminate yourself, or if you need this information for IPC, random number seeding, or reporting.
The $< and $> Special Variables
These two gems give you the real and effective user IDs respectively. If you write a piece of code that runs setuid root, but you would like to log who ran it, then use $< to output their uid. If you want to check whether the program is running setuid root when you don't think it should be, check whether $> is 0.
Handy little features that keep you from having to call external programs to garner this data.
The $( and $) Special Variables
These function just like their cousins $< and $>, except that they hold group IDs (gid). $( is the real group ID and $) is the effective group ID.
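The process and identity variables can be collected into named variables in a few lines. This sketch assumes a Unix-like system; running_as_root is a hypothetical helper, and the split is there because $( and $) may hold several space-separated group IDs, of which the first is the one described above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the effective user id is root's (0).
sub running_as_root {
    return $> == 0;
}

my $pid = $$;
my ($real_uid, $effective_uid) = ($<, $>);

# Keep only the first group id from each variable.
my ($real_gid)      = split ' ', $(;
my ($effective_gid) = split ' ', $);

print "pid $pid, real uid $real_uid, effective uid $effective_uid\n";
print "real gid $real_gid, effective gid $effective_gid\n";
print "running as root\n" if running_as_root();
```

Copying the punctuation variables into named ones first also keeps them out of double-quoted strings, where they are easy to misread.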
Regular Expressions:
Perl has a powerful regular expression engine that is most commonly used for matching and substitutions in strings.
Let's make this a little easier than usual!
Two common uses are matching and substitution:
Matching
matching: $variable =~ m/value/
This statement returns TRUE if the string value is found anywhere in $variable. There are anchors to specify where in the string to match: ^ for the front and $ for the end.
matching: $variable =~ m/^value/;
will be true for "value is here", but false for "here is the value". On the other hand, $variable =~ m/value$/; will be true for "here is the value" but not for "value is here".
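The three forms can be tried against the two sample strings from the text in one go. This is only a sketch; the @report array is an invented name used to collect the results.

```perl
use strict;
use warnings;

# The two sample strings from the text; each pattern is tried against both.
my @strings = ('value is here', 'here is the value');

my @report;
foreach my $s (@strings) {
    my $anywhere = ($s =~ m/value/)  ? 'yes' : 'no';
    my $at_start = ($s =~ m/^value/) ? 'yes' : 'no';
    my $at_end   = ($s =~ m/value$/) ? 'yes' : 'no';
    push @report, "$anywhere/$at_start/$at_end";
    print "$s => anywhere=$anywhere start=$at_start end=$at_end\n";
}
```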
Substitution / Replacement
Substitution is one of the more powerful uses of the Perl regex feature.
$variable =~ s/value/entry/g;
The 's' at the beginning indicates substitution mode (we are replacing text): s///. 'value' is what we want to find, and 'entry' is what we want to replace it with. The trailing 'g' means global, ie. replace every match in the string rather than just the first one!
$variable = "I have values on a value for everything I like!";
$variable =~ s/value/entry/g;
$variable now contains: "I have entrys on a entry for everything I like!"
Regex is a powerful feature, but you will quickly learn to document your expressions so that six months later you can still decipher them. Different languages handle the metacharacters for regex differently, so leave yourself hints as much as you can.
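One built-in way to leave yourself those hints is the x modifier, which lets a pattern be spread over several lines with comments inside it. The sketch below rewrites dates from YYYY-MM-DD to DD/MM/YYYY; the sample sentence is made up.

```perl
use strict;
use warnings;

# Made-up input; the pattern finds dates written as YYYY-MM-DD.
my $text = 'Released 2004-03-15, patched 2004-04-01.';

my $changed = ($text =~ s{
    (\d{4})   # four-digit year, first capture
    -
    (\d{2})   # two-digit month, second capture
    -
    (\d{2})   # two-digit day, third capture
}{$3/$2/$1}gx);   # g: every match, x: allow whitespace and comments

print "$changed dates rewritten: $text\n";
```

The substitution returns the number of replacements it made, which is handy for logging. Braces are used as delimiters here only so the slashes in the replacement need no escaping.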
Handy Built-in Functions:
The functions that are native to Perl are extensive. The best quick reference you'll find is man perlfunc from the command line.
A few of my favorite functions have to be:
my $variable / my ($var1, $var2, ... )
This elegant little function has one purpose: localizing scope. You know that $variable belongs to you and does not exist once you exit this scope. This protects you and anyone who uses your code.
We all know from day one of programming that global variables are BAD. They make debugging hard, they make tracing difficult, and they confuse value origins. Along with those problems they cloud the namespace and make you use all sorts of conventions like $____variable___name____global = 3; so that you don't use a name someone may need later.
For instance:
my $pizza = "pepperoni";
if ($pizza eq "pepperoni") {
    # we're gonna mess with the delivery guy here:
    my $pizza = "sausage";
    print "$pizza\n";
}
print "$pizza\n";
You will get the output of:
sausage
pepperoni
because of the localized scope. The second my $pizza created a new variable with the same name but a different location in memory, so you did not affect the earlier version AT ALL!
That is why I like 'my()'. You will not destroy or alter anyone else's variables outside of what you intend to do.
split(/pattern/, $variable, $count);
@array = split(/pattern/, $variable, $count);
The split(/pattern/, $variable) or split(/pattern/, $variable, $count) function takes the data contained in $variable, breaks it into chunks at each delimiter matching /pattern/, and returns the chunks as a list. If $count is specified, you get at most that many chunks, and the last one holds whatever remains of the string.
Let's say you have a flat file database. The fields of each record are delimited with the ':' character (ie. the UNIX password file).
open(INPUT, "</etc/passwd") || die "Could not open /etc/passwd: $!\n";
while (<INPUT>) {
    my ($username, $password, $uid, $gid, $gecos, $homedir, $shell) = split(/:/, $_, 7);
    print "$username has home directory of $homedir\n";
}
This would print a line for every user in the password file. The format of the passwd file is as such:
root:x:0:0:root:/root:/bin/bash
Notice there are 7 fields, all separated by the ':' character.
- The first field is the username for the system. ( root )
- The second field is 'x' in this case because our passwords are stored in a shadow file. Before shadowing, the encrypted password occupied that field. ( x )
- The third field is the numeric user id (uid). ( 0 in this case because we are root )
- The fourth field is the numeric group id (gid). ( 0 in this case because we are root )
- The fifth field is the GECOS field. ( root, usually the user's full name, phone, etc )
- The sixth field is the user's home directory. ( /root, usually /home/<username> )
- The seventh field is the shell to execute when the user logs in. ( /bin/bash )
Our split function breaks the entry on the ':' into the 7 fields we defined, then we can access each variable to use its contents.
Another use of split is where you don't know how many fields you have. Let's say you are taking a text field with a list of email addresses and sending them all an email message. You have a subroutine (function) somewhere called mailto($email, $subject, $body) that sends the email address $email, a message with Subject: $subject and body of $body. The email addresses are comma delimited and we will trim whitespace off of them using Regex.
@array = split(/,/, $email_field); # no count since we want them ALL
# the local $email variable is like for($i=0; $i < scalar(@array); $i++) { $array[$i]; }
foreach my $email (@array) {
    $email =~ s/\s//g; # delete any whitespace you see!
    mailto($email, "Hey guess what!", "This is my email spam to you!");
}
Notice that we did not specify a count to the split function? Also notice the regex, we set the s first, for substitution mode, then the pattern is \s which means any whitespace, the replacement value is nothing, and g means do it as many times as you can. So basically, replace any whitespace with nothing, or delete all whitespace characters you find. Then we just iterated through the array. We don't need to know or care how many entries were passed to split because the array is dynamically sized and the foreach loop walks it all.
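Two further details of split are easy to see in a short sketch: what the count does to the last field, and how the pattern itself can swallow whitespace around the delimiter. The addresses are placeholders on the reserved example.com domain, not real ones.

```perl
use strict;
use warnings;

# With a count of 3, the third field keeps the rest of the line, colons and all.
my $record = 'root:x:0:0:root:/root:/bin/bash';
my ($username, $password, $rest) = split(/:/, $record, 3);
print "$username / $password / $rest\n";

# Splitting on a comma plus any surrounding whitespace trims as it splits.
# (Whitespace at the very start or end of the field would still remain.)
my $email_field = 'ann@example.com , bob@example.com,carol@example.com';
my @emails = split(/\s*,\s*/, $email_field);
print scalar(@emails), " addresses: @emails\n";
```

Folding the whitespace into the delimiter is an alternative to the separate s/\s//g step; which one reads better is a matter of taste.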
chomp ($variable)
Perl has an interesting take on file I/O. When you read a line from a file, you get the entire line, end-of-line character and all. That means that at the end of each line is that '\n' character. You may not want to store this in a database or use this in your output.
Our friend chomp steps in to save the day. chomp only removes the end-of-line sequence from the variable specified (strictly, whatever the input record separator $/ is set to, which is "\n" by default), so it is very safe. End of line is <LF> in UNIX and <CR><LF> in DOS. Yes, in DOS you get two characters, one to move the cursor back to the left (Carriage Return) and one to drop to the next line (Line Feed). It is a holdover from those things we called typewriters. Imagine a printer you had to work by hand... never mind, just be happy. :)
open(INPUT, "<filename") || die "Could not open filename: $!\n";
while (<INPUT>) {
    chomp $_; # $_ is a special variable for "current line of file"
    @array = split(/ /, $_);
    foreach my $word (@array) {
        spellcheck($word);
    }
}
For that code, spellcheck is a function you wrote that checks each word against a dictionary and tries to replace it with a word it thinks is spelled properly if it doesn't match an existing entry.
What the code does is open a file for reading (hence the <filename; the < specifies read-only), step through each line, strip off the end-of-line character, break up the line on spaces into words, then run each word through the spell checker.
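A short sketch of what chomp does and does not remove, using made-up strings in place of lines read from a file. The last part shows one possible way to cope with DOS-style endings on a Unix system; it is an assumption about what you might want, not the only approach.

```perl
use strict;
use warnings;

my $unix_line = "hello\n";
my $removed   = chomp($unix_line);   # chomp returns how many characters it removed
print "removed $removed, left <$unix_line>\n";

# A DOS-style line in memory: chomp takes the "\n" and leaves the "\r".
my $dos_line = "hello\r\n";
chomp($dos_line);
my $leftover = ($dos_line =~ /\r\z/) ? 1 : 0;
print "carriage return still there\n" if $leftover;

# One way to strip either ending is an anchored substitution.
my $either = "hello\r\n";
$either =~ s/\r?\n\z//;
print "cleaned: <$either>\n";
```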
push(@array, @variables) / push(@array, $variable)
The push function lets you put the array @variables right onto the end of the array @array, or you can put $variable onto the end of @array. This means you don't have to keep count of how large @array is and you don't have to compute offsets: none of the standard hassles of managing arrays in other languages.
pop(@array)
Again, pop(@array) lets you remove and get back the last entry of the array without knowing anything about the array other than its name.
$variable = 'some string';
push(@array, $variable);
$var2 = pop(@array);
print "$var2 \n";
Output: some string
Easy array management built right in.
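push and pop work on the end of an array; their built-in counterparts unshift and shift do the same at the front. A minimal sketch with a made-up @queue array:

```perl
use strict;
use warnings;

my @queue = ('b', 'c');

push(@queue, 'd', 'e');      # add to the end:      b c d e
unshift(@queue, 'a');        # add to the front:    a b c d e
my $last  = pop(@queue);     # take from the end:   'e'
my $first = shift(@queue);   # take from the front: 'a'

print "first=$first last=$last remaining=@queue\n";
```

Using push with pop gives you a stack, and push with shift gives you a queue, with no bookkeeping of sizes or offsets in either case.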
Resources and Learning Material
Man Pages
Like anything else in the UNIX world, the man facility is very well fleshed out. If you type in man perl, you will be shown a host of other man topics. The following is most of those, but there are others that deal with changelogs, books, and things like Traditional Chinese support.
Overview
perl           Perl overview (this section)
perlintro      Perl introduction for beginners
perltoc        Perl documentation table of contents
Tutorials
perlreftut     Perl references short introduction
perldsc        Perl data structures intro
perllol        Perl data structures: arrays of arrays
perlrequick    Perl regular expressions quick start
perlretut      Perl regular expressions tutorial
perlboot       Perl OO tutorial for beginners
perltoot       Perl OO tutorial, part 1
perltooc       Perl OO tutorial, part 2
perlbot        Perl OO tricks and examples
perlstyle      Perl style guide
perlcheat      Perl cheat sheet
perltrap       Perl traps for the unwary
perldebtut     Perl debugging tutorial
Reference Manual
perlsyn        Perl syntax
perldata       Perl data structures
perlop         Perl operators and precedence
perlsub        Perl subroutines
perlfunc       Perl built-in functions
perlopentut    Perl open() tutorial
perlpacktut    Perl pack() and unpack() tutorial
perlpod        Perl plain old documentation
perlpodspec    Perl plain old documentation format specification
perlrun        Perl execution and options
perldiag       Perl diagnostic messages
perllexwarn    Perl warnings and their control
perldebug      Perl debugging
perlvar        Perl predefined variables
perlre         Perl regular expressions, the rest of the story
perlreref      Perl regular expressions quick reference
perlref        Perl references, the rest of the story
perlform       Perl formats
perlobj        Perl objects
perltie        Perl objects hidden behind simple variables
perldbmfilter  Perl DBM filters
perlipc        Perl interprocess communication
perlfork       Perl fork() information
perlnumber     Perl number semantics
perlthrtut     Perl threads tutorial
perlport       Perl portability guide
perlsec        Perl security
A Few Good Books
My handbook is a book I found while in college, and I *think* the current version is
Perl5 Howto
O'Reilly puts out great texts on pretty much every technical topic, so check out their Perl titles.
There are many books on Perl since it is a widely used language. Stroll into any Walden, Barnes & Noble, or other book store and thumb through a few. Look for code examples and problem explanations.