Initial Introduction to Perl for C Programmers

zackblair (32) in #programming • 7 years ago (edited)

What is Perl?

Perl is a dynamic scripting language that is well known for its strengths in text processing. Programs written in Perl are often referred to as Perl scripts, and consist of Perl source code in a text file.

Perl scripts are executed by the Perl interpreter, which is typically referred to simply as perl (notice the P is not capitalized when referring to the interpreter). While Perl scripts are technically compiled into an intermediate code before being executed by the interpreter, from the user's point-of-view, Perl is essentially an interpreted language.

Probably the three most distinctive characteristics of Perl are its extensive support for text processing, its heavy reliance on context, and its impressive flexibility.

Hello, World!

It's only fitting for any decent programming language tutorial to start with a "Hello, world!" program:

# Greet the world 

print "Hello, world!\n";

Pretty simple, huh? Well, that is in line with one of Perl's most commonly repeated goals: Make easy things easy, and hard things possible.

The first line of our little program is a comment. As in many scripting languages, Perl uses the sharp/pound symbol ("#") to define comments. The second line uses a built-in function in Perl called print to output our message to STDOUT. Unlike C or C++, which strive to provide only as much functionality as necessary in the core language, while leaving the rest to libraries, Perl includes a lot of functionality in its core. The print statement is an example of this.

Variables and Data types

Being a dynamically typed language, data-types in Perl are a much different subject than in C. In Perl, the type of a variable has a lot more to do with how it organizes data it contains than what is considered a type in C. For instance, there is no integer, string, character, or pointer type in Perl. Strings, characters, and numbers are all treated roughly interchangeably because they can be converted from one to the other internally by the Perl interpreter during runtime.

Hence, all of the numeric data types in C, pointers, and even strings, all correspond to a single data-type in Perl: the scalar. Scalar variables hold anything that constitutes a single value, like a number, reference, file handle, or string. Scalar variables are prefixed by a dollar sign:

my $something = "Hello"; 

my $age = 23;

Values that are not single values may be stored in arrays, as in many other languages. Arrays in Perl are another fundamental data type, and they are prefixed by an "@" symbol. Unlike in C or C++, arrays in Perl can grow and shrink dynamically, and the size of an array can be queried at run-time simply by using the array in a context where a scalar variable is expected (a scalar context):

my @array = (1, 2, 3, 4); 

my $sizeOfArray = @array; # Stores 4 into $sizeOfArray 

print $array[2]; # Prints '3'

The example above illustrates how arrays can be assigned values using a list literal. Because the size of an array in Perl can shrink or grow as needed, you can even assign to an index past the current end of the array; Perl extends the array and leaves the intermediate elements undefined.

The print statement above illustrates a peculiarity of Perl's syntax, in that the @ symbol prefixed to our @array variable changes to a $ symbol when we are selecting a single scalar value from it. This reflects a general Perl theme where context is important. Here, the $ symbol is appropriate because the expression $array[2] as a whole evaluates to a scalar value (i.e. the value of the element at index 2 in the array).
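To make the growing, shrinking, and scalar-context behaviour concrete, here is a minimal sketch. The variable names are my own, and push, shift, scalar, and the $#array form are standard built-ins that the text above does not otherwise introduce:

```perl
use strict;
use warnings;

my @queue = (1, 2, 3, 4);

push(@queue, 5);              # append to the end; the array grows to 5 elements
my $first = shift(@queue);    # remove and return the first element (1)

my $count      = scalar(@queue);   # explicit scalar context: number of elements
my $last_index = $#queue;          # index of the last element

print "first=$first count=$count last_index=$last_index last=$queue[-1]\n";
```

Note that a negative index such as $queue[-1] counts from the end of the array, which has no direct equivalent in C.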

Perl also has Hashes, or associative arrays, which can be used to map a set of key scalar values to another set of scalar values. Hash variables are prefixed by a percent sign:

my %phoneNumbers = ("Bob" => "604 573-1234", "Joe" => "604 888-7788"); 

print $phoneNumbers{'Bob'} . "\n"; # Prints '604 573-1234' 

print $phoneNumbers{'Joe'} . "\n"; # Prints '604 888-7788'

Note again that when we reference a particular element in the hash, we change the "%" symbol in front of "phoneNumbers" to a "$" symbol, indicating that we are accessing a scalar value from the hash.
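As a rough sketch of everyday hash usage, the following adds an entry, tests for a key, and walks over all entries. The 'Ann' and 'Sue' entries are hypothetical additions of mine, not part of the example above:

```perl
use strict;
use warnings;

my %phoneNumbers = ("Bob" => "604 573-1234", "Joe" => "604 888-7788");

# Hypothetical extra entry, added for illustration
$phoneNumbers{'Ann'} = "604 555-0100";

# exists() checks for a key without creating it
print "No entry for Sue\n" unless exists $phoneNumbers{'Sue'};

# keys() returns the keys in no particular order, so sort them for stable output
foreach my $name (sort keys %phoneNumbers) {
    print "$name: $phoneNumbers{$name}\n";
}
```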

Dynamic Typing

Unlike C, which is statically typed, Perl is dynamically typed. What this means is that the data type of any particular scalar variable is not known at compile-time. Moreover, with dynamic typing in Perl, values are automatically converted from one type to another and variables can be reassigned as necessary. Hence, the following outputs 'My age is 900':

my $var = 1234; 

$var = "My age is "; 

$var = $var . 900; 

print $var;

Dynamic typing is not all sunshine and happiness, however. Many compile-time checks that a C or Java programmer is accustomed to are impossible in a language that doesn't know the type of variables at compile-time. For instance, if you misspell the name of a method to an object, Perl will only notify you of that error when that particular line is executed. This aspect of dynamic typing means that thorough testing is especially important for programs written in Perl.
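The automatic conversions mentioned above can be seen in a short sketch. The variable names here are only illustrative; the point is that the operator, not the variable, decides whether a value is treated as a number or as a string:

```perl
use strict;
use warnings;

my $n = 10;

my $sum     = "10" + 5;     # + is numeric, so the string "10" becomes the number 10
my $joined  = $n . 5;       # . is string concatenation, so both operands become strings
my $product = "3" * "4";    # * is numeric, so both strings become numbers

print "$sum $joined $product\n";
```

This prints "15 105 12".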

Strict and Warnings Pragmas

To ensure that the maximum number of compile-time checks and warnings are reported, you should always use two pragmas near the top of your programs:

use strict; 

use warnings;

The 'use strict' pragma introduces several compile-time checks to your code. The most immediately apparent of these changes is that variables must be explicitly defined. For instance, without 'use strict', the following code is valid and will print "Hello world":

$something = "Hello world"; # Create a variable out of thin air

print $something;

However, with 'use strict', the first line must be changed to read

my $something = "Hello world";

or a compile-time error will be reported. In the example above, having the variable '$something' just appear is convenient, in that it saves three characters of typing. However, imagine the slightly different program:

$somehting = "Hello world"; 

print $something;

In the code above, Perl will create a new variable called '$somehting' (a misspelling) and store "Hello world" in it. The next line was intended to print "Hello world", but because it refers to a different variable named '$something', it will print nothing. That second variable, '$something', is created automatically for the print statement and holds the undefined value, which prints as an empty string.
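To see 'use strict' reject a misspelled variable without aborting the whole script, one option is to compile a small snippet from a string with eval and inspect the error. This is only a sketch: the misspelled name $somehting is a hypothetical typo chosen for illustration, and in a real program the error would simply stop compilation.

```perl
use strict;
use warnings;

# A tiny program held in a string so that its compile-time error can be captured.
# $somehting is a deliberate, hypothetical misspelling of $something.
my $code = q{
    use strict;
    my $something = "Hello world";
    print $somehting;
    1;
};

my $ok    = eval $code;   # returns undef because the snippet fails to compile
my $error = $@;           # holds the compiler's error message

print "strict rejected the code: $error" unless $ok;
```

The message names the undeclared variable, which is usually enough to spot the typo immediately.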

Operators

Perl has most of the basic arithmetic and logic operators that C has, plus a few more. For instance, +, -, * and = all work in the same way as in C, and combinations like '+=' and '*=' also work the same way. One difference is the division operator, /, which performs floating-point division in Perl, so 7 / 2 evaluates to 3.5 rather than 3; use int() to truncate the result. One arithmetic operator that Perl has and C does not is the exponentiation operator, **. Using this operator, it is possible to raise the number $x to the power $y using the expression $x ** $y.
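A quick sketch of the exponentiation operator, including its compound-assignment form **=, which follows the same pattern as += and *=. The variable names are illustrative only:

```perl
use strict;
use warnings;

my $x = 2;
my $y = 10;

my $power = $x ** $y;   # 2 raised to the power 10

my $cube = 3;
$cube **= 3;            # same as $cube = $cube ** 3

print "$power $cube\n";
```

This prints "1024 27".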

Perl also behaves differently in terms of logic operators. As in C, the number 0 is considered to have a false truth value, and all other numbers are considered to be true. However, in Perl, non-numeric values also have truth values. In all, the special value undef, the number 0, the string "0", the empty string "", and an empty list all constitute false values, and all other values are considered true when used in Boolean expressions. This fact makes possible the very interesting behaviour of Perl's logical or and and operators.

Whereas the result of a Boolean expression in C is simply a truth value, the result of a Boolean expression in Perl is the value of the last term evaluated. This sounds confusing, but an example makes it clear.

my $val = 'Ketchup' || 'Katsup';

The example above stores a true value in $val, because both 'Ketchup' and 'Katsup' are true values and so the result of or'ing them together is also a true value. Specifically, however, $val will contain 'Ketchup', because that is the value of the last term evaluated. The second term in the or expression didn't need to be evaluated because the first term was true. Similarly,

my $val = 0 || 'Katsup';

stores 'Katsup' in $val, because 0 is a false value and so the second term had to be evaluated to determine the truth of the expression as a whole.

The logical and operator, &&, works in a similar fashion.

my $val = 0 && 'Katsup';

stores 0 in $val because after evaluating the first term, Perl could already see that the expression was false.

Also, Perl has the operators or and and, which are the same as || and &&, respectively, except that they have a lower precedence.
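The difference in precedence matters most when an assignment is involved. The following sketch, with variable names of my own choosing, shows how the same-looking expression groups differently with || and with or:

```perl
use strict;
use warnings;

# || binds more tightly than =, so the result of the whole expression is assigned
my $first = 0 || 'Katsup';

# or binds more loosely than =, so this parses as ($second = 0) or print(...)
my $second;
$second = 0 or print "the assignment produced a false value\n";

print "first=$first second=$second\n";
```

Here $first ends up holding 'Katsup', while $second holds 0 and the print on the right-hand side of or runs. As a rule of thumb, || suits choosing between values and or suits control flow.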

Control Structures

Most of Perl's basic control structures are the same or very similar to those in C. For instance, Perl has if statements, while, do-while, and for loops.

The syntax for a basic if statement in Perl is the same as that in C, except that the braces around the body are mandatory, even for a single statement. Also, C's else if clauses are spelt elsif in Perl. For example:

if ($true) { 

	bla(); 

} 

elsif ($something) { 

	foo(); 

} 

else { 

	bar(); 

}

Perl's for loops are also very similar to their equivalents in C:

for (my $i = 0; $i < 10; $i++) { 

	foo(); 

}

as are Perl's while loops:

while ($true) { 

	bar(); 

}

One difference between Perl's loops and C's is that in C, the break statement exits the loop and the continue statement skips to the next iteration, whereas in Perl those same tasks are performed by the last and next operators, respectively.

Following Perl's TMTOWTDI (There's more than one way to do it) philosophy, Perl offers several variations of these basic structures. For instance, the if statement has a post-fix form that looks like this:

foo() if ($true);

Also, Perl has an unless statement that is equivalent to a negated if statement:

unless ($false) { 

	bar(); 

}

While fun and potentially useful in certain situations, these alternative forms should be used with caution. Programmers scan code with their eyes to gain a preliminary idea of its structure and these alternative forms usually make this task more difficult. One exception that is generally acceptable is using the postfix if operator in conjunction with the next or last operators. For instance,

while (1) { 

	last if ($quit); 

	next if ($something); 

	bar(); 

}
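Here is a self-contained version of the same pattern that can actually be run. The counter $n and the array @seen are hypothetical names introduced only for this sketch:

```perl
use strict;
use warnings;

my @seen;
my $n = 0;

while (1) {
    $n++;
    next if $n % 2;     # odd number: skip to the next iteration (C's continue)
    last if $n > 6;     # past the limit: leave the loop (C's break)
    push(@seen, $n);
}

print "@seen\n";
```

This prints "2 4 6": the odd numbers are skipped by next, and the loop ends at 8 because of last.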

Subroutines

Subroutines in Perl serve the same purpose as functions in C. However, subroutines in Perl are much more flexible. Whereas in C, each function must be declared before its use, in Perl this is not necessary because the availability of subroutines is evaluated during runtime. That is one reason that you can misspell a subroutine name in Perl and it won't be caught until that line of code is actually executed.
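The run-time lookup can be demonstrated with a short sketch. The subroutine greet and its deliberate misspelling gret are hypothetical names, and eval is used here only to catch the error so that the script can report it instead of dying:

```perl
use strict;
use warnings;

sub greet {
    return "Hello";
}

print greet(), "\n";

# gret() is a deliberate misspelling of greet(). The script still compiles;
# the mistake is only detected when this call is executed.
my $ok    = eval { gret(); 1 };
my $error = $@;

print "Caught at run time: $error" unless $ok;
```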

Also, subroutines in Perl do not have formal argument lists as functions in C do. Consequently, the most basic syntax for a subroutine is extremely sparse:

sub foo { }

As with C functions, Perl's subroutines can take parameters and can return a value to their caller. In Perl, however, all parameters to a subroutine are passed as a Perl array called @_. This array exists in all subroutines even though it is not explicitly defined, and its contents are the parameters passed to the subroutine by its caller. For instance, the following code prints "My name is Bob":

sub printName { 

	print("My name is $_[0]"); 

} 



printName('Bob');

Hence, all Perl subroutines are a little bit like C's variadic functions (e.g. printf(...)) in that they can take a varying number of parameters and the types of the parameters are not checked at compile time.
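In practice, most subroutines copy their arguments out of @_ into named variables on the first line. The sketch below shows that convention along with a subroutine that accepts any number of arguments; fullName and sum are hypothetical names used for illustration:

```perl
use strict;
use warnings;

# Copy the arguments out of @_ into named lexical variables
sub fullName {
    my ($first, $last) = @_;
    return "$first $last";
}

# Accept any number of arguments, much like a variadic C function
sub sum {
    my $total = 0;
    foreach my $n (@_) {
        $total += $n;
    }
    return $total;
}

print fullName('Bob', 'Smith'), "\n";
print sum(1, 2, 3, 4), "\n";
```

This prints "Bob Smith" followed by "10".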

Perl Idioms

One of the gravest mistakes a C programmer can make when learning Perl is to get into the habit of writing Perl programs as if they were C programs. By this, I mean that there is a tendency to only use those features in Perl that are directly analogous to features of C, and thereby forfeit a lot of the benefit of using Perl. Hence, it is important to become accustomed to using common Perl "idioms" -- common constructs or methods used by the Perl community that take advantage of the unique features of Perl.

For instance, it is very common in C to wrap nearly every function call that may return an error code in an if statement. It is entirely possible to do the exact same thing in Perl, but Perl offers a much cleaner syntax for most cases using its or operator. For instance, if foo() returns a value that evaluates to "false" on error, a C programmer might write something like,

if (!foo()) { 

	die "Something bad happened with foo!"; 

}

but it would be more "Perlish" to write,

foo() or die "Something bad happened with foo!";

The second solution also has the benefit of emphasizing the call to foo() rather than its error-handling code. This, in turn, makes your programs more readable.
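The most common place to meet this idiom is when opening a file. The sketch below assumes a hypothetical path that does not exist, and wraps the call in eval only so that the error raised by die can be printed rather than ending the script:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine
my $path = '/no/such/dir/example.txt';

# eval catches the exception raised by die so the script can continue
my $ok = eval {
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    close($fh);
    1;
};
my $error = $@;

print "Caught: $error" unless $ok;
```

The special variable $! holds the operating system's reason for the failure, so it is worth including in the message.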

Another instance where C's influence can often be harmful to Perl programs is in the case of the for loop. C programmers are likely to write loops like,

for (my $i = 0; $i < 10; $i ++) { 

	print "This is iteration $i\n"; 

}

but this form is needlessly verbose and complex for the simple case of iterating over a range. A more Perl-like solution might look like,

foreach (0 .. 9) { 

	print "This is iteration $_\n"; 

}
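The same construct iterates over the elements of an array directly, with no index variable at all. In this sketch the loop variable is given a name instead of relying on $_; the names @names and $greeted are my own:

```perl
use strict;
use warnings;

my @names   = ('Bob', 'Joe');
my $greeted = 0;

# $name takes the value of each element in turn
foreach my $name (@names) {
    print "Hello, $name!\n";
    $greeted++;
}

print "Greeted $greeted people\n";
```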

Functional Programming

Perl provides several deceptively useful features that cater to an interesting paradigm known as "functional programming." While procedural languages like C and Perl have "functions" or "subroutines" that may return a value, as mathematical functions can, functional programming further extends the mathematical metaphor of the function.

In functional programming, the return value of a subroutine is of primary concern, and subroutines themselves can often be modified by other subroutines, in much the same way that taking the derivative of a mathematical function produces a new function. Subroutines that take another subroutine as one of their arguments, or which return a subroutine, are called "higher-order functions".
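To illustrate both kinds of higher-order function, here is a minimal sketch. The names makeAdder and applyTwice are hypothetical; the first returns a new subroutine, and the second takes a subroutine as an argument:

```perl
use strict;
use warnings;

# makeAdder returns a new anonymous subroutine that remembers $amount
sub makeAdder {
    my ($amount) = @_;
    return sub { return $_[0] + $amount; };
}

# applyTwice takes a subroutine reference and calls it two times
sub applyTwice {
    my ($func, $value) = @_;
    return $func->($func->($value));
}

my $addFive = makeAdder(5);

print $addFive->(10), "\n";
print applyTwice($addFive, 10), "\n";
```

This prints 15 and then 20. The scalar $addFive holds a reference to a subroutine, and the -> operator is used to call it.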

Perl offers several "higher-order" built-in functions to operate on lists, called map, grep, and sort. Each of these is a "higher-order" function because each can take a "code block" as its first argument. A code block is simply a block of code delimited by curly brackets.

The grep function -- called 'filter' in some other languages like Python -- is used to select elements from a list that match a certain criterion, and return a list of those matching elements. For instance, the following code finds all the even numbers in the array @numbers and stores them in @even.

my @numbers = (1, 2, 5, -10, 3); 

my @even = grep { $_ % 2 == 0 } @numbers;

Whenever the code within the curly-brackets of the grep statement evaluates to true, the corresponding element stored in the variable $_ will be added to the list @even.

Hence, the equivalent code using a foreach loop is:

my @numbers = (1, 2, 5, -10, 3); 

my @even; 

foreach (@numbers) { 

	if ($_ % 2 == 0) { 

		push(@even, $_); 

	} 

}

The map function is somewhat more difficult to understand. Whereas grep "selects" elements from one list and returns them as another list, map performs some calculation on each element of the input list, and returns a separate list containing the results of each calculation. For instance, the following code creates a new list by taking the reciprocal of each element in @numbers.

my @numbers = (1, 2, 5, -10, 3); 

my @reciprocals = map { 1 / $_ } @numbers;

Therefore, the code above is equivalent to the following code using a foreach loop:

my @numbers = (1, 2, 5, -10, 3); 

my @reciprocals; 

foreach (@numbers) { 

	push(@reciprocals, 1 / $_); 

}

Most times you encounter a foreach loop that contains a push statement, you might be able to use grep or map instead.

The sort function returns the elements of a list in sorted order, and it allows you to pass in a code block that determines the sort order. By default, elements are sorted lexicographically (basically according to their position in the ASCII table), even if all the elements are numbers. This results, sometimes unexpectedly, in the number 10 being sorted before the number 2, because the character "1" comes before "2". To overcome this, you can specify a code block that operates on two variables, $a and $b, and returns a negative number if $a comes before $b, 0 if they are equal, or a positive number if $a comes after $b.
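A short sketch of the difference, using Perl's numeric comparison operator <=>, which returns -1, 0, or 1 and so fits the description above. The array names are illustrative:

```perl
use strict;
use warnings;

my @numbers = (4, 33, 2, 10);

my @asText     = sort @numbers;                 # default: compared as strings
my @ascending  = sort { $a <=> $b } @numbers;   # compared as numbers
my @descending = sort { $b <=> $a } @numbers;   # swap $a and $b to reverse the order

print "@asText\n";
print "@ascending\n";
print "@descending\n";
```

This prints "10 2 33 4", then "2 4 10 33", then "33 10 4 2". For sorting strings with an explicit block, the cmp operator plays the same role that <=> plays for numbers.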

Summary

To a developer who is used to using C or C++, Perl can seem very permissive and a little eccentric. For instance, the fact that Perl offers both the familiar "if (condition) { statement; }" syntax and the alternative post-fix "statement if (condition);" syntax may seem like an unnecessary complication of the language; however, these differences embody the important Perl philosophy of "There's more than one way to do it" (often abbreviated TMTOWTDI).

If you don't find Perl's philosophical choices endearing, I hope that you will at least find Perl to be a useful tool, especially for developing tools that must do text processing or network communication, as these are some of Perl's strengths. Regular expressions in Perl are an invaluable tool for text matching and replacement operations, and the modules available on Perl's CPAN website can be used to perform almost any task that the base language doesn't directly support.
