Perl has been at the root of web development since the beginning: even Amazon is built on Perl. Today, Perl gives you access via CPAN to a set of over 18,000 mature modules on just about anything. There is even an Acme:: namespace reserved for joke modules.

Perl has a lot of benefits for a Drupal developer. First, the syntax of PHP has been greatly influenced by Perl, so most PHP programmers should feel comfortable in Perl. It is easy to install extra Perl modules on any Linux distribution from the command line using CPAN, or on shared hosts using the administration interface. And Perl can be faster than PHP, which makes it a good candidate for the heavy-lifting part of a website.

Let's build a small Perl script to:

  1. Log into a website
  2. Parse a page and search for specific content
  3. Format the content as an RSS feed
  4. Load the feed into Drupal

This solution is simple to build using only four CPAN modules. Here is how it goes:

STEP 1: The first line in the Perl script is the shebang, which points to the location of Perl on your system.

    #!/usr/bin/perl -w

On shared hosts, you might have to use something like this to tell Perl to look inside your home directory:

    #!/ramdisk/bin/perl -w
    #
    # Hostmonster fix
    BEGIN {
        my $homedir = ( getpwuid($>) )[7];
        my @user_include;
        foreach my $path (@INC) {
            if ( -d $homedir . '/perl' . $path ) {
                push @user_include, $homedir . '/perl' . $path;
            }
        }
        unshift @INC, @user_include;
    }

STEP 2: Declare the modules you intend to use (these must be installed first):

    use CGI::Minimal;
    use WWW::Mechanize;
    use XML::RSS;
    use HTTP::Message;
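If one of these modules is missing, the script dies at compile time, which can be hard to diagnose on a shared host. A minimal sketch of a standalone check, assuming you can run a script from the command line; missing_modules is a hypothetical helper name, not part of any CPAN module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the names of the modules that cannot be loaded.
sub missing_modules {
    my @names = @_;
    my @missing;
    for my $name (@names) {
        # Turn Some::Module into Some/Module.pm, the form require() expects
        # when it is handed a string at run time.
        (my $file = "$name.pm") =~ s{::}{/}g;
        push @missing, $name unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(
    qw(CGI::Minimal WWW::Mechanize XML::RSS HTTP::Message)
);

if (@missing) {
    print "Missing: $_ (try: cpan $_)\n" for @missing;
}
else {
    print "All modules are installed.\n";
}
```

Run it once with the same shebang (and the same @INC fix, if you need it) as the main script, so that both look for modules in the same directories.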

STEP 3: Define some constants. We'll provide a user agent (here IE8) to make sure the system will not reject us by mistake.

    my $login_url = "https://example.com";
    my $login_agent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)";
    #
    my $login_form_name = "form_login";
    my $login_field_user = "login_id";
    my $login_field_pass = "passwd";

STEP 4: Use CGI::Minimal to read the login and password coming from POST or GET:

    # Gets us access to the HTTP request data
    my $cgi = CGI::Minimal->new;
    #
    # Get the name and value for each parameter:
    my $login_user = $cgi->param('user');
    my $login_pass = $cgi->param('pass');

STEP 5: WWW::Mechanize is our Swiss Army knife, allowing us to post forms, click on buttons, follow links, etc. With only six lines of code, WWW::Mechanize can read the login page, find the login form on it, enter the user name and password, submit the form, and return the next page.

    # The autocheck => 1 tells Mechanize to die if any IO fails, so you don't have to manually check.
    my $mech = WWW::Mechanize->new(autocheck => 1, agent => $login_agent);
    #
    # Fetch the login page
    $mech->get($login_url);
    #
    # Find and select the form by name, returning an HTML::Form object
    $mech->form_name($login_form_name);
    #
    # Fill specific fields on the form
    $mech->field($login_field_user, $login_user);
    $mech->field($login_field_pass, $login_pass);
    #
    # Click the submit button
    $mech->click();

STEP 6: We could then navigate the site by following links using WWW::Mechanize, but let's say the content we are interested in is on the page returned after login. We want to extract links like the following one:

Link to post 123

With the help of WWW::Mechanize we can extract all the links which have class "post":

    # $mech is the WWW::Mechanize object created in STEP 5
    my @links = $mech->find_all_links(
        tag   => 'a',
        class => 'post',
    );

STEP 7: Now build the RSS result using XML::RSS:

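    # Syndication feed
    my $rss = XML::RSS->new(version => '2.0');
    #
    # RSS 2.0 expects a channel title, link and description
    # (the title and description here are placeholder values)
    $rss->channel(title => 'Posts', link => $login_url, description => 'Extracted links');
    #
    # Create xml content
    foreach (@links) {
        $rss->add_item( title => $_->text, link => $_->url );
    }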

STEP 8: Next, return the result to the client using HTTP::Message:

    # Manage the HTTP response
    my $response = HTTP::Message->new;
    #
    # Create message with xml as text
    $response->header('Content-Type' => 'application/rss+xml');
    $response->content($rss->as_string);
    #
    # Send message to client
    print $response->as_string;
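What the web server expects from a CGI script is simply header lines, one empty line, and then the body. The sketch below builds roughly the same layout by hand, using only core Perl, to make that structure visible. The helper name cgi_response and the placeholder feed body are assumptions for illustration; in the real script the body comes from $rss->as_string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: assemble a CGI response by hand.
# Layout: header line(s), one empty line, then the body.
sub cgi_response {
    my ($content_type, $body) = @_;
    return "Content-Type: $content_type\n"
         . "\n"
         . $body;
}

# Placeholder body standing in for $rss->as_string.
my $body = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
         . qq{<rss version="2.0"></rss>\n};

print cgi_response('application/rss+xml', $body);
```

HTTP::Message is still the more convenient choice once you need several headers, since it takes care of the formatting for you.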

STEP 9: Finally, in Drupal, download and install FeedAPI, then enable FeedAPI, FeedAPI Node and SimplePie Parser (external library required). Then create a Feed node with the URL pointing to your script:

    http://localhost/feed.pl?user=foo&pass=bar
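The example URL works as-is because foo and bar contain only letters. A real user name or password containing characters such as & or = would be split into separate parameters unless it is percent-encoded first. A minimal sketch using only core Perl; uri_escape_value and build_feed_url are hypothetical helper names, not part of Drupal or of any CPAN module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode a value for use in a query string: RFC 3986 unreserved
# characters are kept, every other byte becomes %XX.
sub uri_escape_value {
    my ($value) = @_;
    utf8::encode($value);    # work on UTF-8 bytes, not characters
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: assemble the URL to paste into the Feed node.
sub build_feed_url {
    my ($base, %params) = @_;
    my $query = join '&',
        map { uri_escape_value($_) . '=' . uri_escape_value($params{$_}) }
        sort keys %params;    # sorted so the result is predictable
    return "$base?$query";
}

print build_feed_url(
    'http://localhost/feed.pl',
    user => 'foo',
    pass => 'b&r=1',
), "\n";
# prints http://localhost/feed.pl?pass=b%26r%3D1&user=foo
```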

That's it! This is a simple but solid foundation to build upon. For example, the same approach can be used to perform a search on a site, or to return the results as generic XML by replacing XML::RSS with XML::Generator.

Have fun!
