Scraping a website into Drupal using Perl

Perl has been at the root of web development since the beginning: even Amazon has relied heavily on Perl. Today, CPAN gives you access to over 18,000 mature modules covering just about anything. There is even an Acme:: namespace reserved for joke modules.

Perl has a lot of benefits for a Drupal developer. First, the syntax of PHP was greatly influenced by Perl, so most PHP programmers should feel comfortable in Perl. Second, it is easy to install extra Perl modules on any Linux distribution from the command line using CPAN, or on shared hosts using the administration interface. Finally, Perl can be faster than PHP, which makes it an excellent candidate for the heavy-lifting part of a website.

Let's build a small Perl script to:

  1. Log into a website
  2. Parse a page and search for specific content
  3. Format the content as an RSS feed
  4. Load the feed into Drupal

This solution is simple to build using only four CPAN modules. Here is how it goes: