Scraping a website into Drupal using Perl

Perl has been at the root of web development since the beginning: even large sites such as Amazon have relied on Perl. Today, Perl gives you access via CPAN to a set of over 18,000 mature modules on just about anything. There is even an Acme:: namespace reserved for joke modules.

Perl has a lot of benefits for a Drupal developer. First, the syntax of PHP has been greatly influenced by Perl, so most PHP programmers should feel comfortable in Perl. It is easy to install extra Perl modules on any Linux distribution from the command line using CPAN, or on shared hosts using the administration interface. And Perl is faster than PHP, which makes it an excellent candidate for the heavy-lifting part of a website.

Let's build a small Perl script to:

  1. Log into a website
  2. Parse a page and search for specific content
  3. Format the content as an RSS feed
  4. Load the feed into Drupal

This solution would be extremely simple to build using only four Perl CPAN modules. Here is how it goes:
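
As a rough illustration of the four steps above, here is a minimal sketch. The module choice (WWW::Mechanize, HTML::TreeBuilder, XML::RSS) is an assumption and may not match the four modules the article has in mind. The form field names `username` and `password`, the `headline` link class, and the sub names are all hypothetical placeholders for whatever the target site actually uses. The sketch also assumes the scraped links are already absolute URLs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use WWW::Mechanize;     # step 1: log in and fetch pages
use HTML::TreeBuilder;  # step 2: parse the HTML
use XML::RSS;           # step 3: build the feed

# Step 1: log in through the site's form, then fetch the page to scrape.
# The field names 'username' and 'password' are assumptions.
sub fetch_page {
    my ($login_url, $page_url, $user, $pass) = @_;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($login_url);
    $mech->submit_form(
        with_fields => { username => $user, password => $pass },
    );
    $mech->get($page_url);
    return $mech->content;
}

# Step 2: collect every <a class="headline"> link as a feed item.
# The 'headline' class is a hypothetical marker for the wanted content.
sub extract_items {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @items;
    for my $link ($tree->look_down(_tag => 'a', class => 'headline')) {
        push @items, {
            title => $link->as_trimmed_text,
            link  => $link->attr('href'),
        };
    }
    $tree->delete;    # free the parse tree
    return @items;
}

# Step 3: format the items as an RSS 2.0 document and return it as a string.
sub build_rss {
    my ($feed_title, $feed_link, @items) = @_;
    my $rss = XML::RSS->new(version => '2.0');
    $rss->channel(
        title       => $feed_title,
        link        => $feed_link,
        description => "Items scraped from $feed_link",
    );
    $rss->add_item(title => $_->{title}, link => $_->{link}) for @items;
    return $rss->as_string;
}

# Step 4: write the feed where Drupal can fetch it.
sub main {
    my ($login_url, $page_url, $user, $pass, $out_file) = @_;
    my $html  = fetch_page($login_url, $page_url, $user, $pass);
    my @items = extract_items($html);
    open my $fh, '>:encoding(UTF-8)', $out_file
        or die "Cannot write $out_file: $!";
    print {$fh} build_rss('Scraped feed', $page_url, @items);
    close $fh;
}

# Only touch the network when all five arguments are supplied.
main(@ARGV) if @ARGV == 5;
```

Run it with the login URL, the page URL, a user name, a password, and an output path as arguments. On the Drupal side, one option (again an assumption) is to point a feed importer such as the core Aggregator module or the Feeds module at the generated file.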

Drupal Theming RSS Feed Views

A lot of RSS feeds are very plain and boring, which sometimes results in readers unsubscribing or simply ignoring your feed. Web designers should spend some time making feeds more readable and more useful in order to bring in more traffic.

There are a few reasons to include images in RSS feeds. Images draw users in and give a visual representation of the content; think of the feed as a blog landing page. In addition, some sites use aggregators that pull your RSS feeds, which leads to more traffic.

For the longest time I had trouble theming and adding images to an RSS feed View. I always got the default feed with just the title and some teaser text. It was too plain: no dates and no images, even when I added fields to the view. Googling for sample code and solutions was not much help, so here is my proposed solution for all you readers!