
Backend scripting for Drupal integration/backend automation

By bmajerski
Jul. 12, 2013

Drupal makes it quick to build user interfaces, but what happens when you need to import large amounts of data into it? Consider importing a few thousand items. These could be content or something as simple as Taxonomy terms. Let's take a quick look at importing some data into Drupal Taxonomy from a 3rd party system. The data has to be retrieved via an API and synchronized periodically to Drupal. Thousands of terms cannot be entered or reconciled by hand, so this is a perfect case for a backend solution. Typically such functionality would be placed in a Drupal hook_cron() implementation, but developing such a system directly in a hook is frankly cumbersome and hard to test properly. We still want to reuse existing Drupal code here, but the primary goal is to make developing and testing easier (that is, outside the Apache process) without compromising the ability to integrate the result into hook_cron() later.

So to summarize:

  • thousands of terms needing an initial import to Drupal
  • data extraction from the 3rd party system is done via an API
  • periodic synchronization is needed
  • existing Drupal modules must be reused but without having to use the UI
  • test suite and command line debugging tools are needed
  • easy integration into hook_cron()
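
The synchronization requirement boils down to comparing what the 3rd party system reports with what Drupal already holds. Here is a minimal sketch of that comparison in plain PHP, with no Drupal code involved. The function name plan_term_sync() and the matching rule (trimmed, case-insensitive names, ASCII only) are assumptions made for illustration; a real importer would more likely match on an external ID.

```php
<?php
/**
 * Works out which term names must be created and which are stale.
 *
 * Hypothetical helper: terms are matched by trimmed, lower-cased name only.
 *
 * @param string[] $remote_names   Names reported by the 3rd party API.
 * @param string[] $existing_names Names already stored in the vocabulary.
 * @return array Keys 'create' and 'stale', each holding a list of names.
 */
function plan_term_sync(array $remote_names, array $existing_names) {
  $normalize = function ($name) {
    return strtolower(trim($name));
  };

  // Index both sides by normalized name, keeping the trimmed spelling.
  $remote = array();
  foreach ($remote_names as $name) {
    $remote[$normalize($name)] = trim($name);
  }
  $existing = array();
  foreach ($existing_names as $name) {
    $existing[$normalize($name)] = trim($name);
  }

  return array(
    // Present remotely, missing in Drupal.
    'create' => array_values(array_diff_key($remote, $existing)),
    // Present in Drupal, no longer reported remotely.
    'stale' => array_values(array_diff_key($existing, $remote)),
  );
}
```

Called with the remote names Apples and Pears and the existing names apples and Plums, it returns Pears under 'create' and Plums under 'stale'. Keeping this step free of Drupal calls means it can be tested without a database.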


    One problem you might run into with such a task is that Drupal is quite large and its modules have many dependencies. Care must be taken to load everything you need, otherwise things may mysteriously fail without any hint of errors or exceptions. You only need to include the required Drupal modules explicitly if you're running your class from the command line. Once your system is integrated directly into Drupal, all the required Drupal code will already be loaded, and the short block of include code can easily be removed if desired.

    Here is example code that loads the Drupal modules needed for Taxonomy reads and writes, which lets you write some handy command line tools.

    class MyExampleClass {
      /* ... all your methods go here ... */
      public function run_something() {
        /* ... your code that uses the 3rd party API and Drupal internals such as
         * taxonomy_get_tree(), taxonomy_term_save() etc ... */
      }
    } // end class

    /* make sure we are running via command line before including any Drupal bits */
    if (php_sapi_name() == 'cli') {
      /* assumption: the script is started from the Drupal root directory */
      if (!defined('DRUPAL_ROOT')) {
        define('DRUPAL_ROOT', getcwd());
      }
      /* gets rid of Notices where Drupal depends on _SERVER */
      $_SERVER['HTTP_HOST'] = "localhost";
      $_SERVER['REMOTE_ADDR'] = "";
      require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
      /* assumption: Taxonomy reads and writes need the database layer */
      drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);
      require_once DRUPAL_ROOT . '/includes/common.inc';
      require_once DRUPAL_ROOT . '/includes/file.inc';
      require_once DRUPAL_ROOT . '/includes/module.inc';
      drupal_load('module', 'system');
      drupal_load('module', 'taxonomy');
      /* 'inc' and 'field.crud' are not module names, so drupal_load() cannot
       * load them; field.module pulls in its own .inc files */
      drupal_load('module', 'field');
      drupal_load('module', 'field_sql_storage');
      drupal_load('module', 'options');
      /* if running THIS script, instantiate your class and run your code */
      if (isset($argv[0]) && preg_match('/' . preg_quote(basename(__FILE__), '/') . '/', $argv[0])) {
        try {
          $me = new MyExampleClass();
          $me->run_something();
        } catch (Exception $e) {
          print $e . "\n";
        }
      } // end if this script was run
    } // end if using command line

    This approach gives us the most flexibility in terms of testing, automation and integration options. Drupal developers can write a set of unit tests for this class or classes. The class can easily be used stand-alone or included as part of another script. It can be run via standard Unix cron or, if preferred, it can be instantiated directly inside the hook_cron() implementation of your Drupal module.
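
To make the testing and hook_cron() options concrete, here is a minimal sketch of an importer whose API and Drupal access are passed in as callables, so the same class can run in a unit test, from the command line, or inside cron. The class name TermImporter, the module name mymodule, the vocabulary ID and the mymodule_fetch_remote_names() function are all hypothetical. Only taxonomy_get_tree() and taxonomy_term_save() are real Drupal 7 functions, and they are called only from the cron hook.

```php
<?php
/**
 * Hypothetical importer: creates terms that exist remotely but not locally.
 */
class TermImporter {
  private $fetchRemote;
  private $loadExisting;
  private $saveTerm;

  /**
   * @param callable $fetch_remote  Returns an array of remote term names.
   * @param callable $load_existing Returns an array of existing term names.
   * @param callable $save_term     Receives one term name to create.
   */
  public function __construct($fetch_remote, $load_existing, $save_term) {
    $this->fetchRemote = $fetch_remote;
    $this->loadExisting = $load_existing;
    $this->saveTerm = $save_term;
  }

  /**
   * Creates every missing term and returns how many were created.
   */
  public function run() {
    $existing = call_user_func($this->loadExisting);
    $remote = array_unique(call_user_func($this->fetchRemote));
    $missing = array_diff($remote, $existing);
    foreach ($missing as $name) {
      call_user_func($this->saveTerm, $name);
    }
    return count($missing);
  }
}

/**
 * Implements hook_cron() for a hypothetical module named "mymodule".
 */
function mymodule_cron() {
  $vid = 1; // Assumed vocabulary ID; look yours up instead of hard-coding it.
  $importer = new TermImporter(
    'mymodule_fetch_remote_names', // Hypothetical wrapper around the 3rd party API.
    function () use ($vid) {
      $names = array();
      foreach (taxonomy_get_tree($vid) as $term) {
        $names[] = $term->name;
      }
      return $names;
    },
    function ($name) use ($vid) {
      $term = new stdClass();
      $term->vid = $vid;
      $term->name = $name;
      taxonomy_term_save($term);
    }
  );
  $importer->run();
}
```

In a unit test the three callables can be replaced with closures that return fixed arrays and record what was saved, so nothing in the test needs a Drupal bootstrap or a database.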

    Hope you enjoyed this post. If I receive a lot of likes and retweets I might get ice-cream :)

