Problems faced when Drupal manages files
Let's say you have a site with 5 million authenticated users and an average of 100 concurrent users. You're going to run into issues when Drupal starts trying to serve all that private content. When content is stored in the private file system, Drupal invokes hook_file_download() to check whether each file can be downloaded. That's one more PHP call, with its own resource cost, for every file served. If you had a view listing 12 private resources, you would make an extra call for each of those resources. With 100 concurrent authenticated users, you can't really afford to consume resources like this. Drupal itself also becomes a bottleneck, since every file request has to be processed and served by Drupal. Then you have the problem of actual hard drive space. In this use case, capacity planning is a never-ending task, requiring someone to watch the growth and keep stacking hard drives on it to handle the amount of storage.
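To illustrate where that per-file cost comes from, here is a minimal sketch of a hook_file_download() implementation in Drupal 7; the module name and permission are hypothetical, but the shape of the hook is what every private-file request triggers:

```php
<?php

/**
 * Implements hook_file_download().
 *
 * Drupal runs this (after a full bootstrap) for every request to a
 * private:// file, so a view listing 12 private files costs 12 of
 * these round trips.
 */
function mymodule_file_download($uri) {
  // Hypothetical permission check; real modules often query access
  // rules per file, which adds database work on top of the bootstrap.
  if (!user_access('view private media')) {
    return -1; // Deny access to the file.
  }
  // Returning headers tells Drupal to transfer the file itself.
  return array('Content-Type' => file_get_mimetype($uri));
}
```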
The most common solution in cases like this is to throw a CDN at the site and have the CDN handle all of this. However, when you want your content to remain private, this solution does not work, since a CDN does not restrict access to content. Enter Google Cloud Storage.
What is Google Cloud Storage?
Wikipedia answers this question like so: “Google Cloud Storage is a RESTful online file storage web service for storing and accessing your data on Google's infrastructure. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities. It is an Infrastructure as a Service (IaaS), comparable to Amazon S3 online storage service.”
What does this mean for us? Google's infrastructure provides high capacity and availability: a single file can be up to 5TB in size, and the service carries an SLA of 99.9% uptime or better. It's also a worldwide network with edge caching across the globe, giving distributed uploads, deletes and retrievals. Limitless capacity is a huge benefit of this infrastructure, so there's no need to check how much space you have left. Data consistency is strong, meaning that the moment a user deletes an object, no other user can access it. The storage API uses OAuth 2.0 for authentication and authorization, keeping your data secure. Last but not least is the ability to generate signed URLs to access the content, meaning our data stays private and is only accessible to the users we choose. There are many other benefits to using Google Cloud Storage; I've only listed the ones relevant to this blog. Feel free to check out more here.
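To give a sense of how signed URLs work, here is a rough PHP sketch of the legacy V2 signing scheme GCS supports: build a string to sign, sign it with the service account's private key, and append the result to the object URL. The file names, object path and account email below are placeholders, not values from this setup:

```php
<?php
// Sketch: build a GCS V2 signed URL (placeholder paths and account).
$accessId = 'service-account@developer.gserviceaccount.com'; // Service account email.
$expires  = time() + 300;                        // URL valid for 5 minutes.
$object   = '/oscaddie_media/local/photo.jpg';   // /bucket/object path.

// V2 string to sign: VERB \n Content-MD5 \n Content-Type \n Expires \n resource.
$stringToSign = "GET\n\n\n{$expires}\n{$object}";

// Sign with the service account's private key (the .pem made from the p12).
$pkey = openssl_pkey_get_private(file_get_contents('privatekey.pem'));
openssl_sign($stringToSign, $signature, $pkey, 'sha256');

// Anyone holding this URL can read the object until $expires passes.
$url = 'https://storage.googleapis.com' . $object
     . '?GoogleAccessId=' . urlencode($accessId)
     . '&Expires=' . $expires
     . '&Signature=' . urlencode(base64_encode($signature));
```

Once the timestamp in Expires passes, GCS rejects the URL, which is exactly the behaviour the module's Expiry setting controls.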
Next up: the osCaddie Google Cloud Storage module
So how do we connect Google Cloud Storage to our Drupal site? Luckily, we've created a module called osCaddie Google Cloud Storage. With this module, any images, videos or documents uploaded into Drupal are still managed by Drupal (the file_managed table), but instead of being served from local storage, the actual files are uploaded to and delivered by Google Cloud Storage. Sweet, right? I thought so too. So, let's get it set up.
To use this module, you’ll need a few things first, so let's ensure you have each of the following:
- A Google Cloud Storage account: This is where your content will be hosted;
- OpenSSL PHP module: Required to create a .pem file from the p12 key provided by Google Cloud Storage;
- Google PHP API Library: The PHP API library that allows the osCaddie module to interact with Google Cloud Storage;
- Libraries API: Drupal module to support third-party libraries, in this case the Google PHP API Library;
- X Autoload: Provides autoloading of third-party libraries;
- Drupal private file system configured: Used to store the .pem file and p12 key.
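As a sketch of what the OpenSSL requirement is for: the p12 key Google issues can be converted to a .pem using PHP's OpenSSL bindings. Google's p12 files use the well-known password "notasecret"; the file names here are placeholders:

```php
<?php
// Sketch: extract a PEM private key from the Google-issued p12 file.
$p12   = file_get_contents('privatekey.p12');
$certs = array();

// Google service-account p12 keys all ship with the password "notasecret".
if (openssl_pkcs12_read($p12, $certs, 'notasecret')) {
  // $certs['pkey'] holds the private key in PEM format.
  file_put_contents('privatekey.pem', $certs['pkey']);
}
```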
osCaddie GCS Configuration
The first step is to download and install all of the required modules and libraries. I won't go through enabling modules here, but I will say that the Google PHP API Library must be renamed to google-php-api-client, so your library structure should look like /sites/all/libraries/google-php-api-client/composer.json.
Next, we want to get the p12 key from Google Cloud Storage along with the required account and client names. I'll assume that you've created an account and project, and are ready to start configuring. Log in to your Google Cloud Storage account and navigate to your project. On the left side, open up APIs & auth, then head over to Credentials. Click Create New Client ID, and in the Create New Client ID window, select Service Account and click Create. A new public/private key pair is generated and the JSON format of the key is automatically downloaded. This is not the format we want, however, so click Generate New P12 Key; this is the key we require. Also, take note of the Client ID and Email Address hashes; we will need these values later.
Now let's set up a bucket so that we can store our content. Go to the Storage link on the left-hand side, open it up, then go to Cloud Storage, then Storage Browser. Click the Add Bucket button at the top and create a bucket with the required name parameters. In this example, we'll create one called "oscaddie_media". Now, click into that bucket and let's create some subfolders. The reason we want subfolders is to handle the different environments used in the development of this Drupal site, such as dev, staging and production. So, click the New Folder button and create the following: dev, staging, production and local (for local development sites). If you have more environments, feel free to create more folders. Below is a screenshot of the setup.
Load up the Drupal site and log in as an administrator. Go to "config/media/oscaddie_gcs" and the module's configuration page will be displayed. This is where we configure how the module connects to the Google Cloud Storage account. Go ahead and upload the P12 key that we downloaded earlier, and enter the Client ID and Email Address values. Next, enter the bucket name (oscaddie_media in this example) and the folder for this instance of the site; I'll assume we're working off a local environment, so enter local. Two options below this are Expiry and Library Version. The expiry is the amount of time a signed URL will live: for example, if you enter 300, the URL will only live for 5 minutes, and using that same URL after it expires will not work. The Library Version tells the module which version of the Google PHP API library you're using. By default, it's set up to use 1.05 Beta, but if for some reason you want the older version, the module also supports 0.6.7 (that version is no longer supported by Google, however). Click Save Configuration and the final page should look like this:
Click the Test Connection button to verify that you've set everything up correctly. If so, a success message will be displayed at the top of the page. Now that the module is set up, let's tell Drupal to use it.
Go to "admin/config/media/file-system". Here you can set Drupal's default file system to use osCaddie Google Cloud Storage for all file content on the site: anything uploaded by users, generated image styles, and the CSS/JS files Drupal creates through aggregation. From our experience so far, we don't recommend doing this, especially in a case like ours where CSS/JS files are created for each logged-in user and not re-used, since all users are authenticated. A signed URL would also be needed for every file accessed, which is not very performant for general assets such as CSS/JS.
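For context, in Drupal 7 the site-wide default chosen on that page is stored in the file_default_scheme variable; the snippet below (runnable inside a bootstrapped Drupal, e.g. via drush php-eval) shows how to inspect it. The "public" fallback is Drupal's own default:

```php
<?php
// Drupal 7: read the default file scheme set at
// admin/config/media/file-system ("public" if never changed).
$scheme = variable_get('file_default_scheme', 'public');

// New uploads and aggregated CSS/JS land on this scheme, which is
// why switching it site-wide affects every generated file.
drupal_set_message(t('Default scheme: @s', array('@s' => $scheme)));
```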
So, it's best to configure each field that should use the module from the content type's "Manage Fields" page. Whether creating a new field or editing an existing one, you want to set the field's "Field Settings" to use this module. That's as simple as selecting the radio button under Upload Destination (image below):
Now your field is configured to use Google Cloud Storage as its file destination! You can even set image styles or file locations so that image styles are hosted on GCS and files are stored in a clean path structure instead of all in the main bucket.
Currently, the module is maintained and developed by us here at Appnovation. The roadmap includes adding the ability to serve some content without signed URLs, meaning some content could remain private while other content is public. This would allow for a more versatile site architecture.
That's it. This module can elevate your Drupal site, making it an extensible, performant site that handles more users and serves their content faster, more reliably and more securely. If you have any questions or comments, feel free to add them below. Go ahead and give it a try, and if you're having issues with the module, please open a ticket on the Drupal.org project page.