Park your old Drupal site

Client: Drupal Romania Association

This writeup is heavily based on Karen Stevenson’s (KarenS) blog post “Sending a Drupal Site Into Retirement”. I wrote it to keep a reference on Webikon.com, since we had to archive several sites and make them static. There are also some small differences from her approach.

Why park a site?

Keeping an old site running as a CMS when nobody interacts with it anymore is expensive. A good example is an event website: the event ended years ago and the site no longer receives any feedback. Such a site is a good candidate for parking as static HTML.

I’m going to give a short example from my experience.

HTTrack

I used HTTrack as it was the most flexible option due to its ability to rewrite the destination file paths. I installed HTTrack on an Ubuntu machine:
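
On Ubuntu, HTTrack is packaged as httrack. A minimal sketch of the installation step, assuming the stock repositories and sudo rights (the helper name check_httrack is only for illustration), could be:

```shell
# Report whether HTTrack is on the PATH and, if not, print the command that
# would install it from the Ubuntu repositories (assumed package name: httrack).
check_httrack() {
    if command -v httrack >/dev/null 2>&1; then
        echo "httrack is installed: $(command -v httrack)"
    else
        echo "httrack is missing, install it with: sudo apt-get install httrack"
    fi
}

check_httrack
```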

Prepare

We need to disable everything that could invite visitors to interact with the site. This is site-specific, so you’ll need to review the entire site, but here is a minimal checklist:

  • Disable Search module.
  • Disable the Contact form.
  • Disable Webform forms.
  • Remove login block and remove links that are pointing to login, register or password recovery pages.
  • Remove exposed filters from Views.
  • Disable Ajax for Views pagers.
  • Remove the status messages area. Search your theme for this (or a similar) line and remove it: <?php print $messages; ?>.
  • Add a banner at the top of the page saying that the site is archived. You can edit page.tpl.php directly and add a styled <div> immediately after the <body> tag.

The Static Copy

Now it’s time to create the static version. We’ll use HTTrack for this. Below is the command used to park the Drupalcamp Arad 2012 event website.
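
A minimal sketch of such a command, assembled only from the four parts described in the list that follows, is shown here. Any further flags in the command actually used are not known, and the variable and function names are illustrative:

```shell
# Values taken from the parts described in the text.
SOURCE_URL="http://arad2012.drupalcamp.ro"
SCOPE_FILTER="arad2012.drupalcamp.ro/*"      # scope, written as quoted in the text
NAME_PATTERN="%p/%n/index%[page].%t"         # naming pattern for the saved files

# Start the crawl. Run it from the directory that should receive the
# static copy (~/static in the text), because -O . saves into the current one.
park_site() {
    httrack "$SOURCE_URL" -O . -N "$NAME_PATTERN" "$SCOPE_FILTER"
}

# Print the assembled command for review without contacting the site.
show_command() {
    printf 'httrack %s -O . -N "%s" "%s"\n' "$SOURCE_URL" "$NAME_PATTERN" "$SCOPE_FILTER"
}

show_command
```

Calling park_site instead of show_command starts the actual download.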

The anatomy of this command line:

  • http://arad2012.drupalcamp.ro: The source site to be archived.
  • -O .: Tells HTTrack where to save the output files. In my case this is the current directory, ~/static.
  • "arad2012.drupalcamp.ro/*": The scope. The crawl does not go beyond the files in the arad2012.drupalcamp.ro domain.
  • -N "%p/%n/index%[page].%t": This is the magic that Karen Stevenson contributed in her blog post. It is very important because we don’t want the new URLs to look like /about.html; we want to keep the Drupal /about pattern instead. This option describes the naming pattern of the newly created static files. Basically, for each page a directory is created, and the page content is placed in an index.html file inside it. With the Apache DirectoryIndex set to index.html, the paths resolve correctly. The downside is that static files are converted by this rule too. For example, /files/image.png becomes /files/image/index.png. The [page] part tells the parser to convert a link like /articles?page=2 into /articles/index2.html. This is needed especially for Views pagers:

    • %p: the original path.
    • %n: the original file name without extension.
    • [page]: the page parameter from the query string.
    • %t: the file extension.
  • For the other parameters consult the HTTrack official guide page: http://www.httrack.com/html/fcguide.html
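
To make the naming pattern concrete, here is a small illustration that mimics the mapping described above. It is not HTTrack’s own code: static_path is a hypothetical helper, and it assumes that pages without an extension are saved as html.

```shell
# Map a site path to the file name the pattern %p/%n/index%[page].%t
# would produce, following the description in the text.
static_path() {
    path=${1%%\?*}                      # path without the query string
    case $1 in
        *\?*) query=${1#*\?} ;;         # everything after the first "?"
        *)    query="" ;;
    esac
    # %[page]: value of the "page" query parameter, empty when absent
    page=$(printf '%s\n' "$query" | tr '&' '\n' | sed -n 's/^page=//p' | head -n 1)
    dir=$(dirname "$path")              # %p: original path
    file=$(basename "$path")
    case $file in
        *.*) name=${file%.*}; ext=${file##*.} ;;   # %n and %t
        *)   name=$file; ext=html ;;               # assumption: pages become html
    esac
    printf '%s/%s/index%s.%s\n' "${dir%/}" "$name" "$page" "$ext"
}

static_path /about
static_path '/articles?page=2'
static_path /files/image.png
```

The three calls print /about/index.html, /articles/index2.html and /files/image/index.png, matching the examples given for the pattern.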

Fixes

We’re still not ready. The resulting static HTML site still needs some love. Let’s see:

If your Apache server is not already configured to serve index.html as the directory index, add an .htaccess file with the following content:
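
A minimal sketch, assuming the only directive needed is the DirectoryIndex setting mentioned earlier and that the server allows .htaccess overrides (write_htaccess is an illustrative helper name):

```shell
# Write an .htaccess file into the given web root so that Apache serves
# index.html when a directory such as /about/ is requested.
write_htaccess() {
    cat > "$1/.htaccess" <<'EOF'
DirectoryIndex index.html
EOF
}

# Example: write_htaccess "$HOME/static"
```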

Create your own root index.html. HTTrack creates its own index.html in the web root, but we need the original site’s home page there instead, so we copy the page that was already generated:
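
A sketch of that copy, assuming the home page was saved as index/index.html (inferred from the later fix about the index/ subdirectory; promote_home is an illustrative name):

```shell
# Replace HTTrack's generated landing page with the archived home page.
# Assumption: the home page was saved as index/index.html under the web root.
promote_home() {
    root=$1
    cp "$root/index/index.html" "$root/index.html"   # overwrites HTTrack's index.html
}

# Example: promote_home "$HOME/static"
```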

Edit this file and fix all its links. The file now sits one level higher, which is why we remove the ../ prefix from its URLs:
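
One way to do this is with a sed filter. The sketch below assumes the links live in href and src attributes with double quotes, and strip_parent_prefix is an illustrative name:

```shell
# Read HTML on stdin and drop one leading "../" from every href and src value.
strip_parent_prefix() {
    sed -E 's#(href|src)="\.\./#\1="#g'
}

# Example on a single line of markup:
printf '%s\n' '<a href="../about/index.html">About</a>' | strip_parent_prefix
```

To rewrite the file itself, send the output to a temporary file and move it back, for example strip_parent_prefix < index.html > index.html.new && mv index.html.new index.html.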

Make the links point to about-us/the-company instead of about-us/the-company/index.html:
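
A possible sketch for this step, assuming double-quoted href attributes and that every .html file under the web root should be rewritten (both helper names are illustrative):

```shell
# Read HTML on stdin and cut a trailing "/index.html" from href values.
strip_index_suffix() {
    sed -E 's#(href="[^"]*)/index\.html"#\1"#g'
}

# Apply the filter to every .html file below the given directory,
# writing through a temporary file so no in-place sed option is needed.
apply_to_tree() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        strip_index_suffix < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example: apply_to_tree "$HOME/static"
```

Links that are exactly index.html, with no directory in front, are left alone by this filter.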

Remove links that point to the page itself, like <a href="index.html">:
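
What exactly should replace such a link is a judgment call. The sketch below takes one possible reading and repoints it to the current directory; both the replacement value and the name fix_self_links are assumptions.

```shell
# Read HTML on stdin and repoint href="index.html" to the current directory,
# so the link no longer exposes the index.html file name.
fix_self_links() {
    sed -E 's#href="index\.html"#href="./"#g'
}

# Example on a single line of markup:
printf '%s\n' '<a href="index.html">This page</a>' | fix_self_links
```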

Fix the links to the home page so that they point to the top directory instead of the index/ subdirectory:
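
A sketch for this fix, assuming the archived site is served from the root of its domain so that / is the home page (fix_home_links is an illustrative name):

```shell
# Read HTML on stdin and rewrite links to the index/ subdirectory, with or
# without leading "../" parts and a trailing index.html, so they point to "/".
fix_home_links() {
    sed -E 's#href="(\.\./)*index(/(index\.html)?)?"#href="/"#g'
}

# Example on a single line of markup:
printf '%s\n' '<a href="../index/index.html">Home</a>' | fix_home_links
```

If the static copy lives in a subdirectory of the domain, the replacement value needs to be that subdirectory’s path instead of /.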

That’s it. Enjoy your new static website. Just in case, it’s best to also keep a backup of the Drupal site.
