Migrating to Drupal 8

Together with my Webikon.com partner, Gabriel, facing the North.

Alkuvoima+East Group, a Finnish strategic digital marketing agency, invited us to join one of their interesting events in Helsinki. We were asked to present some hot topics like: ‟Drupal 8 — What’s Cooking?”, ‟Migrate to Drupal 8” and ‟Drupal 8 as a mobile backend” along with another cutting-edge presentation: ‟Drupal and Apache Stanbol”. The big interest aroused by the ‟Migrate to Drupal 8” topic inspired me to elaborate on this subject. Here we go.

Table of Contents

  1. Table of Contents
  2. Disclaimer
  3. A Bit of History
    1. Update vs. Upgrade
    2. Migrate (contrib module)
    3. Migrate in Core
  4. Basics
    1. The (Good) Migration System
    2. The Flow
    3. Components
  5. The Anatomy of a Migration
    1. The Module
    2. Migrations
    3. Roles
    4. Users
    5. Content
  6. Running Migrations
  7. Credits
  8. Author
  9. Change Log

Disclaimer

At the time of writing, the Migrate system is under heavy development. Some of the features or APIs may change in the near future. The latest unstable version of the system is hosted in a sandbox project. Not all of the current work has been pushed to Drupal 8 core yet.

As new features and fixes are added, I will update the article. Check the Change Log to see what's new.

A Bit of History

Update vs. Upgrade

Update is the process of moving Drupal from one minor version to another, for example from Drupal 7.23 to 7.24. Upgrade is the process of moving Drupal from one major version to another (Drupal 7.x to 8.x). Until the 8.x release, Drupal used the upgrade path to perform major version upgrades. This is documented at drupal.org/upgrade. The process proved to be very difficult; on complex sites, the upgrade was almost impossible.

Migrate (contrib module)

The Migrate module has been in service since 2009. It was not necessarily built for Drupal upgrades, but thanks to its flexibility it can be used for this purpose too. Basically any kind of data can be migrated into Drupal 6 or 7 using this great module.

Migrate in Core

At Drupalcon Prague 2013 Dries announced the switch to a migrate-based upgrade by moving the Migrate module into Drupal core. Károly Négyesi (chx) was named core maintainer for the Migrate module. The development started in a sandbox project. A big portion of the code and the interaction between the elements is brand new. However, there’s still a significant part of it coming straight from D7: highwater marks, track changes, id map.

Basics

Before jumping into details, let’s see what a migration system really means. In this section I will try to familiarize you with some migration concepts just to make sure we’re on the same page.

The (Good) Migration System

What makes a migration system a ‟good” migration system? In my opinion a good migration system should meet, at least, the following requirements:

  1. Import: This is obvious. The system should have an import mechanism, able to import data from source to destination.
  2. Rollback: The migration system must be able to roll back a previous migration and leave the environment clean, in the same state as it was before importing.
  3. Incremental: The system should be able to run a new migration on top of a previous migration. For example, the source database might receive new content after a first migration. When run again, the migration should import only the new data. This usually occurs when migrating a live legacy site. After designing the migration and importing most of the content, the source site may still receive updates (new articles, new users, new comments). A new migration should incrementally import only the fresh content.
  4. Map: A good system records the migration of each record. Otherwise it’s almost impossible to provide rollback and incremental capabilities. The most important data recorded for each migrated item is the pair of primary IDs. The map record basically establishes a relation between the source and the destination primary IDs. Note that a primary ID can be defined by more than one field.
  5. Chicken 'n Egg handling: Suppose that a record refers to another record that hasn’t been imported yet. Oops! Because the referenced record was not migrated yet, we cannot predict its destination primary ID. Not having that ID sucks because we cannot save the current record. The system must know how to handle such situations and, yes, this can be handled by a good migration system.
  6. Robust: The system must be designed in a way that allows huge quantities of data to be imported or rolled back. In Drupal migrations, Drush is a great tool for this because of its ability to manage memory and to split large tasks into many batches to prevent resource exhaustion.
  7. Flexible: It should allow implementation of plugins to handle any kind of sources (Drupal, MySQL, SQLite, MSSQL, XML, feeds, plain files like .txt, .csv, .xml, etc.) or destinations.
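
One common way to deal with the chicken 'n egg case from point 5 is to create a placeholder ("stub") for the missing record, register it in the map, and fill it in once the real record is imported. The sketch below illustrates that idea in plain PHP. The function names and the array layout are made up for illustration; this is not the Drupal 8 Migrate API.

```php
<?php
// Hypothetical in-memory stand-ins for the destination and the map table.
$destination = [];  // destination ID => saved record
$map = [];          // source ID => destination ID
$nextId = 1;

// Returns the destination ID of a referenced record, creating a stub if needed.
function resolve_reference(int $sourceId, array &$destination, array &$map, int &$nextId): int {
  if (!isset($map[$sourceId])) {
    $destination[$nextId] = ['stub' => true];
    $map[$sourceId] = $nextId++;
  }
  return $map[$sourceId];
}

// Saves a record, reusing the stub's destination ID when one already exists.
function import_record(int $sourceId, array $record, array &$destination, array &$map, int &$nextId): void {
  $id = $map[$sourceId] ?? $nextId++;
  $destination[$id] = $record + ['stub' => false];
  $map[$sourceId] = $id;
}

// Source record 1 refers to source record 2, which is imported later.
$parentId = resolve_reference(2, $destination, $map, $nextId);
import_record(1, ['title' => 'Reply', 'parent' => $parentId], $destination, $map, $nextId);
import_record(2, ['title' => 'Original', 'parent' => null], $destination, $map, $nextId);
```

After the last call the stub has been overwritten by the real record, and the reference saved earlier still points at the right destination ID.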

Other ‟nice to have” features for a stunning migration system:

  1. Reimport changed records: In Drupal this is also referred to as the highwater mark. With an incremental migration system (see 3), each time a migration runs, any source items not yet imported are imported. Source data may contain a last-changed timestamp field. On each run of the migration, the highest value of that field is recorded. On subsequent runs, the last-changed value of each already-imported record is compared with the value stored by the previous run. If it is greater, that means ‟this record has changed”, so the system reimports the record and saves the timestamp as the new highwater mark.
  2. System of Record: A system of record allows a migration to update only specific fields in objects already migrated by other migrations.
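
The highwater logic described in point 1 can be sketched in a few lines of plain PHP. This is only an illustration under simple assumptions: rows are arrays with hypothetical id and changed keys, and the function name is invented for this example.

```php
<?php
// Returns the IDs to (re)import and the new highwater mark.
function select_rows_to_import(array $sourceRows, array $importedIds, int $highwater): array {
  $toImport = [];
  $newHighwater = $highwater;
  foreach ($sourceRows as $row) {
    $isNew = !in_array($row['id'], $importedIds, true);
    $hasChanged = $row['changed'] > $highwater;
    if ($isNew || $hasChanged) {
      $toImport[] = $row['id'];
    }
    // Remember the highest "changed" value seen in this run.
    $newHighwater = max($newHighwater, $row['changed']);
  }
  return [$toImport, $newHighwater];
}

$rows = [
  ['id' => 1, 'changed' => 100],  // imported before, unchanged
  ['id' => 2, 'changed' => 250],  // imported before, changed since
  ['id' => 3, 'changed' => 300],  // never imported
];
[$ids, $mark] = select_rows_to_import($rows, [1, 2], 200);
```

With a stored mark of 200, only rows 2 and 3 are selected, and 300 becomes the mark for the next run.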

The Flow

Import flow:

  1. Extract: The data is extracted from the source.
  2. Process: Data is transformed (if needed) to meet the destination's structure/format.
  3. Save: The data is stored/saved in the destination.
  4. Register: The pair of source and destination primary IDs is registered/saved in the map table.

Rollback flow:

  1. Remove: The record is deleted from the destination.
  2. Deregister: The corresponding mapping record is removed.
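
To make the two flows concrete, here is a minimal sketch in plain PHP that uses arrays instead of real source and destination plugins. All names are hypothetical. The destination already contains one record that was not migrated, to show that rollback leaves it alone.

```php
<?php
// Hypothetical in-memory source, destination and map table.
$source = [
  10 => ['first_name' => 'John', 'last_name' => 'Doe'],
  11 => ['first_name' => 'Jane', 'last_name' => 'Roe'],
];
$destination = [1 => ['name' => 'admin']];  // existed before any migration
$map = [];                                  // source ID => destination ID
$nextId = 2;

function import_all(array $source, array &$destination, array &$map, int &$nextId): void {
  foreach ($source as $sourceId => $row) {                  // 1. Extract
    if (isset($map[$sourceId])) {
      continue;                                             // already migrated
    }
    $name = strtolower($row['first_name'] . '.' . $row['last_name']);  // 2. Process
    $destination[$nextId] = ['name' => $name];              // 3. Save
    $map[$sourceId] = $nextId++;                            // 4. Register
  }
}

function rollback_all(array &$destination, array &$map): void {
  foreach ($map as $sourceId => $destinationId) {
    unset($destination[$destinationId]);                    // 1. Remove
    unset($map[$sourceId]);                                 // 2. Deregister
  }
}

import_all($source, $destination, $map, $nextId);
$namesAfterImport = array_column($destination, 'name');
rollback_all($destination, $map);
$namesAfterRollback = array_column($destination, 'name');
```

Because rollback walks the map rather than the destination, only records that the migration created are deleted.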

Components

Let’s consider a simple example: A table, people, is the source for populating the Drupal users table. I will explain a little bit of each component of a migration.

Source & Destination

The source component is a plugin called the source plugin. This is where the data comes from. The source of data can be a database, plain files, another CMS or even a legacy Drupal website. You can imagine any source you like, as the system is flexible enough to handle it. The only constraint is that the source plugin must be able to provide the following functionality:

  • Access the source data (SQL, CSV, XML, JSON, etc.)
  • Describe the source primary ID key.
  • Describe the source fields that will be migrated.
  • Iterate over rows of source data to provide a current row.

The destination component is called destination plugin. As the destination is always a Drupal site, a destination is a plugin able to save data to an entity like nodes, users, taxonomy terms, files or even to Drupal configuration like roles, field types, or system variables. The destination plugin will typically provide the following functionality:

  • Describe the fields of destination, where the data is saved.
  • Describe the destination primary ID key.
  • Know how to save data based on the nature of the destination (node, taxonomy term, user, etc.).
  • Save a new destination row for each source row.

In our example, the source data is stored in the same MySQL database as the destination Drupal site. The first source table is migrate_example_people and the destination is a Drupal 8 user entity. We’ll concatenate first_name and last_name into the user name. John Doe will become john.doe.

Mapping & Process

Each field mapping component is in fact a list of process plugins (processors). The simplest and most basic process plugin (get) just takes a source field and maps it to a destination field. In our case email and pass are each simply mapped to their destination fields, mail and pass. First and last name are both passed to a first process plugin (p0), which concatenates them using a dot as delimiter. The result is then passed to a second process plugin (p1), which converts the name to lowercase and sends it to the destination plugin to be saved as the user name. The groups string field is passed to a process plugin that explodes it into an array, splitting on the semicolon.

Basically the mapping is a chain of process plugins, each one doing, very well, a small piece of transformation against the source value.
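
The chaining idea can be sketched in plain PHP by modelling each process plugin as a callable that receives the previous plugin's output. This is an illustration only: run_pipeline() and the closures are invented for this example and are not the real plugin classes.

```php
<?php
// Each "process plugin" is modelled as a plain callable.
$concat = fn(array $values): string => implode('.', $values);
$lowercase = fn(string $value): string => strtolower($value);
$explode = fn(string $value): array => explode(';', $value);

// Feeds the value through every processor in order.
function run_pipeline(array $processors, $value) {
  foreach ($processors as $processor) {
    $value = $processor($value);
  }
  return $value;
}

$row = [
  'first_name' => 'John',
  'last_name' => 'Doe',
  'email' => 'john@example.com',
  'groups' => 'editor;member',
];

$user = [
  'name' => run_pipeline([$concat, $lowercase], [$row['first_name'], $row['last_name']]),
  'mail' => $row['email'],  // plain copy, like the basic "get" behaviour
  'roles' => run_pipeline([$explode], $row['groups']),
];
```

Keeping every step tiny is what makes the processors reusable: the same lowercase step could be appended to any other field's chain.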

The ID Map

The id map keeps a relationship between the primary ID of a source record and the primary ID of the destination record. The ID map is called the id map plugin and brings the following functionality to the migration system:

  • Permits lookups of the destination ID based on source ID and vice-versa.
  • Allows the rollback action. On rollback, we need to delete only migrated records and not those that were already present.
  • Allows records migrated in a previous migration to be updated.
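
As a rough illustration of these responsibilities, the following plain PHP class keeps the source/destination ID pairs in memory and supports lookups in both directions, including primary IDs made of more than one field. The class and method names are hypothetical and do not mirror the real id map plugin interface.

```php
<?php
final class IdMapSketch {
  /** @var array<string, array{0: array, 1: array}> */
  private array $rows = [];

  // Builds a stable string key from a (possibly multi-field) primary ID.
  private function key(array $ids): string {
    return json_encode(array_values($ids));
  }

  public function save(array $sourceIds, array $destinationIds): void {
    $this->rows[$this->key($sourceIds)] = [$sourceIds, $destinationIds];
  }

  public function lookupDestinationIds(array $sourceIds): ?array {
    return $this->rows[$this->key($sourceIds)][1] ?? null;
  }

  public function lookupSourceIds(array $destinationIds): ?array {
    foreach ($this->rows as [$source, $destination]) {
      if ($destination === $destinationIds) {
        return $source;
      }
    }
    return null;
  }

  // Used on rollback, after the destination record has been removed.
  public function delete(array $sourceIds): void {
    unset($this->rows[$this->key($sourceIds)]);
  }
}

$idMap = new IdMapSketch();
$john = ['first_name' => 'John', 'last_name' => 'Doe'];
$idMap->save($john, ['uid' => 5]);
$uid = $idMap->lookupDestinationIds($john);
$back = $idMap->lookupSourceIds(['uid' => 5]);
```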

The Migration (all together)

There must be a component defining the whole migration, telling what plugin will be used for the source and destination and providing the list of fields being migrated together with their process plugins. In Drupal 8 migrations are configurables that are stored in configuration YAML files.

Small peek into configurables

So, what are configurables? “Configurables” are configuration entities. In Drupal 8 the content is separated from configuration. Both are classes and share the same ancestor: the Entity class.

  • A configurable is the way Drupal 8 stores the configuration of specific functionality. E.g. the definition of a node type is stored in a configuration entity of type node_type.
  • Configuration entity types are annotated classes, meaning that the object meta information is stored in annotation rather than in info hooks - as it was in Drupal <= 7.
  • Imagine configurables as entities storing their data in config YAML files rather than DB.
  • The “fields” of a configurable are the public properties exposed by the configurable object.
/**
 * Defines the Migration entity.
 *
 * @EntityType(
 *   id = "migration",
 *   label = @Translation("Migration"),
 *   module = "migrate",
 *   controllers = {
 *     "storage" = "Drupal\migrate\MigrationStorageController"
 *   },
 *   config_prefix = "migrate.migration",
 *   entity_keys = {
 *     "id" = "id",
 *     "label" = "label",
 *     "weight" = "weight",
 *     "uuid" = "uuid"
 *   }
 * )
 */
class Migration extends ConfigEntityBase implements MigrationInterface {
  ...
}

So defining a migration is simply a matter of creating a configuration YAML file and dropping it into the config/ directory under your module.

Core Migrate Modules

Migrate (core/modules/migrate/)

  • provides a general API for all migrations
  • provides interfaces and base classes for all migration plugin components (source, destination, process, id_map, row).
  • provides a plugin manager for the manipulation of migration plugins.
  • provides the migrate configurable (configuration entity type).

Migrate Drupal (core/modules/migrate_drupal/)

  • the first module using the new Migrate API.
  • kind of a migrate_d2d successor.
  • migrates out-of-the-box from Drupal 6 and 7 into Drupal 8.
  • Defines migrations for all system components:
    • Drupal 6 settings (site name, slogan, roles, etc.)
    • Content definitions (vocabularies, node types, etc.)
    • Content (nodes, terms, users, etc.)

The Anatomy of a Migration

I will explain a real migration structure by analysing the structure of the Migrate Example (migrate_example) module, part of Examples for Developers suite. Examples for Developers is a project aiming to provide high-quality, well-documented API examples for a broad range of Drupal core functionality. Let’s jump in.

The Module

Here’s the ZIP archive with the Migrate Example (migrate_example) module. Download and unzip the file. You’ll need it to understand the next part of the presentation.

Migrations

We’ll migrate three kinds of objects from a hypothetical proprietary CMS into Drupal 8:

  1. user roles
  2. users
  3. articles

In order to keep things simple, the source tables are located in the same database as the Drupal installation. Here’s the source data that will be automatically generated when the migrate_example module is enabled:

It’s easy to observe that roles and users are using a single table source, migrate_example_people, while nodes will have migrate_example_content as the source.

Roles

Each migration starts with creating the migration YAML configuration file in the config/ directory, under the module root.

config/migrate.migration.migrate_example_user_role.yml

id: migrate_example_user_role
source:
  plugin: migrate_example_user_role
destination:
  plugin: entity:user_role
process:
  id:
    -
      plugin: machine_name
      source: group
    -
      plugin: dedupe_entity
      entity_type: user_role
      field: id
  label: group

There are 4 keys that need to be filled:

  • id: The unique ID of the migration. It must be exactly the same as the last part of filename (migrate_example_user_role).
  • source: Defines the source. Usually it has a single plugin: key indicating the source plugin. In this case, the source plugin is a custom plugin that needs implementation. See later.
  • destination: Indicates what plugin is to be used as destination plugin, along with plugin-specific configurations. In this case we are using entity:user_role as the plugin. This means that the destination is an entity of type user role. This plugin is already in Migrate module. There’s nothing you need to do.
  • process: This is the most interesting part. It describes the list of processors to be applied to each destination field. Each destination field has one or more process plugins (processors). Some destination fields don’t need to be transformed in any way. Here, the label destination field simply receives the value of the source field group, as it is. But the destination id field is computed as follows:
    • The source group field is passed to the machine_name plugin. This process plugin transforms the input into a valid machine name string. See manual page: Process plugin: machine_name.
    • The resulting value is passed to the dedupe_entity plugin. This plugin ensures that the value does not collide with any existing value of the field named by the field key, in entities of the type named by entity_type.
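
To show what these two steps do to a value, here is an approximation in plain PHP. It is a sketch, not the code of the real machine_name and dedupe_entity plugins: the exact replacement rules and the numeric suffix scheme are assumptions, and the list of existing IDs stands in for a lookup against the site's roles.

```php
<?php
// Approximates a machine name: lowercase, with every run of other characters
// replaced by a single underscore.
function to_machine_name(string $value): string {
  $value = strtolower($value);
  $value = preg_replace('/[^a-z0-9_]+/', '_', $value);
  return preg_replace('/_+/', '_', $value);
}

// Appends an increasing number until the value no longer collides.
function dedupe(string $value, array $existing): string {
  $candidate = $value;
  $i = 1;
  while (in_array($candidate, $existing, true)) {
    $candidate = $value . $i++;
  }
  return $candidate;
}

$existingRoleIds = ['content_editors', 'content_editors1'];  // hypothetical
$machineName = to_machine_name('Content Editors');
$roleId = dedupe($machineName, $existingRoleIds);
```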

Now let’s get back to the source configuration. Remember that we need to define a new plugin: migrate_example_user_role. This is accomplished by extending the abstract SourcePluginBase into a new plugin \Drupal\migrate_example\Plugin\migrate\source\Role. Check the file lib/Drupal/migrate_example/Plugin/migrate/source/Role.php from the attached module package for the full content of this class. There are a few methods that need attention:

\Drupal\migrate_example\Plugin\migrate\source\Role

class Role extends SourcePluginBase {
  // ...
  public function getIds() {
    return array('group' => array('type' => 'string'));
  }
  // ...
}

This method provides the source primary ID, in this case the group field, and specifies its type using the TypedData API. Together with the destination primary ID, this field tells the migration how to build the id map, which keeps the relationship between source and destination records. Important! The primary ID is not always composed of a single source key. There are cases when defining the primary unique key for a record requires two or more fields. You’ll have to add each field of the primary ID to this array.

\Drupal\migrate_example\Plugin\migrate\source\Role

class Role extends SourcePluginBase {
  // ...
  public function fields() {
    return array(
      'group' => $this->t('Group'),
    );
  }
  // ...
}

Here we simply return a list of source fields keyed by field ID with the translated description as value. The field ID must be exactly the same as the keys of each row (see below).

\Drupal\migrate_example\Plugin\migrate\source\Role

class Role extends SourcePluginBase {
  // ...
  public function getIterator() {
    if (!isset($this->iterator)) {
      $people = $this->query()->execute()->fetchCol();
 
      $items = array();
      foreach ($people as $groups) {
        $groups = explode(';', $groups);
        foreach ($groups as $group) {
          if (!isset($items[$group])) {
            $items[$group] = array(
              'group' => $group,
            );
          }
        }
      }
      $this->iterator = new \IteratorIterator(new \ArrayIterator($items));
    }
 
    return $this->iterator;
  }
  // ...
}

This method provides the iterator. This is the object that iterates over rows of source data to provide a current row.
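
To see what rows such an iterator yields, the same de-duplication can be tried outside Drupal. The sketch below is plain PHP with made-up sample data standing in for the result of the database query; unique_group_rows() is a name invented for this example.

```php
<?php
// Turns ";"-delimited group strings into one row per distinct group.
function unique_group_rows(array $groupStrings): \Iterator {
  $items = [];
  foreach ($groupStrings as $groups) {
    foreach (explode(';', $groups) as $group) {
      // Keying by group name drops duplicates while keeping first-seen order.
      $items[$group] = ['group' => $group];
    }
  }
  return new \ArrayIterator($items);
}

// Hypothetical values of the groups column, one string per person.
$rows = iterator_to_array(unique_group_rows(['editor;member', 'member', 'admin;editor']));
$groupNames = array_keys($rows);
```

Each distinct group becomes exactly one source row, no matter how many people belong to it.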

Check the file containing this class to get a full overview of how such a class must look. And yes, that’s it with role migration :)

Users

config/migrate.migration.migrate_example_people.yml

id: migrate_example_people
source:
  plugin: migrate_example_people
destination:
  plugin: entity:user
  md5_passwords: true
process:
  name:
    -
      plugin: concat
      delimiter: .
      source:
        - first_name
        - last_name
    -
      plugin: callback
      callable:
        - '\Drupal\Component\Utility\Unicode'
        - strtolower
    -
      plugin: callback
      callable: trim
    -
      plugin: dedupe_entity
      entity_type: user
      field: name
  mail: email
  pass: pass
  roles:
    -
      plugin: explode
      delimiter: ';'
      source: groups

Let’s see how this migration is defined:

  • source: In the same way as when we migrated roles, this is a custom plugin that needs implementation. See lib/Drupal/migrate_example/Plugin/migrate/source/People.php file for the class description.
  • destination: We are using the specialized entity:user destination plugin already provided by the Migrate module. This plugin lets the migration receive MD5-hashed passwords and converts them into the salted, re-hashed passwords used in Drupal >= 7. We tell the system to do this by setting md5_passwords: true.
  • process: The destinations mail and pass take the source input unchanged, and the uid is auto-generated (which is why it is missing from the process list). The remaining two fields, name and roles, need some processing:
    • Using "." as delimiter, we concatenate the source first_name and last_name by passing them to the concat process plugin. Then, using the callback process plugin along with Drupal’s Unicode::strtolower() method, we transform the new name into lowercase. A user name cannot have leading or trailing spaces, which is why we strip them by using the callback processor again, this time with the PHP trim() function. Finally, we ensure unique user names by running the dedupe_entity processor before sending the value to its destination.
    • Source groups are coming as ";" delimited strings but we need to pass roles as an array. Because we don’t have a process plugin for this, we need to write a custom plugin — and this is the explode process plugin. See below.

\Drupal\migrate_example\Plugin\migrate\process\Explode

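namespace Drupal\migrate_example\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutable;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Splits a delimited string into an array.
 *
 * @MigrateProcessPlugin(
 *   id = "explode",
 * )
 */
class Explode extends ProcessPluginBase {
  public function transform($value, MigrateExecutable $migrate_executable, Row $row, $destination_property) {
    // The 'delimiter' value comes from the plugin configuration in the YAML file.
    return explode($this->configuration['delimiter'], $value);
  }
}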

Normally the groups field should be converted to an array at the source plugin level, in the prepareRow() method. We are creating this simple process plugin just to prove how simply a processor can be built. Isn't it simple to implement a migration process plugin?

Content

config/migrate.migration.migrate_example_content.yml

id: migrate_example_content
source:
  plugin: migrate_example_content
destination:
  plugin: entity:node
  type: page
process:
  title: subject
  'body:value': text
  uid:
    -
      plugin: migration
      migration: migrate_example_people
      source:
        - author
  created:
    -
      plugin: callback
      callable: strtotime
      source: date

A quick guide to each key:

  • source: This needs a source plugin implementation. See lib/Drupal/migrate_example/Plugin/migrate/source/Content.php file for the class description.
  • destination: We are using the standard entity:node destination plugin and we are also providing the bundle by using the entity-specific bundle key (type in this case).
  • process: The node ID (nid) is auto-generated. The destination fields title and body receive unaltered values from the source. Note that body must also be described with its column: 'body:value'. If you want to store something in the body summary, use 'body:summary'. And the processed fields:
    • For the node’s author (uid) we need to take the source author field and pass it to the migration process plugin. This plugin looks up the source author in the id map table of the migrate_example_people migration and returns the corresponding destination ID. This translated value is what gets stored in Drupal.
    • Because the source date field is in 'Y-m-d H:i:s' format, we need to convert it into a Unix timestamp. We’ll simply pass it to the callback process plugin, which uses the PHP function strtotime() for processing.
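
The two transformations above can be tried in isolation with plain PHP. In this sketch the id map of the people migration is replaced by a hypothetical array, the sample row is made up, and the timezone is fixed to UTC so that strtotime() gives a predictable result.

```php
<?php
date_default_timezone_set('UTC');

// Hypothetical id map of the people migration: source author ID => destination uid.
$peopleIdMap = [7 => 3];

$sourceRow = [
  'subject' => 'Hello',
  'text' => 'Body text',
  'author' => 7,
  'date' => '2014-01-29 12:00:00',
];

$node = [
  'title' => $sourceRow['subject'],
  'body' => ['value' => $sourceRow['text']],
  // Translate the source author into the uid created by the earlier migration.
  'uid' => $peopleIdMap[$sourceRow['author']] ?? null,
  // Convert the 'Y-m-d H:i:s' string into a Unix timestamp.
  'created' => strtotime($sourceRow['date']),
];
```

Note that strtotime() interprets a date without a timezone in the default timezone, so the source data's timezone has to be known for the timestamps to be right.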

Running Migrations

The recommended method for running a migration is the Drush command line tool. Drush has the ability to manage the memory and to split large tasks among many batches, preventing resource exhaustion.

Drupal core will also provide a basic user interface that makes it possible to run migrations using the Batch API, but this method is not recommended.

That’s all with Migration to Drupal 8. Good luck!

Credits

  • Károly Négyesi (chx) and Mike Ryan (mikeryan): The maintainers of the core migrate modules. Károly Négyesi was kind enough to answer my questions on IRC, so that I was able to complete the work and provide the presentation. Thank you chx!
  • Joe Shindelar’s training presentation on Drupalize.me. That video presentation, even though it is about Drupal 7 migration, helped me to structure the text.
  • Many thanks to Melissa Anderson (eliza411)! She was so kind to help me with a proof-reading review of the entire blog post.

Author

Claudiu Cristea — claudiu.cristea, @claudiu_cristea

Change Log

February 4, 2014

  1. Table of Contents added.
  2. Disclaimer added.

January 30, 2014

Changes in Migrate Example (migrate_example) module. See interdiff.txt:

  1. entity_user is now entity:user in destination: of .yml files. See 8ba4805.
  2. Added 2 new process plugins to migrate_example_people migrations.

January 29, 2014

Initial release.