Migrating to Drupal 8
Alkuvoima+East Group, a Finnish strategic digital marketing agency, invited us to join one of their interesting events in Helsinki. We were asked to present some hot topics: ‟Drupal 8 — What’s Cooking?”, ‟Migrate to Drupal 8” and ‟Drupal 8 as a mobile backend”, along with another cutting-edge presentation: ‟Drupal and Apache Stanbol”. The strong interest in the ‟Migrate to Drupal 8” topic inspired me to elaborate on this subject. Here we go.
Table of Contents
- Table of Contents
- Disclaimer
- A Bit of History
- Basics
- The Anatomy of a Migration
- Running Migrations
- Credits
- Author
- Change Log
Disclaimer
At the time of writing this article, the Migrate system is under heavy development. Some features or APIs may change in the near future. The latest unstable version of the system is hosted in a sandbox project. Not all of the current work has been pushed to Drupal 8 core yet.
As new features and fixes are added, I will update the article. Check the Change Log to see what's new.
A Bit of History
Update vs. Upgrade
Update is the process of moving Drupal from one minor version to another, for example from Drupal 7.23 to 7.24. An upgrade occurs when Drupal moves from one major version to another (Drupal 7.x to 8.x). Until the 8.x release, Drupal has used the upgrade path to perform major version upgrades, as documented at drupal.org/upgrade. The process has proved to be very difficult; on complex sites, the upgrade was almost impossible.
Migrate (contrib module)
The Migrate module has been in service since 2009. It was not built specifically for Drupal upgrades, but thanks to its flexibility it can be used for that purpose, too. Basically, any kind of data can be migrated into Drupal 6 or 7 using this great module.
Migrate in Core
At Drupalcon Prague 2013 Dries announced the switch to a migrate-based upgrade by moving the Migrate module into Drupal core. Károly Négyesi (chx) was named core maintainer for the Migrate module. The development started in a sandbox project. A big portion of the code and the interaction between the elements is brand new. However, a significant part still comes straight from D7: highwater marks, track changes, the id map.
Basics
Before jumping into details, let’s see what a migration system really means. In this section I will try to familiarize you with some migration concepts just to make sure we’re on the same page.
The (Good) Migration System
What makes a migration system a ‟good” migration system? In my opinion a good migration system should meet, at least, the following requirements:
- Import: This is obvious. The system should have an import mechanism, able to import data from source to destination.
- Rollback: The migration system must be able to roll back a previous migration and leave the environment clean, in the same state it was in before importing.
- Incremental: The system should be able to run a new migration on top of a previous one. For example, the source database might receive new content after a first migration; run again, the migration should import only the new data. This usually occurs when migrating a live legacy site: after designing the migration and importing most of the content, the source site may still receive updates (new articles, new users, new comments). A new migration run should incrementally import only the fresh content.
- Map: A good system records the migration of each record; otherwise it's almost impossible to provide rollback and incremental capabilities. The most important data recorded for each migrated item is the pair of primary IDs: the map record establishes a relation between the source and destination primary IDs. Note that a primary ID can be composed of more than one field.
- Chicken 'n Egg handling: Suppose a record refers to another record that hasn't been imported yet. Oops! Because that reference was not migrated yet, we cannot predict its destination primary ID, and without that ID we cannot save the current record. The system must know how to handle such situations, and yes, a good migration system can.
- Robust: The system must be designed to allow huge quantities of data to be imported or rolled back. In Drupal migrations the Drush utility is a great tool because of its ability to manage memory and split large tasks into many batches, preventing resource exhaustion.
- Flexible: It should allow the implementation of plugins to handle any kind of source (Drupal, MySQL, SQLite, MSSQL, XML, feeds, plain files like .txt, .csv, .xml, etc.) or destination.
Other ‟nice to have” features for a stunning migration system:
- Reimport changed records: In Drupal this is also referred to as the highwater mark. With an incremental migration system (see Incremental above), each time a migration runs, any previously un-imported source items are imported. The source data may also contain a last-changed timestamp field. On each migration run, the highest value of that field is recorded. On subsequent runs, each already-imported record compares its last-changed value with the one stored by the previous migration; if the value is greater, that means ‟this record has changed”, so the system reimports the record and saves the timestamp as the new highwater mark.
- System of Record: A system of record allows a migration to update only specific fields in objects already migrated by other migrations.
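To make the incremental and highwater ideas concrete, here is a minimal sketch of the decision an importer makes for each source row. This is plain PHP illustrating the logic only; the `$imported` map, the `$rows` data and the inline printing are all hypothetical and not part of the Migrate API:

```php
<?php
// Hypothetical data and helpers; this is the shape of the logic, not Migrate.
$imported = [1 => TRUE, 2 => TRUE];   // source IDs imported on an earlier run
$highwater = 100;                     // highest 'changed' value from last run
$rows = [
  ['id' => 1, 'changed' => 90],      // already imported, unchanged: skipped
  ['id' => 2, 'changed' => 120],     // already imported, changed: reimported
  ['id' => 3, 'changed' => 110],     // never seen: incremental import
];

$newHighwater = $highwater;
foreach ($rows as $row) {
  if (empty($imported[$row['id']]) || $row['changed'] > $highwater) {
    print "import {$row['id']}\n";   // new or changed record
  }
  // Track the highest timestamp to store as the next highwater mark.
  $newHighwater = max($newHighwater, $row['changed']);
}
```

After the run, `$newHighwater` would be persisted so the next run only reimports records changed since then.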
The Flow
Import flow:
- Extract: The data is extracted from the source.
- Process: Data is transformed (if needed) to meet the destination's structure/format.
- Save: The data is stored/saved in the destination.
- Register: The pair of source and destination primary IDs is registered/saved in the map table.
Rollback flow:
- Remove: The record is deleted from the destination.
- Deregister: The corresponding mapping record is removed.
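The two flows above can be sketched as a pair of loops. All names and data here are hypothetical; this shows the algorithm's shape, not the Migrate API:

```php
<?php
// Import flow sketch: extract, process, save, register.
$source = [['sid' => 7, 'name' => 'John']];
$map = [];                                  // source ID => destination ID
foreach ($source as $row) {                 // 1. extract a source row
  $row['name'] = strtolower($row['name']);  // 2. process/transform it
  $did = 42;                                // 3. save; pretend destination ID
  $map[$row['sid']] = $did;                 // 4. register the ID pair (map)
}

// Rollback flow sketch: remove, then deregister.
foreach ($map as $sid => $did) {
  // deleteDestination($did) would go here;  1. remove the migrated record
  unset($map[$sid]);                      // 2. deregister the mapping
}
```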
Components
Let’s consider a simple example: a table, people, is the source for populating the Drupal users table. I will briefly explain each component of a migration.
Source & Destination
The source component is a plugin called the source plugin. This is where the data comes from. The source of data can be a database, plain files, another CMS or even a legacy Drupal website. You can imagine any source you like, as the system is flexible enough to handle it. The single constraint is that the source plugin must be able to provide the following functionality:
- Access the source data (SQL, CSV, XML, JSON, etc.)
- Describe the source primary ID key.
- Describe the source fields that will be migrated.
- Iterate over rows of source data to provide a current row.
The destination component is called the destination plugin. As the destination is always a Drupal site, a destination plugin is able to save data to an entity like nodes, users, taxonomy terms, files, or even to Drupal configuration like roles, field types or system variables. The destination plugin will typically provide the following functionality:
- Describe the fields of destination, where the data is saved.
- Describe the destination primary ID key.
- Know how to save data based on the nature of the destination (node, taxonomy term, user, etc.).
- Save a new destination row for each source row.
In our example, the source data is stored in the same MySQL database as the destination Drupal site. The first source table is migrate_example_people and the destination is a Drupal 8 user entity. We’ll concatenate first_name and last_name into the user name: John Doe will become john.doe.
Mapping & Process
Each field mapping component is in fact a list of process plugins (processors). The simplest and most basic process plugin (get) just takes a source field and maps it to a destination field. In our case, email and pass are each simply mapped to their destination fields mail and pass. First and last name are both passed to the p0 process plugin, which concatenates them using a dot as delimiter; the result is then passed to the p1 process plugin, which converts the name to lowercase and sends it to the destination plugin to be saved as the user name. The groups string field is passed to a p0 process plugin that explodes it into an array on semicolons.
Basically, the mapping is a chain of process plugins, each one performing a small piece of transformation on the source value, and doing it well.
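As a hedged illustration of such a chain, using the real plugin IDs that appear later in this article in place of the generic p0/p1 labels, the user-name pipeline could be written as:

```yaml
process:
  name:
    - plugin: concat          # p0: join the two fields with a dot
      delimiter: .
      source:
        - first_name
        - last_name
    - plugin: callback        # p1: lowercase the concatenated result
      callable:
        - '\Drupal\Component\Utility\Unicode'
        - strtolower
```

Each plugin in the list receives the previous plugin's output as its input, which is what makes the chain composable.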
The ID Map
The id map keeps a relationship between the primary ID of a source record and the primary ID of the destination record. This component is implemented as the id map plugin and brings the following functionality to the migration system:
- Permits lookups of the destination ID based on source ID and vice-versa.
- Allows the rollback action. On rollback, we need to delete only migrated records and not those that were already present.
- Allows records migrated in a previous migration to be updated.
The Migration (all together)
There must be a component defining the whole migration: telling which plugins will be used for the source and the destination, and providing the list of fields being migrated together with their process plugins. In Drupal 8, migrations are configurables, stored in configuration YAML files.
Small peek into configurables
So, what are configurables? “Configurables” are configuration entities. In Drupal 8 the content is separated from configuration. Both are classes and share the same ancestor: the Entity class.
- A configurable is the way Drupal 8 stores the configuration of a specific piece of functionality. E.g. the definition of a node type is stored in a configuration entity of type node_type.
- Configuration entity types are annotated classes, meaning that the object meta information is stored in annotations rather than in info hooks, as it was in Drupal <= 7.
- Imagine configurables as entities storing their data in config YAML files rather than in the DB.
- The “fields” of a configurable are the public properties exposed by the configurable object.
```php
/**
 * Defines the Migration entity.
 *
 * @EntityType(
 *   id = "migration",
 *   label = @Translation("Migration"),
 *   module = "migrate",
 *   controllers = {
 *     "storage" = "Drupal\migrate\MigrationStorageController"
 *   },
 *   config_prefix = "migrate.migration",
 *   entity_keys = {
 *     "id" = "id",
 *     "label" = "label",
 *     "weight" = "weight",
 *     "uuid" = "uuid"
 *   }
 * )
 */
class Migration extends ConfigEntityBase implements MigrationInterface { ... }
```
So, defining a migration simply means creating a configuration YAML file and dropping it into the config/ directory under your module.
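For instance, a minimal migration definition could look like the fragment below. The file name, migration ID and source plugin ID are purely illustrative; only the overall key structure follows the real examples later in this article:

```yaml
# config/migrate.migration.example.yml (hypothetical)
id: example
source:
  plugin: example_source
destination:
  plugin: entity:node
process:
  title: headline
```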
Core Migrate Modules
Migrate (core/modules/migrate/)
- provides a general API for all migrations
- provides interfaces and base classes for all migration plugin components (source, destination, process, id_map, row).
- provides a plugin manager for the manipulation of migration plugins.
- provides the migrate configurable (configuration entity type).
Migrate Drupal (core/modules/migrate_drupal/)
- the first module using the new Migrate API.
- kind of a migrate_d2d successor.
- migrates out-of-the-box from Drupal 6 and 7 into Drupal 8.
- Defines migrations for all system components:
- Drupal 6 settings (site name, slogan, roles, etc.)
- Content definitions (vocabularies, node types, etc.)
- Content (nodes, terms, users, etc.)
The Anatomy of a Migration
I will explain a real migration's structure by analysing the Migrate Example (migrate_example) module, part of the Examples for Developers suite. Examples for Developers is a project aiming to provide high-quality, well-documented API examples for a broad range of Drupal core functionality. Let’s jump in.
The Module
Here’s the ZIP archive with the Migrate Example (migrate_example) module. Download and unzip the file. You’ll need it to understand the next part of the presentation.
Migrations
We’ll migrate three kinds of objects from a hypothetical proprietary CMS into Drupal 8:
- user roles
- users
- articles
In order to keep things simple, the source tables are located in the same database as the Drupal installation. Here’s the source data that will be automatically generated when the migrate_example module is enabled:
It’s easy to observe that roles and users use a single source table, migrate_example_people, while nodes have migrate_example_content as their source.
Roles
Each migration starts by creating the migration YAML configuration file in the config/ directory, under the module root.
config/migrate.migration.migrate_example_user_roles.yml
```yaml
id: migrate_example_user_role
source:
  plugin: migrate_example_user_role
destination:
  plugin: entity:user_role
process:
  id:
    - plugin: machine_name
      source: group
    - plugin: dedupe_entity
      entity_type: user_role
      field: id
  label: group
```
There are four keys that need to be filled:
- id: The unique ID of the migration. It must be exactly the same as the last part of the filename (migrate_example_user_role).
- source: Defines the source. Usually it has a single plugin: key indicating the source plugin. In this case, the source plugin is a custom plugin that needs implementation; see later.
- destination: Indicates which plugin is to be used as the destination plugin, along with plugin-specific configuration. In this case we are using entity:user_role as the plugin, meaning the destination is an entity of type user role. This plugin ships with the Migrate module; there’s nothing you need to do.
- process: This is the most interesting part. It describes the list of processors to be applied to each destination field. Each destination field has one or more process plugins (processors). Some destination fields don't need to be transformed at all: here, the label destination field simply receives the value of the source field group as-is. The destination id field, however, is computed as follows:
  - The source group field is passed to the machine_name plugin. This process plugin transforms the input into a valid machine name string. See the manual page: Process plugin: machine_name.
  - The resulting value is passed to the dedupe_entity plugin. This plugin ensures that the value will not overlap any existing value of the field field in the entity type entity_type.
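To make that id pipeline concrete, here is a rough sketch of how the two plugins transform a hypothetical source value (the exact collision suffix used by dedupe_entity is my assumption from reading the sandbox code):

```
'Site Editors'  --machine_name-->   'site_editors'
'site_editors'  --dedupe_entity-->  'site_editors1'   (suffix only on collision)
```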
Now let’s get back to the source configuration. Remember that we need to define a new plugin: migrate_example_user_role. This is accomplished by extending the abstract SourcePluginBase into a new plugin, \Drupal\migrate_example\Plugin\migrate\source\Role. Check the file lib/Drupal/migrate_example/Plugin/migrate/source/Role.php from the attached module package for the full content of this class. There are a few methods that need attention:
\Drupal\migrate_example\Plugin\migrate\source\Role

```php
class Role extends SourcePluginBase {
  // ...
  public function getIds() {
    return array('group' => array('type' => 'string'));
  }
  // ...
}
```
This method provides the source primary ID, in this case the group field, and specifies its type using the TypedData API. Together with the destination primary ID, this field tells the migration how to build the id map, keeping the relationship between source and destination records. Important! The primary ID is not always composed of only one source key. There are cases when defining the primary unique key for a record requires two or more fields; you’ll have to add each field of the primary ID to this array.
\Drupal\migrate_example\Plugin\migrate\source\Role

```php
class Role extends SourcePluginBase {
  // ...
  public function fields() {
    return array(
      'group' => $this->t('Group'),
    );
  }
  // ...
}
```
Here we simply return a list of source fields keyed by field ID, with the translated description as value. The field IDs must be exactly the same as the keys of each row (see below).
\Drupal\migrate_example\Plugin\migrate\source\Role

```php
class Role extends SourcePluginBase {
  // ...
  public function getIterator() {
    if (!isset($this->iterator)) {
      $people = $this->query()->execute()->fetchCol();
      $items = array();
      foreach ($people as $groups) {
        $groups = explode(';', $groups);
        foreach ($groups as $group) {
          if (!isset($items[$group])) {
            $items[$group] = array(
              'group' => $group,
            );
          }
        }
      }
      $this->iterator = new \IteratorIterator(new \ArrayIterator($items));
    }
    return $this->iterator;
  }
  // ...
}
```
This method provides the iterator. This is the object that iterates over rows of source data to provide a current row.
Check the file containing this class to get a full overview of how such a class must look. And yes, that’s it for the role migration :)
Users
config/migrate.migration.migrate_example_people.yml
```yaml
id: migrate_example_people
source:
  plugin: migrate_example_people
destination:
  plugin: entity:user
  md5_passwords: true
process:
  name:
    - plugin: concat
      delimiter: .
      source:
        - first_name
        - last_name
    - plugin: callback
      callable:
        - '\Drupal\Component\Utility\Unicode'
        - strtolower
    - plugin: callback
      callable: trim
    - plugin: dedupe_entity
      entity_type: user
      field: name
  mail: email
  pass: pass
  roles:
    - plugin: explode
      delimiter: ';'
      source: groups
```
Let’s see how this migration is defined:
- source: As with the roles migration, this is a custom plugin that needs implementation. See the lib/Drupal/migrate_example/Plugin/migrate/source/People.php file for the class description.
- destination: We are using the specialized entity:user destination plugin, already provided by the Migrate module. This plugin lets the migration receive MD5-hashed passwords and converts them into the salted, re-hashed passwords used in Drupal >= 7. We tell the system to do this by configuring md5_passwords: true.
- process: While the destinations mail and pass take the source input as-is and uid is auto-generated (which is why it is missing from the process list), the remaining fields need some processing:
  - Using "." as delimiter, we concatenate the source first_name and last_name by passing them to the concat process plugin. Then, using the callback process plugin along with the Drupal Unicode::strtolower() component, we transform the new name into a lowercase user name. A user name cannot have leading or trailing spaces, so we strip them by using the callback processor again, this time with the PHP trim() function. Finally, we ensure unique user names by running the dedupe_entity processor before sending the value to its destination.
  - Source groups come as ";"-delimited strings, but we need to pass roles as an array. Because we don't have a process plugin for this, we need to write a custom one: the explode process plugin. See below.
\Drupal\migrate_example\Plugin\migrate\process\Explode

```php
/**
 * @MigrateProcessPlugin(
 *   id = "explode",
 * )
 */
class Explode extends ProcessPluginBase {
  public function transform($value, MigrateExecutable $migrate_executable, Row $row, $destination_property) {
    return explode($this->configuration['delimiter'], $value);
  }
}
```
Normally the groups field would be converted to an array at the source plugin level, in the prepareRow() method. We created this process plugin just to show how simply a processor can be built. Isn't it easy to implement a migration process plugin?
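For completeness, here is a hedged sketch of that alternative: overriding prepareRow() in the People source plugin so groups is already an array when it reaches the process pipeline. I'm assuming the Row getter/setter methods behave as in the sandbox code at the time of writing:

```php
public function prepareRow(Row $row) {
  // Split the ';'-delimited groups at the source level instead of
  // relying on the explode process plugin.
  $groups = explode(';', $row->getSourceProperty('groups'));
  $row->setSourceProperty('groups', $groups);
  return parent::prepareRow($row);
}
```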
Content
config/migrate.migration.migrate_example_content.yml
```yaml
id: migrate_example_content
source:
  plugin: migrate_example_content
destination:
  plugin: entity:node
  type: page
process:
  title: subject
  'body:value': text
  uid:
    - plugin: migration
      migration: migrate_example_people
      source:
        - author
  created:
    - plugin: callback
      callable: strtotime
      source: date
```
A quick guide to each key:
- source: This needs a source plugin implementation. See the lib/Drupal/migrate_example/Plugin/migrate/source/Content.php file for the class description.
- destination: We are using the standard entity:node destination plugin and we also provide the bundle, using the entity-specific bundle key (type in this case).
- process: The node ID (nid) is auto-generated. The destination fields title and body receive unaltered values from the source. Note that body also needs to be described with its column, 'body:value'; if you want to store something in the body summary, you should use 'body:summary'. And the processed fields:
  - For the node’s author (uid) we take the source author field and pass it to the migration process plugin. This plugin looks up the id map table of the migrate_example_people migration and gets the destination ID for the source author. This translated value is stored in the Drupal backend.
  - Because the source date field is in 'Y-m-d H:i:s' format, we need to convert it to a Unix timestamp. We simply pass it to the callback process plugin, which uses the PHP function strtotime() for processing.
Running Migrations
The recommended method for running a migration is the Drush command line tool. Drush has the ability to manage memory and split large tasks into many batches, preventing resource exhaustion.
Drupal core will also provide a basic user interface making it possible to run migrations using the Batch API, but this method is not recommended.
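For orientation only: with the Drupal 7 contrib Migrate module the Drush workflow looked like the commands below. The exact Drupal 8 command names were still settling at the time of writing, so treat these as an assumption rather than the final D8 interface:

```
drush migrate-status                          # list migrations and their state
drush migrate-import migrate_example_people   # run a single migration
drush migrate-rollback migrate_example_people # roll that migration back
```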
That’s all about migrating to Drupal 8. Good luck!
Credits
- Károly Négyesi (chx) and Mike Ryan (mikeryan): the maintainers of the core migrate modules. Károly Négyesi was kind enough to answer my questions on IRC, so that I was able to complete the work and provide the presentation. Thank you, chx!
- Joe Shindelar's training presentation on Drupalize.me. That video presentation, even though it's about Drupal 7 migrations, helped me structure this text.
- Many thanks to Melissa Anderson (eliza411)! She was kind enough to help me with a proofreading review of the entire blog post.
Author
Claudiu Cristea — claudiu.cristea, @claudiu_cristea
Change Log
- February 4, 2014
  - Table of Contents added.
  - Disclaimer added.
- January 30, 2014
  - Changes in the Migrate Example (migrate_example) module. See interdiff.txt:
    - entity_user is now entity:user in destination: of .yml files. See 8ba4805.
    - Added 2 new process plugins to the migrate_example_people migration.
- January 29, 2014
  - Initial release.