ORM Patterns: The Trade-Offs of Active Record and Data Mappers for Object-Relational Mapping

One of the topics of seemingly perennial discussion among programmers is whether object-relational mapping (often abbreviated to ORM) is evil or not. Opinions seem to run the gamut from “I use and love it” to “I tried it once and never will again.” And you often encounter at least a few “what are you talking about?”s.

What is an Object-Relational Mapper (ORM)?

For the “what are you talking about?”s, a little explanation.

  • In an object-oriented system there’s a high probability that you’ll eventually need to safely store your objects so they’ll survive a power outage, memory exhaustion, process shutdown, etc.
  • To do that, you’ll probably end up storing these objects in a relational database of some kind (MySQL, Postgres, SQLite, etc.).
  • And for that, you’ll need a system by which your objects become valid database records.
  • Ideally, that same system should give you back your objects from that same database next time you need them and they aren’t in memory.

That’s what people use object-relational mappers to do.
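To make that concrete, here’s a minimal sketch of the two-way translation done by hand, with plain PDO and an in-memory SQLite database. The BlogPost class, the blog_posts table, and the storePost/loadPost helpers are all names I’ve made up for illustration; this is roughly the chore an ORM takes off your plate, not how any particular ORM is implemented.

```php
<?php
// Hypothetical domain object; names are illustrative, not from any framework.
class BlogPost
{
    public ?int $id = null;
    public string $title;
    public string $postContent;

    public function __construct(string $title, string $postContent)
    {
        $this->title = $title;
        $this->postContent = $postContent;
    }
}

// In-memory SQLite so the sketch needs no database server (requires pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, title TEXT, post_content TEXT)');

// Object -> database record.
function storePost(PDO $pdo, BlogPost $post): void
{
    $stmt = $pdo->prepare('INSERT INTO blog_posts (title, post_content) VALUES (?, ?)');
    $stmt->execute([$post->title, $post->postContent]);
    $post->id = (int) $pdo->lastInsertId();
}

// Database record -> object (null when no such row exists).
function loadPost(PDO $pdo, int $id): ?BlogPost
{
    $stmt = $pdo->prepare('SELECT id, title, post_content FROM blog_posts WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    $post = new BlogPost($row['title'], $row['post_content']);
    $post->id = (int) $row['id'];
    return $post;
}

$post = new BlogPost('Hello', 'First post');
storePost($pdo, $post);
$loaded = loadPost($pdo, $post->id);
```

Multiply those two functions by every class and every query in an application and you can see where the appeal of a mapper comes from.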

Among ORMs, there are two very common philosophies or patterns: Active Record and Data Mapper. Each has advantages and disadvantages, and we’ll explore both in this article. But since this ended up being a “benefits and drawbacks” article, let’s first explore what’s good about ORMs in general.

Major benefits of using an ORM

  • Less SQL with an ORM. At its best, an ORM can make it possible for you to write a web application without even knowing that SQL exists.
  • Less boilerplate too. SQL is a lot of specific syntax which you repeat a lot in an application. Writing less of any boilerplate (repeated, uninteresting code) is generally great.
  • More domain visibility in your system. One of the biggest downsides I see in SQL-based database interaction is that it often clouds your real use case with the need to know about the database.

Downsides of an Object-Relational Mapping System

  • Less SQL means queries may be more numerous and less performant. On the flip side of writing less SQL, you can end up doing very inefficient things without any way of noticing the inefficiency.
  • Less boilerplate can mean confusing magic (and/or complicated configuration). This is closely related to the point above: when you don’t understand the SQL your ORM generates on your behalf, it can be frustratingly hard to discover that this hidden intermediate layer is what’s hurting your application.
  • Your domain is not your database. The final critique of ORMs in general is that they make you think about your application in terms of stored entities (what will eventually go into a database) rather than, say, interactions. Interactions are often the more interesting and important part of an application.
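The “more numerous queries” downside is easiest to see in what’s usually called the N+1 problem. Here’s a rough sketch of it using plain PDO and in-memory SQLite rather than a real ORM. The posts and comments tables and the query-counting $run helper are assumptions of mine; the loop only imitates what a lazily loaded relationship tends to do behind your back.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)');
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)');
$pdo->exec("INSERT INTO posts (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C')");
$pdo->exec("INSERT INTO comments (post_id, body) VALUES (1, 'x'), (1, 'y'), (2, 'z')");

// Hypothetical helper: runs one query and counts the round trip.
$queryCount = 0;
$run = function (string $sql, array $params = []) use ($pdo, &$queryCount): array {
    $queryCount++;
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
};

// Imitation of lazy loading: one query for the posts, then one more per post.
$lazyCounts = [];
foreach ($run('SELECT id FROM posts ORDER BY id') as $post) {
    $comments = $run('SELECT body FROM comments WHERE post_id = ?', [$post['id']]);
    $lazyCounts[(int) $post['id']] = count($comments);
}
$lazyQueries = $queryCount;

// Hand-written alternative: a single aggregate query answers the same question.
$queryCount = 0;
$joinedCounts = [];
$rows = $run(
    'SELECT p.id, COUNT(c.id) AS n FROM posts p
     LEFT JOIN comments c ON c.post_id = p.id
     GROUP BY p.id ORDER BY p.id'
);
foreach ($rows as $row) {
    $joinedCounts[(int) $row['id']] = (int) $row['n'];
}
$joinedQueries = $queryCount;
```

With three posts the loop issues four queries where the join issues one. With three thousand posts the gap is the kind of thing you only notice once the page gets slow.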

Active Record: The Web’s Favorite ORM?

If most programmers know about active record, it’s because they’ve used it in a web framework. Ruby on Rails is built around Active Record. So are most PHP frameworks like Laravel, Yii, CodeIgniter, and CakePHP.

The essential concept of the active record pattern is that your database records are “active” in your system. Practically what that means is that if you’re touching five BlogPost objects, and you save them into your database, they’ll end up as five rows in your blog_posts database table. What’s more, if your BlogPost object has postContent and publishedDate properties, that allows us to assume that those are columns in the aforementioned blog_posts table.
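If it helps, here’s a deliberately tiny, framework-free sketch of that idea. The ActiveRecord base class, its table-naming convention, and the rule that every public property is a column of the same name are simplifying assumptions of mine, not how Rails or Eloquent actually implement the pattern.

```php
<?php
// Hypothetical base class: a sketch of the pattern, not any framework's code.
abstract class ActiveRecord
{
    public static PDO $db;
    public ?int $id = null;

    // Convention assumed here: class BlogPost lives in table blog_posts.
    public static function table(): string
    {
        return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', static::class)) . 's';
    }

    // Every property except id is assumed to be a column of the same name.
    public function save(): void
    {
        $columns = get_object_vars($this);
        unset($columns['id']);
        $names = array_keys($columns);

        if ($this->id === null) {
            $sql = sprintf(
                'INSERT INTO %s (%s) VALUES (%s)',
                static::table(),
                implode(', ', $names),
                implode(', ', array_fill(0, count($names), '?'))
            );
            static::$db->prepare($sql)->execute(array_values($columns));
            $this->id = (int) static::$db->lastInsertId();
        } else {
            $assignments = implode(', ', array_map(fn ($n) => "$n = ?", $names));
            $sql = sprintf('UPDATE %s SET %s WHERE id = ?', static::table(), $assignments);
            static::$db->prepare($sql)->execute([...array_values($columns), $this->id]);
        }
    }

    // Load one row and turn it back into an object of the calling class.
    public static function find(int $id): ?self
    {
        $stmt = static::$db->prepare('SELECT * FROM ' . static::table() . ' WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return null;
        }
        $record = new static();
        foreach ($row as $column => $value) {
            $record->$column = $value;
        }
        return $record;
    }
}

class BlogPost extends ActiveRecord
{
    public string $postContent = '';
    public string $publishedDate = '';
}

ActiveRecord::$db = new PDO('sqlite::memory:');
ActiveRecord::$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
ActiveRecord::$db->exec(
    'CREATE TABLE blog_posts (id INTEGER PRIMARY KEY, postContent TEXT, publishedDate TEXT)'
);

$post = new BlogPost();
$post->postContent = 'Hello';
$post->publishedDate = '2024-01-01';
$post->save();                        // first save inserts a row

$post->postContent = 'Hello again';
$post->save();                        // second save updates that same row

$reloaded = BlogPost::find($post->id);
```

Notice that BlogPost itself contains no SQL and no mapping at all: the object and the row are treated as the same thing, which is both the charm and the catch.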

Examples of Active Record ORMs

  • Ruby on Rails’ Active Record
  • Laravel’s Eloquent
  • Propel (Symfony)
  • Yii Active Record
  • Django’s ORM

Some Example Code from an Active Record ORM in PHP

These examples use a Laravel model. Its basic structure would look a bit like this:

<?php
// file: app/Models/Post.php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    /**
     * The attributes that are mass assignable.
     *
     * @var array
     */
    protected $fillable = [
        'title', 'slug', 'seo_title', 'excerpt', 'body',
        'meta_description', 'meta_keywords', 'active', 'image', 'user_id'
    ];
}

You’ll notice above that the object has no clearly defined properties. That’s not a selective omission I’ve made, but rather the most important choice of active record. The properties aren’t recorded on the object because they are defined by the columns in the database. So you end up using something like the array shown above to know that the object (likely) has those properties.
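How can an object have properties it never declares? In PHP the usual trick is the magic __get and __set methods. The sketch below is a hypothetical stand-in for a framework base class (MagicModel is my name, and this is not Eloquent’s actual code); it only shows the mechanism and one of its side effects.

```php
<?php
// Hypothetical stand-in for a framework model base class.
class MagicModel
{
    /** @var array<string, mixed> column name => value, as loaded from a row */
    private array $attributes = [];

    public function __construct(array $row = [])
    {
        $this->attributes = $row;
    }

    // Called whenever code reads a property the class does not declare.
    public function __get(string $name)
    {
        return $this->attributes[$name] ?? null;
    }

    // Called whenever code writes a property the class does not declare.
    public function __set(string $name, $value): void
    {
        $this->attributes[$name] = $value;
    }

    public function __isset(string $name): bool
    {
        return isset($this->attributes[$name]);
    }
}

class Post extends MagicModel {}

$post = new Post(['title' => 'Hello', 'slug' => 'hello']);
$title = $post->title;   // works, although Post declares no $title property
$typo = $post->titel;    // in this sketch a misspelled name is silently null
```

That last line is the “confusing magic” trade-off in miniature: nothing in the class tells you, or your editor, which names are real.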

Eloquent relationships are how you define which objects are related to others. Here’s a basic explanatory bit of code:

<?php
namespace App\Models;

use Illuminate\Database\Eloquent\Model;

class Post extends Model
{
    /**
     * Inverse of a one-to-many relation: each post belongs to one user.
     * @return \Illuminate\Database\Eloquent\Relations\BelongsTo
     */
    public function user()
    {
        return $this->belongsTo(User::class);
    }

    /**
     * One-to-many relation: a post has many comments.
     * @return \Illuminate\Database\Eloquent\Relations\HasMany
     */
    public function comments()
    {
        return $this->hasMany(Comment::class);
    }
}

The Good: Why People Like Active Record

  • AR is Simple. Because of how tightly matched the records in your database and the objects in your system are conceptually, it’s really easy to pick up a project, examine its database schema, and have a strong sense of what the project is doing. What makes this great is that the ORM layers have a minimal amount of indirection. What you see in the database or objects is likely what exists in the other.
  • Mappings with Active Record are easy to learn and understand. This flows directly out of the simplicity, but you’ll also probably have a pretty intuitive understanding of how you can work with this system even if you’ve never had the least exposure to an ORM before. That simplicity is forgiving for newcomers.

The Bad: Active Record Has Problems

  • Active Record has lots of database coupling (which complicates testing). Because your database is so tightly coupled with your objects, you’ll have a hard time efficiently separating the two. Sure, a good active record implementation is likely to make it pretty quick for you to switch from MySQL to Postgres, but it won’t make it easy to use your objects without the database. The fact that most active record model-based systems are effectively impossible to separate from the database (for testing or other reasons) is often held against them.
  • Performance bottlenecks of Active Record. A closely related complaint is that performance bottlenecks are hard to deal with when they arise. For small web apps with a few hundred users this generally isn’t an issue. But patterns that put more separation between your objects and your database leave room for SQL-level optimizations, and the lack of that room becomes a big roadblock as Active Record-based applications grow.

The More Enterprise-y Data Mapper Object-Relational Mapping

The biggest difference between the data mapper pattern and the active record pattern is that the data mapper is meant to be a layer between the actual business domain of your application and the database that persists its data. Where active record seeks to invisibly bridge the gaps between the two as seamlessly as possible, the role of the data mapper is to allow you to consider the two more independently.
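Here’s a small framework-free sketch of that separation. The Post class below knows nothing about persistence, and a hypothetical PostMapper is the only place that knows both the class and the table. The legacy_posts table and its column names are invented for the example, and returning a fresh object from insert() is a simplification of mine; real mappers such as Doctrine do considerably more.

```php
<?php
// Plain domain object: no base class, no knowledge of tables or SQL.
final class Post
{
    private ?int $id;
    private string $title;

    public function __construct(string $title, ?int $id = null)
    {
        $this->title = $title;
        $this->id = $id;
    }

    public function id(): ?int
    {
        return $this->id;
    }

    public function title(): string
    {
        return $this->title;
    }
}

// Hypothetical mapper: the single place where objects and rows meet.
final class PostMapper
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Writes the object as a row and returns a copy carrying the new id.
    public function insert(Post $post): Post
    {
        $stmt = $this->db->prepare('INSERT INTO legacy_posts (post_title) VALUES (?)');
        $stmt->execute([$post->title()]);
        return new Post($post->title(), (int) $this->db->lastInsertId());
    }

    // Reads a row and rebuilds the object (null when no such row exists).
    public function find(int $id): ?Post
    {
        $stmt = $this->db->prepare('SELECT post_id, post_title FROM legacy_posts WHERE post_id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : new Post($row['post_title'], (int) $row['post_id']);
    }
}

// The domain object is usable with no database in sight.
$draft = new Post('Draft');

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE legacy_posts (post_id INTEGER PRIMARY KEY, post_title TEXT)');

$mapper = new PostMapper($pdo);
$saved = $mapper->insert($draft);
$found = $mapper->find($saved->id());
```

The cost is visible too: there is more code here than in an active record model, and someone has to write and maintain the mapping.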

Examples of Data Mapper

  • Java Hibernate
  • Doctrine2
  • SQLAlchemy in Python
  • Entity Framework for Microsoft .NET

Data Mapper-y Examples

These are from Doctrine (Symfony) code. Here’s a basic model:

<?php
// file: src/Entity/Post.php

namespace App\Entity;

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;
use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity(repositoryClass="App\Repository\PostRepository")
 * @ORM\Table(name="symfony_demo_post")
 */
class Post {}

And here’s how you define properties, which, unlike in Eloquent (active record), you must declare explicitly:

use Doctrine\ORM\Mapping as ORM;
use Symfony\Component\Validator\Constraints as Assert;

class Post {
    /**
     * @var string
     *
     * @ORM\Column(type="string")
     * @Assert\NotBlank
     */
    private $title;

    /**
     * @var string
     *
     * @ORM\Column(type="string")
     */
    private $slug;
}

Because the Symfony/Doctrine convention stresses making properties private or protected, it’s also important to provide a way to access them. Getters and setters are by far the most common method:

class Post {
    public function getTitle(): ?string
    {
        return $this->title;
    }

    public function setTitle(string $title): void
    {
        $this->title = $title;
    }
}

And how you define relationships:

use Doctrine\Common\Collections\ArrayCollection;

class Post {
    /**
     * @var User
     *
     * @ORM\ManyToOne(targetEntity="App\Entity\User")
     * @ORM\JoinColumn(nullable=false)
     */
    private $author;

    /**
     * @var Comment[]|ArrayCollection
     *
     * @ORM\OneToMany(
     *      targetEntity="Comment",
     *      mappedBy="post",
     *      orphanRemoval=true
     * )
     * @ORM\OrderBy({"publishedAt": "DESC"})
     */
    private $comments;
}

I should mention here that the Symfony convention is to use these code comments (“annotations”) for the mapping definition, but that Doctrine also supports XML and YAML definitions. The core thing to note is that table and column names are explicitly defined.

The Good: What Data Mappers Allow

  • Data Mappers allow for greater flexibility between domain and database. One of the prototypical reasons to use a data mapper is that you, as the application architect, do not actually have final say on the database schema. Where you’ve got a historical database, or a new database with an unfriendly gatekeeper, the data mapper pattern lets you hide the ways in which your database isn’t an ideal way to think about your domain behind the data-mapping layer.
  • Doctrine2 and other data mappers can be much more performant. Similarly, because you do have a layer of abstraction and indirection between your domain objects and your database, there’s a good possibility that you can have the data mapper make more efficient use of the database than a naive active record implementation would allow.

The Bad: Data Mappers Are Hard

  • Often intimidating and hard to set up. The advantage of active record is that you build your database schema and objects side-by-side, so when you’ve got one you’ve got the other. Because the data mapper pattern is deeper than that, you’re inherently going to have to think a little harder to configure your data mapping layer than you will a practically-invisible active record layer.

Which ORM Should I Choose?

“Which ORM is best?” was literally the question I demanded the world answer for me about four years ago. The thing is (as with most everything in both programming and life), that’s a silly question. Context matters. Infrastructure matters. The things that seem like great choices in one world aren’t great in others.

In general, I’d prescribe active record for people who want to get up and running quickly with a from-scratch application. And I’d prescribe data mapper where you want an ORM but also have to stay compatible with a strange “legacy” database schema.

How do you feel about the Single-Responsibility Principle?

I think that the “single-responsibility principle” is often transformed into a parody of a core good idea, but that’s an opinion too big for this article. Still, it’s a topic that comes up a lot around ORMs, and for good reason.

I feel like this article from Jamie Gaskins highlights the case about the SRP and ORMs well, bringing in the whole question of how to test.

The entire reason for my writing that gem has been caused by the ActiveRecord gem’s violation of the Single Responsibility Principle. All objects and methods in a system should follow the Unix principle of “do one thing well”. Active Record (both the pattern and the gem) combine business logic and persistence logic in the same class.


An object based on the Active Record pattern represents not only the singular object but the representation in the database, as well. Additionally, methods on ActiveRecord classes are intended to operate over the entire collection of objects. This is entirely too much functionality and it should be separated.

Context Matters for ORMs, A Lot

As with all things around code, context matters a ton. As I suggested above, one of the best things about active record is that your database column names become your object properties. And one of the worst things about active record is that your database column names become your object properties.

What that means, practically, is that if you make your database at the same time you make your database-using web application, you’ll be in good shape using an active record ORM like Laravel Eloquent. Both will be in sync, because you’ll design them with the same ideas in mind. This is the happy “green field” scenario.

But data mapper fits much better in a “brown field” or “legacy” database system. Have a DB column labelled rec, which is actually a complex encoding of whether a student is recommended for the next class? Just change your mapping to encode that meaning. And with Doctrine and other data-mapping systems, doing that is totally normal and obvious. (Depending on the active record implementation you use, renaming is likely to be possible, but may not be easy.)
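As a rough sketch of what “change your mapping” can look like, here’s a hand-rolled mapper for that rec column. Only the column name rec comes from the scenario above. The nm column, the Student class, and especially the encoding ('Y' meaning recommended, anything else meaning not) are invented for illustration, and this is plain PHP rather than Doctrine’s own mapping configuration.

```php
<?php
// Domain object with the name we wish the database had used.
final class Student
{
    public string $name;
    public bool $recommendedForNextClass;

    public function __construct(string $name, bool $recommendedForNextClass)
    {
        $this->name = $name;
        $this->recommendedForNextClass = $recommendedForNextClass;
    }
}

// Hypothetical mapper: the cryptic legacy encoding lives here and nowhere else.
final class StudentMapper
{
    // ASSUMPTION: 'Y' = recommended; 'N', NULL, or anything else = not recommended.
    public function fromRow(array $row): Student
    {
        return new Student($row['nm'], ($row['rec'] ?? 'N') === 'Y');
    }

    public function toRow(Student $student): array
    {
        return [
            'nm' => $student->name,
            'rec' => $student->recommendedForNextClass ? 'Y' : 'N',
        ];
    }
}

$mapper = new StudentMapper();
$ada = $mapper->fromRow(['nm' => 'Ada', 'rec' => 'Y']);
$bob = $mapper->fromRow(['nm' => 'Bob', 'rec' => null]);
$row = $mapper->toRow($ada);
```

The rest of the application only ever sees recommendedForNextClass, so if the legacy encoding changes, the fix stays inside the mapper.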

Some Useful Parting Thoughts on Object-Relational Mapping

I came to a much greater degree of clarity on this whole thing by watching a conversation Martin Fowler — who basically is the reason these patterns are known by these names — had with his colleague Badri Janakiraman about active record and its role in a “Hexagonal Rails” architecture. You might want to check it out. On that same topic, I also enjoyed an informal conversation organized by Shawn McCool and a number of other PHP developers on the topic.

In general, the question of which ORM system to use (if any) is not one with a clear set of answers. Maybe it makes sense for you to have your database as a part of how you think about your application domain: if it does, there’s not really a good reason to go all the way to a data mapper. That is specifically the case Martin and Badri talk about in their conversation, and both seem to be very satisfied with the result.

The only thing I know for sure: a dogmatic answer to the question of what ORM to use is probably not going to be the best one for all possible circumstances. What matters most is that you choose how you work with the database based on what makes the most sense in your specific case, not which general answer is most popular today.
