Magento 2

Magento fundamentals: what are the differences between collections and repositories?


One of the most frequent questions when manipulating entities in Magento 2 is: “should I use a repository or a collection?”

In this article, we’ll look at the main differences between the two.

We won’t go deep into the syntax details but rather try to understand the pros and cons of both and when it’s better to use one or the other.

A brief definition of repositories and collections

Repositories were introduced in Magento 2, while collections are a legacy of Magento 1.

We use both to deal with entity persistence but from different abstraction layers.

Repositories are a higher abstraction that allows complete decoupling from the persistence layer.

According to Martin Fowler’s definition, “a repository mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.”

Fowler doesn’t mention the domain by chance; the repository is also one of the building blocks of Domain-Driven Design, as described in Eric Evans’s book of the same name.

On the other hand, collections are a lower-level abstraction: they construct a query, deal with database-related concepts, and belong to the resource model layer.

In Magento 2, repositories, being the higher abstraction, often wrap resource models and are exposed as web API resources.

The following is a simplified representation of the relations between layers:

Two kinds of objects

We can’t obtain an instance of collections and repositories the same way.

Collections store state, so they should be treated as newable objects; repositories, on the other hand, are stateless, so they are injectable objects.

This distinction is important because it tells us that collections have to be instantiated through a factory, so that every consumer gets a fresh instance, while repositories can be obtained through constructor-based dependency injection.
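To make the distinction concrete, here is a minimal sketch in plain PHP 8. All class names are hypothetical stand-ins and none of them are Magento classes; in Magento, the collection factory would typically be one of the automatically generated factory classes. The point is only to show the shape: the stateless repository and the factory are injected, and the stateful collection is created on demand.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; not Magento classes.

/** Stateful ("newable"): every consumer needs its own fresh instance. */
final class OrderCollection
{
    /** @var array<string, mixed> */
    private array $filters = [];

    public function addFieldToFilter(string $field, mixed $value): self
    {
        $this->filters[$field] = $value;
        return $this;
    }

    /** @return array<string, mixed> */
    public function getFilters(): array
    {
        return $this->filters;
    }
}

/** Injectable: holds no state and creates new collections on demand. */
final class OrderCollectionFactory
{
    public function create(): OrderCollection
    {
        return new OrderCollection();
    }
}

/** Stateless ("injectable"): a single shared instance is safe to reuse. */
final class OrderRepository
{
    public function getById(int $id): array
    {
        return ['entity_id' => $id]; // placeholder for a real lookup
    }
}

final class OrderReport
{
    public function __construct(
        private OrderRepository $orderRepository,
        private OrderCollectionFactory $collectionFactory
    ) {
    }

    public function pendingOrders(): OrderCollection
    {
        // A fresh collection per call, so filters never leak between calls.
        return $this->collectionFactory->create()->addFieldToFilter('status', 'pending');
    }

    public function order(int $id): array
    {
        return $this->orderRepository->getById($id);
    }
}

$report = new OrderReport(new OrderRepository(), new OrderCollectionFactory());
$first = $report->pendingOrders()->addFieldToFilter('store_id', 1);
$second = $report->pendingOrders();
```

If the collection itself were injected instead of its factory, the `store_id` filter added to `$first` would also show up in `$second`, because both variables would point to the same object.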

Additional information about this topic can be found in this article.

More on the state of collections and repositories

Let’s see why collections store state while repositories almost don’t.

Collections can be seen as a kind of query builder, even if they don’t strictly adhere to the builder pattern.

To build a query, a collection is initialized by setting, for example, the properties we want to select and the filters we want to apply to the data that will be retrieved.

When the load() method is called, the state is used to build the query.

Let’s see a code fragment taken from the load() method implemented in the \Magento\Eav\Model\Entity\Collection\AbstractCollection base class:

<?php
abstract class AbstractCollection 
    extends AbstractDb 
    implements SourceProviderInterface
{
    // ...
    public function load($printQuery = false, $logQuery = false)
    {
        if ($this->isLoaded()) {
            return $this;
        }
        // ...build query and load data...
        $this->_setIsLoaded();
        // ...
        return $this;
    }
    // ...
}

The load() method builds up the query and fetches the results into the collection’s internal state.

Subsequent calls to the load() method don’t re-fetch data but simply return the collection, unless the state is reset by calling the clear() method.
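The behavior described so far can be reproduced with a small stand-in written in plain PHP 8. This is a simplified sketch, not Magento’s implementation: the class name, the table name, and the fetch callback that plays the role of the database are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for a collection; not Magento's implementation.
final class SketchCollection
{
    /** @var string[] */
    private array $columns = [];
    /** @var array<string, string> */
    private array $filters = [];
    private array $items = [];
    private bool $loaded = false;

    // $fetch is a hypothetical data source: it receives SQL and returns rows.
    public function __construct(private \Closure $fetch)
    {
    }

    public function addFieldToSelect(string $column): self
    {
        $this->columns[] = $column;
        return $this;
    }

    public function addFieldToFilter(string $field, string $value): self
    {
        $this->filters[$field] = $value;
        return $this;
    }

    /** Turns the accumulated state into a query string. */
    public function buildQuery(): string
    {
        $sql = 'SELECT ' . ($this->columns ? implode(', ', $this->columns) : '*') . ' FROM sketch_table';
        $conditions = [];
        foreach ($this->filters as $field => $value) {
            $conditions[] = "{$field} = '{$value}'"; // no escaping: illustration only
        }
        return $conditions ? $sql . ' WHERE ' . implode(' AND ', $conditions) : $sql;
    }

    public function load(): self
    {
        if ($this->loaded) {
            return $this; // already loaded: no new query
        }
        $this->items = ($this->fetch)($this->buildQuery());
        $this->loaded = true;
        return $this;
    }

    /** Drops the loaded items but keeps the filters, so the next load() queries again. */
    public function clear(): self
    {
        $this->loaded = false;
        $this->items = [];
        return $this;
    }

    public function getItems(): array
    {
        return $this->load()->items;
    }
}

$executed = [];
$collection = new SketchCollection(function (string $sql) use (&$executed): array {
    $executed[] = $sql; // record every query that reaches the "database"
    return [['sku' => 'demo']];
});

$collection->addFieldToSelect('sku')->addFieldToFilter('status', 'enabled');
$collection->load();
$collection->load();          // no second query
$collection->clear()->load(); // the query runs again
```

After these calls, `$executed` contains the same query twice: once for the first `load()` and once for the `load()` that follows `clear()`.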

☝ Calling the load() method is not necessary because it is called by the getIterator() method, which is automatically called as soon as we iterate the elements of the collection.
The getIterator() method is declared in the IteratorAggregate interface which every Magento collection implements.
This mechanism is called lazy data loading, and it has some advantages:
- it allows us to modify the state of a collection until data is loaded for the first time;
- if a collection is never iterated, queries are not executed at all.
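Lazy loading through getIterator() can be sketched in a few lines of plain PHP 8. The class name and the loader callback are hypothetical; the callback stands in for the query, and a counter shows how many times it actually runs.

```php
<?php
declare(strict_types=1);

// Simplified stand-in; the class name and the loader callback are hypothetical.
final class LazyCollection implements \IteratorAggregate
{
    private array $items = [];
    private bool $loaded = false;

    public function __construct(private \Closure $loader)
    {
    }

    public function load(): self
    {
        if (!$this->loaded) {
            $this->items = ($this->loader)();
            $this->loaded = true;
        }
        return $this;
    }

    public function getIterator(): \ArrayIterator
    {
        // foreach calls getIterator(), which triggers the load on first use.
        $this->load();
        return new \ArrayIterator($this->items);
    }
}

$loads = 0;
$loader = function () use (&$loads): array {
    $loads++; // counts how many times the "query" runs
    return ['a', 'b'];
};

$neverIterated = new LazyCollection($loader); // its loader is never called
$iterated = new LazyCollection($loader);

$seen = [];
foreach ($iterated as $item) {
    $seen[] = $item;
}
foreach ($iterated as $item) {
    // the second pass reuses the items that are already loaded
}
```

Here the loader runs exactly once: the collection that is never iterated triggers no query, and the second iteration reuses the loaded items.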

Repositories, conversely, are stateless service objects. Repositories can have a state, but since it doesn’t affect the result, we can think of them as stateless objects.

The state I’m referring to is a caching layer that repositories can implement for performance optimization.

Take, for example, the \Magento\Catalog\Model\ProductRepository class: the getById() method saves product instances for subsequent retrieval, as shown below.

public function getById($productId, $editMode = false, $storeId = null, $forceReload = false)
{
    $cacheKey = $this->getCacheKey([$editMode, $storeId]);
    if (!isset($this->instancesById[$productId][$cacheKey]) || $forceReload) {
        $product = $this->productFactory->create();
        // ...
        $this->cacheProduct($cacheKey, $product);
    }
    return $this->instancesById[$productId][$cacheKey];
}

Strictly speaking, this means keeping a state; that’s why I tend to say that repositories are almost stateless.
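The following sketch, in plain PHP 8, isolates the caching idea. It is not the ProductRepository code: the class name, the in-memory array used as storage, and the read counter are assumptions made to keep the example self-contained.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for a caching repository; not ProductRepository itself.
final class CachingRepository
{
    /** @var array<int, array<string, mixed>> */
    private array $instancesById = [];

    public int $storageReads = 0;

    /** @param array<int, array<string, mixed>> $storage hypothetical in-memory "database" */
    public function __construct(private array $storage)
    {
    }

    public function getById(int $id, bool $forceReload = false): array
    {
        if (!isset($this->instancesById[$id]) || $forceReload) {
            if (!isset($this->storage[$id])) {
                throw new \RuntimeException("No entity with id {$id}");
            }
            $this->storageReads++;
            $this->instancesById[$id] = $this->storage[$id];
        }
        return $this->instancesById[$id];
    }
}

$repository = new CachingRepository([1 => ['sku' => 'demo']]);
$first = $repository->getById(1);
$second = $repository->getById(1);      // served from the in-memory cache
$third = $repository->getById(1, true); // forces a new read from storage
```

All three calls return the same data; the internal state only changes how many times the storage is read, which is why we can still reason about the repository as if it were stateless.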

Pros and Cons

Let’s recap the pros and cons of collections and repositories.

PROs

  • Repositories make it easier to access data.
  • Repositories allow the underlying data access layer to change without affecting the code that uses them.
  • Repositories make it possible to introduce performance improvements, like caching (e.g., the Product repository seen above).
  • Collections give more control over data selection and filtering.

CONs

  • Repositories give less control over data selection and filtering.
  • Collections are tied to the persistence layer, more coupled, and harder to replace.

Should we always implement a repository for custom entities?

This question is a pretty frequent one when developing our entities.

My answer is yes, unless the entity we are introducing is an extension attribute of the main entity.

Since the extension attributes’ values are likely injected in the main entity, there is no need to develop a specific repository for the custom entity.

In Domain-Driven Design jargon, we would say that we don’t need a repository for the child entities of an aggregate (the main entity) because, in Martin Fowler’s words, “aggregates are the basic element of transfer of data storage.” If you are interested in reading a bit more on the topic, you can refer to this article.
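As a rough illustration of the aggregate idea, the sketch below uses plain PHP 8 with hypothetical names (a customer as the main entity and a loyalty tier as the child value) and plain arrays in place of database tables. In Magento, injecting extension attribute values is commonly done with a plugin on the main entity’s repository; here the repository does it directly to keep the example short.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout; plain arrays stand in for database tables.
final class Customer
{
    public function __construct(
        public int $id,
        public ?string $loyaltyTier = null // child value carried by the main entity
    ) {
    }
}

final class CustomerAggregateRepository
{
    /**
     * @param array<int, array<string, mixed>> $customerRows
     * @param array<int, string>               $loyaltyTierRows
     */
    public function __construct(
        private array $customerRows,
        private array $loyaltyTierRows
    ) {
    }

    public function getById(int $id): Customer
    {
        if (!isset($this->customerRows[$id])) {
            throw new \RuntimeException("No customer with id {$id}");
        }
        // The child value is loaded together with the main entity,
        // so it needs no repository of its own.
        return new Customer($id, $this->loyaltyTierRows[$id] ?? null);
    }
}

$customers = new CustomerAggregateRepository(
    [10 => ['email' => 'a@example.com'], 11 => ['email' => 'b@example.com']],
    [10 => 'gold']
);
$withTier = $customers->getById(10);
$withoutTier = $customers->getById(11);
```

Consumers only ever talk to the repository of the main entity and read the child value from the object it returns.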

Conclusions

We’ve seen that both collections and repositories have pros and cons. Collections are more expressive and more tightly coupled to the persistence layer; repositories are less expressive but allow us to decouple from data mapping and benefit from additional caching layers, improving performance.

Knowing the differences between the Magento framework’s elements is crucial to making the right choice when one is required.

Post of

COO | Reggio Emilia

Alessandro works at Bitbull as an experienced technical leader devoted to software design, development, and mentoring.
Honored three times with the title of Magento Master and listed among the top 50 contributors in recent years, he has also been an active Magento Community Maintainer since 2018 and a member of the Magento Association content committee since 2020.