Deep dive N+1 Queries in Graphql Ruby

If you do not know roadmap.sh, you should totally give it a look. it provides a high-level view of what you should learn if you want to increase your skills and knowledge in IT.

One knowledge you should have as a backend engineer is N+1 queries. It is the subject of this article without any surprise.

What it is how they can happen in a Graphql API in your Rails app? How to prevent them and in particular in graphql and ruby? We will try to see answer these in the following article. :)

Oh yes, and if you like to learn or read about Rails, Ruby, databases, and a lot of tech-related stuff :

Keep in Touch

On Twitter : [@yet_anotherDev] (https://twitter.com/yet_anotherDev)

On Linkedin : Lucas Barret

N+1 Queries

Before going to the how, let's understand the what of N+1 queries.

This is not as complicated as it sounds:

Let's say you are developing a SaaS product; your customers are companies; these companies have users.

You have to write a query to get all the companies and their users.

With graphql-ruby, your code will look approximately like this:

app/graphql/types/user_type.rb
module Types
  class UserType < Types::BaseObject
    field :name, String, null: false
  end
end
##app/graphql/types/company_type.rb
module Types
  class CompanyType < Types::BaseObject

    field :id, ID, null: false
    field :name, String, null: false
    field :user, [Types::UserType], null: true

    def user
      User.where(company_id: object.id)
    end

  end
end
##app/graphql/queries/company_query.rb
module Queries
  class CompaniesQuery < BaseQuery
    type [Types::CompanyType], null: true

    def resolve(id)
      Company.all
    end

  end 
end

We have defined a Query that gets all the companies. This query returns an Array of Company, materialized by [Types::CompanyType]. In this CompanyType, we retrieve all the user materialized [Types::UserType].

To test it, you can run a Rspec test like this :

Company Load (0.6ms)  SELECT "companies.*" FROM "companies"
  ↳ app/controllers/graphql_controller.rb:15:in `execute'
User Load (0.4ms)  SELECT "users".* FROM "users" WHERE "users"."company_id" = $1  [["company_id", 1]]
  ↳ app/controllers/graphql_controller.rb:15:in `execute'
User Load (0.3ms)  SELECT "users".* FROM "users" WHERE "users"."company_id" = $1  [["company_id", 2]]
  ↳ app/controllers/graphql_controller.rb:15:in `execute'
Completed 200 OK in 49ms (Views: 0.4ms | ActiveRecord: 19.1ms | Allocations: 8978)

What should trigger your attention is that Companies are queried once, then Users are queried twice. You are doing 2 SQL queries to load the two companies' users.

If you had three users, you would trigger 3 SQL queries for the company' users. And so on for each of your company, 10 companies with 10 users make 100 queries ! We call these N+1 queries, which can happen in graphql and with a REST API.

We can easily understand that this query needs to be more scalable, because it could put too much pressure on our database.

Batch Loading with GraphQL-Batch gem

The graphql-batch gem will enable you to solve this issue even more. First, avoid N+1 by batching and grouping all the same entity's queries into one SQL query. Two allows you to do lazy-loading on your fields, so they will not be queried if not used in a particular query.

In the next part of this article, we will see how to create a Loader for our data to avoid this issue.

It will be specialized to make it easier to understand, but you can create one generic to use with all your model.

So as we said before, we want to gather the list of users from companies.

Let's use something similar to the example in the graphql-ruby repository. If I copy and paste the example of the lib, I will end up with something like this after adapting a bit.

class RecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end


  def perform(ids)
    @model.where(company_id: ids).each { |record| fulfill(record.company_id, record) }
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

This will not work; what will happen differs from what you expect. You will end up with an error!

This is due to the way we fulfill the promise. It is associating an id with the record in a local cache.

@model.where(company_id: ids).each { |record| fulfill(record.company_id, record) }

In other words: we have associated the first user to company_id 1, the second to company_id 2, and so on. To counter that, you must group_by and write something like this.

class ArrayRecordLoader < GraphQL::Batch::Loader
  def initialize(model)
    @model = model
  end

  def perform(ids)
    @model.where(company_id: ids).group_by(&:company_id).each { |key,record| fulfill(key, record) }
    ids.each { |id| fulfill(id, nil) unless fulfilled?(id) }
  end
end

This works very well; executing it will give you what you expect.

You have a promise per id key you give. In this case, it is the company_id, and you cannot fulfill a promise that has been fulfilled. So, in this case, with the following code :

 @model.where(company_id: ids).each { |record| fulfill(record.company_id, record) }

You will fulfill the promise for the first company_id with the first User retrieved. And this is not an array, so it will not work, so the expected type will not be respected, and graphql will tell you that it is not the suitable type.

Conclusion

Through this article, we have seen N+1 queries and that they can happen through any API, GraphQL, or REST SOAP when calling your database.

We have seen how to use graphql-batch and implement a basic Loader. To avoid N+1 and using Lazy loading to put less pressure onto your database is something you want to know.