Steven Ly's Developer Portfolio

Accelerating your Ruby On Rails API by 200%

Steven Ly

At Gap Intelligence, my team deals with a lot of data. To be specific, we have historical market intelligence data estimated at over 200 million rows. With this much data, we were quickly running into bottlenecks in our API, which is one of the main ways our clients consume our data and also powers Gap's internal and external applications and offerings.


For some background, our application is written in Ruby on Rails. For the first iteration of our API, we used Rails's ORM, Active Record. It comes with every Rails application and is amazing for fast development, but that convenience comes at the price of performance. In this post I will show you how we leveraged ElasticSearch to speed up querying our data, and how you might do the same with yours.


Below you will find our controller, model, and serializer BEFORE our ElasticSearch solution.

pricings_controller.rb

def index
    @pricings = Pricing

    filtering_params(params).each do |key, value|
        @pricings = @pricings.public_send("by_#{key}", value) if value.present?
    end

    @pricings = @pricings.includes(:merchant).order(order).page(page).per(per_page)

    render json: @pricings, serializer: ::V1::PaginationSerializer
end

private

def filtering_params(params)
    params.slice(:category_name, :part_number)
end


This is a pretty straightforward approach. We are looping through the allowed filtering parameters passed into the index action. For this to work we must have model methods defined, such as by_category_name & by_part_number. In our case, we made model scopes and they can get chained together if multiple parameters get passed in. Below you can see these scopes defined in our pricing model.


We are also using pagination to limit the amount of data the API returns in the JSON response. Lastly, the index action renders the JSON using a serializer. We use the Active Model Serializers (AMS) gem. One of the reasons we chose AMS is to take advantage of the JsonApi Adapter, which lets us easily format our JSON according to the JSON API specification (jsonapi.org/format). In the controller code, we go through a PaginationSerializer, which is shared by all serializers. It dynamically figures out which model serializer to use based on the model, in this case the PricingSerializer you see below.


pricing.rb

class Pricing < ActiveRecord::Base
    scope :by_category_name, -> (categories) { joins(:category).where("LOWER(categories.name) IN (?)", categories.downcase.split(',')) }
    scope :by_part_number, -> (part_numbers) { joins(:product).where("LOWER(products.part_number) IN (?)", part_numbers.downcase.split(',')) }
   ...
end
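
To make the dispatch concrete, here is a minimal plain-Ruby sketch of the same pattern with no Rails involved. FakeRelation and its sample rows are hypothetical stand-ins for an Active Record relation; only the public_send("by_#{key}", value) loop mirrors the controller above.

```ruby
# Hypothetical in-memory stand-in for an Active Record relation.
# Each by_* method returns a new FakeRelation, so calls can be chained
# the same way model scopes can.
class FakeRelation
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def by_category_name(names)
    wanted = names.downcase.split(',')
    FakeRelation.new(rows.select { |row| wanted.include?(row[:category_name].downcase) })
  end

  def by_part_number(numbers)
    wanted = numbers.downcase.split(',')
    FakeRelation.new(rows.select { |row| wanted.include?(row[:part_number].downcase) })
  end
end

# Same loop as the controller: one scope call per filter that has a value.
def apply_filters(relation, filters)
  filters.each do |key, value|
    next if value.nil? || value.empty?

    relation = relation.public_send("by_#{key}", value)
  end
  relation
end

# Made-up sample rows, for illustration only.
ROWS = [
  { category_name: 'TVs',     part_number: 'ABC123' },
  { category_name: 'TVs',     part_number: 'XYZ999' },
  { category_name: 'Laptops', part_number: 'ABC123' }
].freeze

filtered = apply_filters(FakeRelation.new(ROWS), { category_name: 'tvs', part_number: 'abc123' })
puts filtered.rows.inspect
```

Because every filter narrows the previous result, passing both parameters behaves like an AND across the two scopes.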

pricing_serializer.rb

class V1::PricingSerializer < V1::BaseSerializer
    attributes :id

    attributes :date_collected
    attributes :shelf_price

    attributes :in_stock
    attributes :merchant
    attributes :product

    def merchant
      object.merchant.name
    end

    def product
      object.product.name
    end
end


So our API works and returns the data we need. Awesome! However, our pricing model has millions of records in it, so queries against the database can be quite slow, especially when paginating deep into the data set, where response times get really slow. Performance is critical to any API: you want it to return data quickly. With performance being so important, we decided to give ElasticSearch a shot.
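
One likely reason deep pages get slower is how offset pagination works: a page/per call translates to LIMIT/OFFSET, and the database still has to walk past every skipped row. The sketch below only does that arithmetic. The page size of 25 is an assumed value, not necessarily what our API uses.

```ruby
# Offset pagination arithmetic: page N with K rows per page
# corresponds to LIMIT K OFFSET (N - 1) * K.
def limit_and_offset(page, per_page)
  raise ArgumentError, 'page must be >= 1' if page < 1

  { limit: per_page, offset: (page - 1) * per_page }
end

PER_PAGE = 25 # assumed page size, for illustration only

[1, 100, 10_000, 500_000].each do |page|
  window = limit_and_offset(page, PER_PAGE)
  puts "page #{page}: LIMIT #{window[:limit]} OFFSET #{window[:offset]}"
end
```

The deeper the page, the more rows sit in front of the window that is actually returned.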

We decided to use a gem called Searchkick. Searchkick is a gem that runs on top of ElasticSearch and makes it simple to search your ElasticSearch data in a Rails-like fashion.

Let's Get Started!

First things first, we need to decide what data we are going to store in the search index. In Searchkick you must implement the search_data method in your model. For example in the pricing model:


pricing.rb

def search_data
    {
        category_name: category.name.downcase,
        part_number: part_number
    }
end

After you have defined what data is indexed, you simply call Pricing.reindex to index your data. Keep in mind every time you change the search_data method you must reindex the data. One way to speed up the indexing is to eager load your associations. In Searchkick you can define the search_import scope like this:

pricing.rb

scope :search_import, -> { includes(:merchant, :category, :product) }

OK, we have our data indexed. Now how do we query ES to get the data we need? Searchkick comes with some nice & easy ways of searching the data, but as your searches become more advanced, it's recommended to use the Elasticsearch DSL, which Searchkick conveniently fully supports. So with that let's get into the controller logic.


pricings_controller.rb

def index
    @pricings =
        Pricing.search(
            include: [:category, :merchant, :product],
            query: query,
            order: order,
            page: page,
            per_page: per_page
        )

    render json: @pricings, serializer: ::V1::PaginationSerializer
end

private

def build_query
    query_array = []

    filtering_params.each do |key, value|
      if value.present?
        query_array << Pricing.public_send("search_by_#{key}", value)
      end
    end

    query_array
end

def filtering_params
    params.slice(:part_number, :category_name)
end

def query
    { bool: { must: build_query } }
end

As you can see there is some familiar code here. We still have the filtering_params method. We are calling slightly different methods as we loop through the parameters. These methods are defined in a model concern and included in the model. For example:


pricing.rb

class Pricing < ActiveRecord::Base
    include PricingSearchable
end

Moving the ElasticSearch logic into a concern

pricing_searchable.rb

module PricingSearchable
    extend ActiveSupport::Concern

    include Searchable

    included do
        after_save :reindex

        scope :search_import, -> { includes(:merchant, :category, :product) }

        def search_data
            {
                category_name: category.name.downcase,
                part_number: part_number
            }
        end
    end

    module ClassMethods
        def search_by_category_name(category_names)
            { terms: { category_name: category_names.downcase.split(',') } }
        end

        def search_by_part_number(part_numbers)
            { terms: { part_number: part_numbers.downcase.split(',') } }
        end
    end
end

I like that all the ES logic is in a concern now, which keeps the pricing model cleaner.

As for building the ES query, we are using what is called a Bool Query: a query that matches documents against boolean combinations of other queries. Each boolean clause has a typed occurrence, in this case ‘must’, which means every one of those query clauses must appear in matching documents. For example, if I search by a category of ‘TVs’ and a part number of 123, only documents that match both conditions are returned in the query results. For more information on the query DSL, visit the ES documentation.
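
As a rough illustration, the sketch below rebuilds the query hash in plain Ruby for the ‘TVs’ and 123 example, without Rails or Searchkick. The QueryBuilder module and the sample filters hash are hypothetical; the two search_by_* method bodies are copied from the concern above.

```ruby
require 'json'

# Hypothetical stand-alone module; the search_by_* bodies match the concern.
module QueryBuilder
  def self.search_by_category_name(category_names)
    { terms: { category_name: category_names.downcase.split(',') } }
  end

  def self.search_by_part_number(part_numbers)
    { terms: { part_number: part_numbers.downcase.split(',') } }
  end

  # Every non-blank filter becomes one clause under bool/must,
  # so a document has to satisfy all of them to match.
  def self.bool_query(filters)
    clauses = filters.each_with_object([]) do |(key, value), memo|
      next if value.nil? || value.empty?

      memo << public_send("search_by_#{key}", value)
    end
    { bool: { must: clauses } }
  end
end

query = QueryBuilder.bool_query({ category_name: 'TVs', part_number: '123' })
puts JSON.pretty_generate(query)
```

Each terms clause accepts a list, so a comma-separated parameter matches any of its values, while the surrounding must requires every clause to match.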

The rest of the parameters we pass to the search method are pretty straightforward. The include parameter is for eager loading of related models. The order parameter is for sorting the result set. Keep in mind that any field you want to sort by needs to be added to the ES index. And lastly, we have the pagination parameters page & per_page.
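
To illustrate the point about sorting, here is a small plain-Ruby sketch, assuming the sort is passed as a hash such as { date_collected: :desc }. PricingDoc, the sortable? helper, and the sample values are hypothetical and only show that a sort field has to be part of search_data.

```ruby
require 'date'

# Hypothetical plain-Ruby stand-in for the Pricing model (no Rails needed).
PricingDoc = Struct.new(:category_name, :part_number, :date_collected) do
  # Same shape as search_data above, plus the field we want to sort on.
  def search_data
    {
      category_name: category_name.downcase,
      part_number: part_number,
      date_collected: date_collected
    }
  end
end

# Hypothetical helper: true when every sort key is part of the indexed data.
def sortable?(document, order)
  (order.keys - document.search_data.keys).empty?
end

# Made-up sample record, for illustration only.
doc = PricingDoc.new('TVs', 'ABC123', Date.new(2017, 1, 15))

puts sortable?(doc, { date_collected: :desc }) # sort field is indexed
puts sortable?(doc, { shelf_price: :asc })     # shelf_price was never indexed
```

After adding a field like this to search_data, the index has to be rebuilt before the new field can be used.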


Benchmarking!

For our performance testing, we wanted to see how well the API performed when paginating deep into a large result set. I wrote a script that hits a specific page in the result set and averages the response time over 3 requests. Below is a table showing how many milliseconds each request took; it's obvious how much things improved once we leveraged ES in our API. Seeing these performance gains was a huge win for us.


Pagination Page    API w/ActiveRecord    API w/ElasticSearch
100                895ms                 110ms
1,000              1425ms                221ms
10,000             1899ms                301ms
100,000            5106ms                2601ms
200,000            6620ms                3683ms
300,000            8282ms                3293ms
400,000            10202ms               3525ms
500,000            12515ms               3923ms
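
The original benchmarking script is not shown here, so the following is only a sketch of how such a measurement could be taken and how the speedup per page works out. The average_ms and speedup helpers are hypothetical names, the sleep call is a stand-in for a real HTTP request, and the timings are copied from the table above.

```ruby
# Average wall-clock time of a block over several runs, in milliseconds.
def average_ms(runs: 3)
  total = 0.0
  runs.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  (total / runs) * 1000.0
end

# How many times faster the second timing is than the first.
def speedup(before_ms, after_ms)
  before_ms.to_f / after_ms
end

# Timings from the table above: page => [ActiveRecord ms, ElasticSearch ms].
TIMINGS = {
  100     => [895, 110],
  1_000   => [1425, 221],
  10_000  => [1899, 301],
  100_000 => [5106, 2601],
  200_000 => [6620, 3683],
  300_000 => [8282, 3293],
  400_000 => [10_202, 3525],
  500_000 => [12_515, 3923]
}.freeze

TIMINGS.each do |page, (active_record_ms, elasticsearch_ms)|
  puts format('page %-7d %.1fx faster', page, speedup(active_record_ms, elasticsearch_ms))
end

# Stand-in workload; a real script would issue the API request inside the block.
puts format('%.2f ms', average_ms(runs: 3) { sleep 0.01 })
```

A monotonic clock is used so that system clock adjustments during a run do not skew the average.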