Rails views to ReactJS

Have you ever been introduced to a Rails application with JS/jQuery scattered around app/assets/javascripts, app/views, or even the app/helpers directory? Wondering if there is any consistency to how JS snippets get added, you start thinking it would be better to have some conventions to follow here too. While there are lots of other options available to help us out, this blog is about using react-rails to bootstrap with ReactJS and then decoupling it from Rails into a complete UI-only application.

♦ But why ?

A few reasons that I could think of,

  • Reactiveness, of course.
  • Performance is certainly better with client-side rendering.
  • View layer testing gets easier because of the component based architecture.
  • Having Rails as an API-only application has its own advantages, like better performance (by removing some middlewares), well-documented APIs, faster deployments, etc.

♦ Inception

To start with the decoupling, begin introducing ReactJS into the application using the react-rails gem. But before going further, the prerequisite is that you have some idea about ReactJS, Redux and JSX.

Usage of the react-rails gem is pretty simple. To start off, you could convert one simple Rails view, or even a single div from a view, into a React component.

Let's say you have a Rails partial which renders the navigation bar containing some links and the username. Creating a NavigationBar component would look like,

$ rails g react:component NavigationBar username:string --es6

which creates the app/assets/javascripts/components directory with navigation_bar.jsx in it. After adding its rendering logic, you can replace

render partial: 'navigation_bar', locals: { username: current_user.username }

with

react_component('NavigationBar', username: current_user.username)

So using the above approach, you can start converting partials and views into React components.

Since our goal is to decouple the view layer from Rails, don't pass too many props which are bound to Rails methods/variables. It would be difficult, or rather time-consuming, to remove such references and repopulate the prop values during the separation.

♦ Architecture of components

As our final goal is to have well architected and maintainable ReactJS application, we should follow the basic conventions for it.

Assuming you would be using react-redux for state management, the directory structure that a typical React-Redux application follows has,

  • src/
    • components
    • containers
    • actions
    • actionCreators
    • reducers
    • store.js

So in this scenario too, you could add the above directories alongside the components directory in app/assets/javascripts. You could refer to an example of such an architecture.

♦ Using NPM packages over bundled gems

You might have installed gems for packages like select2, moment.js, etc. in your Gemfile. But those can't be used after the decoupling. So a better way is to start using NPM packages instead of such gems.

To do so, you can start using https://rails-assets.org, which packages front-end libraries as gems and serves them through the asset pipeline.

# Gemfile

source "https://rails-assets.org" do
  gem 'rails-assets-moment'
  gem 'rails-assets-select2'
  ...
end

♦ Using rails-api

In time, you would also need to start replacing your controller actions with API-based controller actions. If your application is on Rails 5, it has built-in support for ActionController::API (a class extracted from ActionController::Base with the minimal modules needed to serve API requests), but for applications on Rails < 5, you would need to install the rails-api gem. Once ActionController::API is available, you could add an app/controllers/api/v1 directory to start with API-based controller actions.
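
For instance, a minimal API controller under that directory could look like this rough sketch (Api::V1::ArticlesController and the Article model here are hypothetical examples, not part of the original setup):

# app/controllers/api/v1/articles_controller.rb
module Api
  module V1
    class ArticlesController < ActionController::API
      # Serves JSON only; no view templates, cookies or flash involved
      def index
        render json: Article.all
      end
    end
  end
end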

Note that during initialisation, Rails requires all of its frameworks by default using require 'rails/all' in config/application.rb. But after the decoupling, we won't be needing all of them. So do remember to remove require 'rails/all' and keep only the frameworks you need.
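
A minimal sketch of what config/application.rb could look like after that change (which railties you keep depends entirely on what your application still uses):

# config/application.rb
# require 'rails/all'            # removed

require 'active_model/railtie'
require 'active_record/railtie'
require 'action_controller/railtie'
require 'action_mailer/railtie'
require 'rails/test_unit/railtie'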

BTW, controllers inheriting from ActionController::API won't process requests through the additional modules and middleware (like those used for rendering views), so you don't have to worry much if you've decided to keep everything.

♦ The Decoupling

Say you’ve reached a level in your application where

  • Views have only a single line rendering the specific React component
  • No more partials
  • No more view helpers
  • No more HAML, ERB or any other Rails template code in view layouts
  • No more controller actions redirecting to or rendering Rails views.

then your application is in the right place to start the decoupling.

You can use create-react-app or any other boilerplate template to create a React application and start copying the required directories from app/assets/javascripts to it.

Some points to consider after the migration,

  • Add the dependencies that you've listed in the Gemfile under the https://rails-assets.org group to package.json.
  • You would need to add an authentication module in your main component, as that was previously handled by Rails.
  • Finally, add the top-level components that were rendered in Rails views inside BrowserRouter (or any other navigational component), and then remove app/views.

Your ReactJS application will be up in no time and the best thing is that there won’t be any downtime required while switching from Rails views to ReactJS on your servers !!

Thanks for reading, Happy decoupling !!


Searching on steroids

There are plenty of indexing/search servers available out there, like Solr, Sphinx, Elasticsearch, Google Search Appliance and many more.

But out of all of the above, Elasticsearch has been gaining the most attention. This post is going to talk only about it and its integration with Rails.

So, how would someone describe what Elasticsearch is ?

From God's perspective, it's an open source, distributed, RESTful search engine i.e. it has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through a JSON DSL and REST APIs.

♦ Inception

The standard way of using ES in any application is as a secondary data store, i.e. data is stored in some kind of SQL/NoSQL database and the required documents are continuously upserted from it into ES. Pretty neat.

Some of us might think: why not use the database itself as a search engine instead of ES, as it involves less work and it has all the features needed ? The answer is – NO, you shouldn't, because once the data starts hitting the roof, the database gives up and won't return results in real time.

Then some of us might think: why not use Elasticsearch as the primary store ? The answer is again – NO, you shouldn't (for now at least). The reasons are – it doesn't support transactions or associations, and most of all, there are no ORMs/ODMs yet that support ActiveRecord-like features (callbacks, validations, eager loading, etc.).

There are plenty of gems out there to get started with ES, but the ones I would prefer are elasticsearch-rails and elasticsearch-model, as they provide more customization for querying.

♦ The Architect

Consider an Article model which has many comments and an author.

class Article
  include Mongoid::Document

  field :title
  field :body
  field :status
  field :publishing_date, type: Date
  has_many :comments
  belongs_to :author
end

To map Article in ES, you have to specify its JSON structure as,

class Article
  include Mongoid::Document
  include Elasticsearch::Model    # note this inclusion 

  # ....

  # JSON structure to be indexed
  INDEXED_FIELDS = {
    only: [:title, :publishing_date, :status],

    include: {
      author: {
        only: [:_id, :name]
      },

      comments: {
        only: [:body],
        include: {
          author: {
            only: [:_id, :name]
          }
        }
      }
    }
  }

  # It will get called while indexing an article
  def as_indexed_json(options = {})
    as_json(INDEXED_FIELDS)
  end
end

ES needs to know what datatype a field has and how it should get handled while searching. You can specify both using mappings as,

class Article
  include Mongoid::Document
  include Elasticsearch::Model

  # ...

  mappings do
    indexes :status, index: :not_analyzed
    indexes :publishing_date, type: :date

    indexes :author do
      indexes :_id, index: :not_analyzed
    end

    indexes :comments do
      indexes :author do
        indexes :_id, index: :not_analyzed
      end
    end
  end

  # ...
end

PS : You don't have to specify mappings for all fields, only for those that need customization. If you don't specify a mapping for a field, ES will assume its datatype is string and that it needs to be analyzed (it'll analyze the text and break it into tokens).

Sometimes it's worth storing additional fields in ES which you won't need while searching, but which are required while building a response. For example, while getting the list of articles, the response should also contain the author's avatar URL. If it's also stored on the ES side, there's no need to make a database call to fetch it, which improves response time.
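
For example, a possible sketch of as_indexed_json carrying such a field (author_avatar_url and avatar_url are hypothetical names used only for illustration):

# Merge a response-only field into the indexed document
def as_indexed_json(options = {})
  as_json(INDEXED_FIELDS).merge(
    author_avatar_url: author.try(:avatar_url)   # hypothetical attribute
  )
end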

♦ The Transporter

Elasticsearch::Model also adds support for importing data from the database into ES. You may need to import all data at the very beginning or when there are amendments. If you want the index to be automatically updated on document changes, you need to include the Elasticsearch::Model::Callbacks module.
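
That inclusion sits right next to Elasticsearch::Model, for example:

class Article
  include Mongoid::Document
  include Elasticsearch::Model
  include Elasticsearch::Model::Callbacks   # indexes, updates and deletes the ES document via model callbacks

  # ...
end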

There are a few ways in which you can do the importing.

• Standard way

Standard way of importing all data in a collection.

Article.import

You can also add scopes while importing, so that only specific documents will get imported.

Article.published.import

import accepts some other options, like force: true to recreate the index, refresh: true to refresh the index after importing, and batch_size: 100 to fetch 100 documents at a time from the collection.
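
Combining them, a call could look like the following sketch (pick only the options your import actually needs):

Article.import(force: true, refresh: true, batch_size: 100)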

• Using rake

elasticsearch-rails provides a rake task for importing data. It accepts the same options as import, passed through environment variables.

rake elasticsearch:import:model CLASS='Article' SCOPE='published'

You can use this task if you want to set up a cron job, or if you want to import all collections in one go as,

rake elasticsearch:import:all DIR=app/models

• Custom way

If you want more customization for indexing, you can implement it yourself. For example, making asynchronous updates on document changes, or batching multiple operations into a single request, as sketched below.
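
Here is a minimal sketch of asynchronous indexing with a Sidekiq worker (ArticleIndexerWorker is a hypothetical name; the __elasticsearch__ proxy methods come from elasticsearch-model):

class ArticleIndexerWorker
  include Sidekiq::Worker

  # operation is either 'index' or 'update'
  def perform(article_id, operation)
    article = Article.find(article_id)

    case operation
    when 'index'  then article.__elasticsearch__.index_document
    when 'update' then article.__elasticsearch__.update_document
    end
  end
end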

If indexing data is taking too much time, then below are some things you can consider,

  • Adding proper indexes at database side.
  • Fetching fewer documents per database request so that it doesn't consume too much of your RAM.
  • If possible, avoid eager loading of associations.
  • Disabling logs or lowering the log level so that unwanted lines like database queries don't eat up time.

♦ In pursuit of happyness 

Everything is set up, all data is imported, upsert actions are in place. Now all you want to do is make search happen and smirk at the performance 😏. I won't strain your reading by adding yet another ES tutorial, as there are plenty; rather, I'll share some common scenarios I came across and their analogy with database queries.

• Article.published

Article.search(query: { constant_score: { filter: { term: { status: 'published' } } } }).results

I've used a constant_score query as it boosts performance by not caring about the document score. One more thing to note is the use of a term query instead of a match query, as we want an exact match and not a partial match.

The above query can also be written using a filter query as,

Article.search(filter: { term: { status: 'published' } }).results

filter queries are used for simple checks of inclusion or exclusion. They are non-scoring queries, which makes them faster than scoring queries. In addition, ES caches such non-scoring queries in memory for faster access if they are called frequently. So, choose between query and filter carefully depending upon the use case.

• Article.in(status: ['published', 'draft'])

Article.search(query: { constant_score: { filter: { terms: { status: ['published', 'draft'] } } } }).results

The only difference from the above example is using terms instead of term, as the query is about ORing the values.

• Article.where(status: 'published', :date.gte => 1.month.ago)

Article.search(query: { constant_score: { filter: { bool: { must: [ { term: { status: 'published' } }, { range: { date: { gte: 'now-1M' } } } ] } } } }).results

• Article.any_of(title: /mcdonald/, body: /mcdonald/) + Author.any_of(first_name: /mcdonald/, last_name: /mcdonald/)

Article.search(query: { multi_match: { query: 'mcdonald', fields: [:title, :body, 'author.*_name'] } }).results

Most of the time, multi_match is used for autocomplete features. It accepts optional parameters to refine your search criteria, for example (a combined example follows the list),

  • type: :best_fields, which prefers matching Captain America in a single field rather than captain in one field and america in another.
  • type: :phrase for matching the exact word order, i.e. Iron Man will match a field containing iron man but not man iron.
  • fuzziness: 2, which allows 2 edits per term, i.e. DeaddPoool will still match deadpool.
  • and there are many others
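
A sketch combining these parameters into a single query (the values here are only illustrative):

Article.search(query: { multi_match: { query: 'captain america', fields: [:title, :body, 'author.*_name'], type: :best_fields, fuzziness: 2 } }).results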

• Article.collection.aggregate([ { '$match': { status: 'published' } }, { '$group': { _id: '$publishing_date', count: { '$sum': 1 } } } ])

This query returns the count of published articles grouped by publishing_date. Its analogy as an ES query would be,

Article.search(size: 0, query: { constant_score: { filter: { term: { status: 'published' } } } }, aggs: { grouped_articles: { terms: { field: :publishing_date, size: 0 } } }).aggregations.grouped_articles.buckets

Note the use of size: 0 in the query. If you only care about the aggregated results and not about the query hits, make sure to use this parameter, as it removes the query hits from the response.

• Article.count

Article.search(aggs: { articles: { terms: { field: :_type, size: 0 } } }).aggregations.articles.buckets.first.doc_count

You might ask, why an aggregation for getting the count ? That's because it's fast, and ES queries only return results up to the max window size. Your document count might be greater than that, in which case you would need to use the Scroll API till the last page to get all the documents.

There are a lot of things yet to be learned about ES, and new features are still getting added to it. They are also working on the 5.0.0 release, which will take less disk space and will be twice as fast, with a 25% increase in search performance.

Thanks for reading, Happy searching !!

Houston, we got attacked

Houston, you there ?? We’ve had a problem here.

One of our EC2 instances, which has a Redis server on it, got hacked out of nowhere.
Just before we knew it was hacked, we were fiddling with the Redis configuration, wondering what could go wrong with it.

Houston – Roger that, give us more details.

Well, we recently shifted our Redis server to a new EC2 instance. The reason we had to do that was that our Sidekiq processing got much bigger and we couldn't afford it alongside Nginx + Passenger. So we took a call to separate it out.

But while configuring Redis, we think we made a mistake :(.

Houston – What is that ?

Basically, we wanted Redis to listen to all of our Passenger instances. By default it listens on localhost because its bind directive is set to 127.0.0.1, but it's possible to listen on multiple interfaces by providing multiple IP addresses like,

bind 192.168.1.100  10.0.0.1

Accordingly, we set the IP addresses of our Passenger instance(s) to listen from. But Redis started complaining about it, saying,

bind: Cannot assign requested address

Why is that Houston ??

Houston – Hmm, I'll pretend I didn't read that. What next ?

Seriously, Houston ?? 😐 (Actually, I didn't find any convincing answer/way to overcome this. If you know it, please spread your knowledge through the comments !)

Another way to listen for connections on all interfaces is to comment out the bind configuration or set it to 0.0.0.0. So we did that, and everything worked out nicely after it.

For the past few days, though, Redis has started acting weirdly. I mean, our data has started getting lost at random points in time. All of our scheduled or enqueued Sidekiq jobs are getting cleared abruptly !!

Houston – What is Redis working directory’s location ?

We’ve set it to /var/lib/redis/default/ in redis.conf using dir directive.

But if we run $ redis-cli config get dir in the terminal, it gives /tmp. WTH !

Houston – Roger, it seems to be a big trouble (which might get double 😮).

(Houston continued …) It appears to us that, by setting the bind directive to 0.0.0.0, you allowed the whole world to talk to your Redis server.

It's obviously not secure, as anyone can issue a CONFIG SET dir command to it (or any other command for that matter). If setting the bind directive to specific IP addresses isn't working, then at least update your firewall to allow only those IP addresses, or set a password on Redis.

You can set a password on Redis using the requirepass directive.
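
In redis.conf that would look roughly like the following sketch (the password is just a placeholder):

# redis.conf
bind 127.0.0.1                     # or your specific private interface address
requirepass some-long-random-password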

Affirmative Houston, that explains it.

So if we block the whole world from talking to Redis except the given IPs, how can we test it (or hack it 😏) ?

Houston – Behold (but at your own risk).

You can log in to the Redis console with,

$ redis-cli -h IP_ADDRESS -a PASSWORD

Or run any Lua script like,

$ redis-cli -h IP_ADDRESS -a PASSWORD --eval "some Lua script"

If you know EC2’s IP address ranges, think of the possibilities you got (you know what I mean 🙂 ).

That’s it. Roger, out. Happy hacking !!

Uniqueness Gotcha!!!

♦ The Problem

Consider the following relation, where a poll has many options and each option of the poll must have a unique description,

class Poll < ActiveRecord::Base
  has_many :options
  accepts_nested_attributes_for :options
end

class Option < ActiveRecord::Base
  belongs_to :poll
  validates_uniqueness_of :description, scope: :poll_id
end

Now, when trying to create a poll with its options using nested attributes, the uniqueness validation does not get applied due to a race condition.

> poll = Poll.new(options_attributes: [ { description: 'test' }, { description: 'test' } ])
> poll.save
=> true
>
> poll.options
=> [#<Option id: 1, description: "test">, #<Option id: 2, description: "test">]

♦ Why it is occurring ?

There is a section in the UniquenessValidator class which mentions that ActiveRecord::Validations#save does not guarantee the prevention of duplicates, because uniqueness checks are performed at the application level and are prone to race conditions when records are created/updated at the same time. Also, while saving the records, UniquenessValidator checks uniqueness only against the records already in the database, not against the records which are still in memory.

♦ Is it a bug ?

Kind of, because it allows the creation of duplicate records and then marks them as invalid after creation, not allowing them to be updated further.

> poll.options.last.valid?
=> false

♦ So, what are the solutions ?

As the doc mentions, you can add a unique index on that column, which guarantees record uniqueness at the database level and raises an exception if there are any duplicates on creation/update.

# in a migration
add_index :options, [:poll_id, :description], unique: true

# while saving
begin
  poll = Poll.new(options_attributes: [ { description: 'test' }, { description: 'test' } ])
  poll.save
rescue ActiveRecord::RecordNotUnique
  poll.errors.add(:options, 'must be unique')
end

It's important to handle this exception carefully every time you create/update the records (or to write a method which does this for you and call it every time).
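
A minimal sketch of such a wrapper method (save_poll is just an illustrative name):

def save_poll(poll)
  poll.save
rescue ActiveRecord::RecordNotUnique
  poll.errors.add(:options, 'must be unique')
  false
end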

Another solution is defining a custom validation in the parent model which checks the children's uniqueness,

class Poll < ActiveRecord::Base
  has_many :options
  accepts_nested_attributes_for :options

  validate :uniqueness_of_options

  private

  def uniqueness_of_options
    errors.add(:options, 'must be unique') if options.map(&:description).uniq.size != options.size
  end
end

Thanks for reading.

Tips and Tricks using Git

Earlier, for maintaining the Linux kernel, Linus Torvalds and his team were using BitKeeper as their SCM system, but they had to opt for another system because there were some concerns about the OSL and support for the free version of BitKeeper had been dropped. Torvalds and his team wanted an SCM that was much faster than the existing systems; he wanted it to be fast enough to apply a patch and update its associated metadata in 3 seconds, but none of the systems were meeting his needs at that time, so he decided to develop his own.

So, Torvalds and his team started the development of the new SCM in April 2005. He wanted it to be named after himself, like Linux; he said he considers himself an egotistical bastard, so he preferred the name Git (the English meaning of git is an unpleasant person 🙂 ), and the man page also describes Git as the stupid content tracker. But considering its efficiency, Git is orders of magnitude faster than some of the other versioning systems, and it's also influenced by the distributed design of BitKeeper. Now, enough of the Git history; I am writing this blog to list a few of the Git tricks that I stumbled upon while maintaining some repositories.

♦ cherry-pick

In one of our projects, we had taken some functionality out of one repository and added it to another. But whenever there was a bug fix or new feature committed in the old repository that was also relevant to the new one, we wanted those commits to be present in the new repository.

For that case, Git provides one awesome command to pick existing commit(s) and apply the changes they introduce on top of HEAD as new commit(s).

$ git cherry-pick <ref>
$ git cherry-pick <ref1>^..<ref2>

You can pick a commit using its SHA1 from the current branch or any other branch present in the repository. In my case, I had to cherry-pick commits across repositories. So I first created a remote in the new repo pointing to the old repo using,

$ git remote add old_origin git@github.com:yogesh/old_repo.git
$ git remote -v
origin    git@github.com:yogesh/new_repo.git (fetch)
origin    git@github.com:yogesh/new_repo.git (push)
old_origin    git@github.com:yogesh/old_repo.git (fetch)
old_origin    git@github.com:yogesh/old_repo.git (push)

and then fetched the branch from old_repo in which my changes were present.

$ git fetch old_origin master
remote: Counting objects: 44, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 44 (delta 5), reused 3 (delta 3), pack-reused 22
Unpacking objects: 100% (44/44), done.
From git@github.com:yogesh/old_repo
 * [new branch]       master     -> old_origin/master

Now that the new repo had all the commits in place, I cherry-picked the required commits from the old_origin/master branch using their SHA1 like,

$ git cherry-pick 769f074
[new_repo 74ad936] Spelling fix
 1 file changed, 2 insertions(+), 2 deletions(-)

PS: You can use just a few characters of the SHA1, as long as it's at least 4 characters long and unambiguous.

If there are any conflicts while cherry-picking the commits, Git stages the changes it could apply cleanly and tells you to resolve the conflicts first. You can then either abort the cherry-pick using,

$ git cherry-pick --abort

or resolve all the conflicts and continue the cherry-pick using,

$ git cherry-pick --continue

♦ rebase -i

We people hate (or maybe fear 🙂 ) using rebase, perhaps because it can make our repository history look like hell, but at some point we have to use it to rewrite history, and it's good if we conquer the power to use it. Git provides this interactive utility to modify, squash, reorder or remove previous commits and helps keep the repo history maintainable. Once, I encountered a situation where I had to modify some files and realized that those changes belonged in the 3rd-last commit from HEAD. So to make those changes, I first had to start an interactive rebase reaching back to that commit using,

$ git rebase -i HEAD~3

which opened an interactive session in my configured editor listing the previous three commits (including HEAD). Git adds nice documentation there with the list of available commands.

pick 2e759e6 Adding support for fsck
pick e731699 Adding #option_attributes argument to option_from_collection_for_select
pick 74ad936 Spelling fix

# Rebase 7b667ec..74ad936 onto 7b667ec
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell

To edit the 3rd commit, I placed the edit command before it (Git lists the commits in reverse order here, oldest first), saved and exited the editor.

edit 2e759e6 Adding support for fsck
pick e731699 Adding #option_attributes argument to option_from_collection_for_select
pick 74ad936 Spelling fix
$ git rebase -i HEAD~3
Stopped at 2e759e6424bd251182fe8c2dd747e29e06ba5f45... Adding support for fsck

HEAD was now pointing to the 3rd commit. At this point, Git allows you to modify or remove any file, amend the commit message, and commit those changes with,

$ git commit --amend
[detached HEAD 3539819] Adding support for fsck
 2 files changed, 27 insertions(+), 2 deletions(-)
$
$ git status
# rebase in progress; onto 7b667ec
# You are currently editing a commit while rebasing branch 'dev' on '7b667ec'.

HEAD was still pointing to the 3rd commit, so to replay the remaining commits and return to the previous state,

$ git rebase --continue
Successfully rebased and updated refs/heads/dev

Note here that Git changes the SHA1 of all the commits which were picked/edited in the rebase. So if you had already pushed your commits before rebasing and then push the rebased commits, your colleagues might see duplicate commits in their log, and they may curse you for that. To avoid it, here is the lifetime quote you should never forget: never rewrite your history if it's already been shared with others.

♦ stash

Most of us have come across the situation where we have made some changes in the working directory, don't want to commit them yet, and want to pull new commits from the remote repository, but Git doesn't allow us to do that. So we stash those changes, which pushes the working directory state onto the stash stack and resets the working directory back to the HEAD commit. I came across a situation where I stashed my changes, pulled the latest changes, and reapplying the stash created a long list of conflicts. I didn't have much time to resolve those conflicts and test my stashed changes again. For such a case, Git provides an easier way to reapply stashed changes without causing any conflicts on the current branch, by applying the stashed changes onto a new branch,

$ git stash branch stashed_changes
Switched to a new branch 'stashed_changes'
# On branch stashed_changes
# Changes to be committed:
#   (use "git reset HEAD ..." to unstage)
#
#    modified:   config/deploy/staging.rb
#    modified:   config/deploy/production.rb
#    modified:   lib/short_code.rb
#    modified:   lib/tasks/daily_mailer.rake
#
# Changes not staged for commit:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working directory)
#
#    modified:   Gemfile
#    modified:   Gemfile.lock
#
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (53af809c7a84370baffcd8e395b361cad933b2b6)

There are lots of such things yet to be learned from Git, and it has made managing our source code much easier by continuously evolving over the last 10 years. If you have also come across situations that sent you googling, you can mention them below in the comments. Thanks for reading.