What the pluck?

What the pluck?

A complete guide to using pluck, an ActiveRecord query method

ActiveRecord is a great ORM, with a ton of nifty little things that make your life easy when dealing with DB queries. pluck is one such method. I've been using pluck for as long as I can remember, and even my second oldest article is on it!

Even after using it for so long, recently I found myself debugging a relatively simple query that used pluck and realized I missed some key points regarding this sweet little AR method. docs meme

The Rails Guides on pluck is pretty neat, but there are still some things you might miss so I'll try to sum it all up here. If you're just starting out or might have overlooked the documentation, I highly recommend going through this. However, if you're already versed in AR/pluck, feel free to jump to the concluding sections.

What is pluck and how does it work?

Pluck is an AR query method that selects one or more attributes(or columns) and returns an Array, without loading the corresponding records. It uses select internally and triggers a query for the values.

Customer.pluck(:id)
# SELECT customers.id FROM customers
=> [1,2,3]
# or
Customer.pluck(:id, :first_name)
# SELECT customers.id, customers.first_name FROM customers
=> [[1, "David"], [2, "Fran"], [3, "Jose"]]

There are multiple ways to do the above in Rails, pluck is the most performant out of all in almost all cases. Basically, you are replacing code that looks like this

Customer.select(:id).map { |c| c.id }
# or
Customer.select(:id, :first_name).map { |c| [c.id, c.first_name] }

to be cleaner and more efficient.

So why is it faster?

pluck directly converts the data into a Ruby array skipping the construction of AR objects, thus giving it an edge in performance. Here's an article benchmarking pluck vs select vs collect/map.

There are some more points that are well-documented in the guides, so I'll just list them here.

  • You are not limited to querying fields from a single table, you can query multiple tables as well.
    Order.joins(:customer, :books).pluck("orders.created_at, customers.email, books.title")
    
  • pluck triggers an immediate query, and thus cannot be chained with any further scopes, although it can work with scopes already constructed earlier:
Customer.pluck(:first_name).limit(1)
=> NoMethodError: undefined method `limit' for #<Array:0x007ff34d3ad6d8>

Customer.limit(1).pluck(:first_name)
=> ["David"]
  • pluck will trigger eager loading if the relation object contains include values, even if the eager loading is not necessary for the query. As you can see in the example, we don't need the join for customer ids, but pluck will still go ahead and do it! So be careful when you have eager-loaded associations!
irb> assoc = Customer.includes(:reviews)
irb> assoc.pluck(:id)
SELECT "customers"."id" FROM "customers" LEFT OUTER JOIN "reviews" ON "reviews"."id" = "customers"."review_id"

# to avoid this you can do
assoc.unscope(:includes).pluck(:id)

Now we have seen what's in the guides, let's see some cases that might trip you up!


Plucking jsonb fields in Rails

If you have a jsonb field in your AR object and you want some nested value you can still use pluck!

user.metadata = {
  "location": {
    "city": "New York",
    # ... additional data
  }
  # ... other metadata attributes
}

# Using 'pluck' to retrieve the 'city' values from the 'location' key in 'metadata'
cities = user.pluck("metadata -> 'location' ->> 'city'")
=> ["New York"]

Pluck's Overriding Power

Since pluck uses select internally. it will override any other selects you do before this. This seems obvious but when writing a complex query one might miss out on this. I'll try to illustrate with a relatively simple example

# Scenario: You want to fetch distinct cities from a user table

# Step 1: Using 'select' with 'distinct' to get unique cities
distinct_cities_query = User.select(:city).distinct

# Step 2: Attempting to use 'pluck' to retrieve the city names
cities = distinct_cities_query.pluck(:city)
# Cities might contain duplicate city names

Plucking with where

Another common issue can arise when using pluck with where (remember, pluck fires an immediate query). Hence, it can fire some extra queries where it's not required. Thankfully there’s a PluckInWhere Rubocop you can use to warn you.

# bad
# will trigger 2 queries
Post.where(user_id: User.active.pluck(:id))
# SELECT id FROM users WHERE /* conditions for active users */;
# SELECT * FROM posts WHERE user_id IN (1, 2, 3, ...);  -- Array of IDs from the first query

# good
# triggers 1 query only
Post.where(user_id: User.active.select(:id))
# SELECT * FROM posts WHERE user_id IN (SELECT id FROM users WHERE /* conditions for active users */);

When to think twice before using pluck?

  • When you have already loaded the AR objects into memory there is no need to use pluck as it will trigger another query. Rather just use map or collect.

  • When dealing with large datasets be careful with pluck as it will load everything into your memory.

# Instead of using pluck which can consume a lot of memory
# User.pluck(:email).each do |email|
#   # Process email
# end

# Use find_each or find_in_batches for batch processing
User.find_each(batch_size: 1000) do |user|
  email = user.email
  # Process email
end

UPDATE 1:

I recently came across another pluck behavior that surprised me again. This is not per se a pluck caveat but an ActiveRecord one. But it's very easy to slip up with this using pluck.

Empty array interpolates as NULL in where condition.

So say you have a query like

User.where("id NOT IN (?)", [1,2,3])
=> SELECT * FROM "users" where id NOT IN (1,2,3)

# but if you pass an empty Array like 
User.where("id NOT IN (?)", [])
=> SELECT * FROM "users" WHERE (id NOT IN (NULL))

This happens because where id NOT IN () or where id NOT IN ([]) is not a valid SQL syntax! Hence AR slams a NULL in there, which causes the result set to be empty always because NULL represents an unknown or undefined value. When you compare any value with NULL using standard comparison operators, the result is always unknown (not true or false). This behavior extends to the IN and NOT IN clauses as well.

So what's the right way to do this? And why is this in an article on pluck? Let's see how we often use pluck

reported_comment_users = Comment.where(reported: true).pluck(:user_id)
=>[]

User.where('id NOT IN (?)', reported_comment_users)
# The result will be empty instead of all users as you would expect.
=> []

# Better way to handle this
User.where.not(id: reported_comment_users)

Now even if reported_comment_users is empty we won't break our next query. This is a fairly common way of using pluck and hence I decided to put it here.

GitHub discussion on the same

In conclusion, pluck is a versatile and powerful method in Rails for efficiently retrieving specific columns from a database, but it's crucial to understand its nuances and limitations. Happy Plucking :D