ActiveRecord is a great ORM, with a ton of nifty little things that make your life easy when dealing with DB queries. pluck
is one such method. I've been using pluck
for as long as I can remember, and even my second oldest article is on it!
Even after using it for so long, recently I found myself debugging a relatively simple query that used pluck
and realized I missed some key points regarding this sweet little AR method.
The Rails Guides on pluck
is pretty neat, but there are still some things you might miss so I'll try to sum it all up here. If you're just starting out or might have overlooked the documentation, I highly recommend going through this. However, if you're already versed in AR/pluck, feel free to jump to the concluding sections.
What is pluck and how does it work?
Pluck
is an AR query method that selects one or more attributes(or columns) and returns an Array
, without loading the corresponding records. It uses select
internally and triggers a query for the values.
Customer.pluck(:id)
# SELECT customers.id FROM customers
=> [1,2,3]
# or
Customer.pluck(:id, :first_name)
# SELECT customers.id, customers.first_name FROM customers
=> [[1, "David"], [2, "Fran"], [3, "Jose"]]
There are multiple ways to do the above in Rails, pluck
is the most performant out of all in almost all cases.
Basically, you are replacing code that looks like this
Customer.select(:id).map { |c| c.id }
# or
Customer.select(:id, :first_name).map { |c| [c.id, c.first_name] }
to be cleaner and more efficient.
So why is it faster?
pluck
directly converts the data into a Ruby array skipping the construction of AR objects, thus giving it an edge in performance. Here's an article benchmarking pluck vs select vs collect/map.
There are some more points that are well-documented in the guides, so I'll just list them here.
- You are not limited to querying fields from a single table, you can query multiple tables as well.
Order.joins(:customer, :books).pluck("orders.created_at, customers.email, books.title")
pluck
triggers an immediate query, and thus cannot be chained with any further scopes, although it can work with scopes already constructed earlier:
Customer.pluck(:first_name).limit(1)
=> NoMethodError: undefined method `limit' for #<Array:0x007ff34d3ad6d8>
Customer.limit(1).pluck(:first_name)
=> ["David"]
pluck
will trigger eager loading if the relation object contains include values, even if the eager loading is not necessary for the query. As you can see in the example, we don't need the join for customer ids, butpluck
will still go ahead and do it! So be careful when you have eager-loaded associations!
irb> assoc = Customer.includes(:reviews)
irb> assoc.pluck(:id)
SELECT "customers"."id" FROM "customers" LEFT OUTER JOIN "reviews" ON "reviews"."id" = "customers"."review_id"
# to avoid this you can do
assoc.unscope(:includes).pluck(:id)
Now we have seen what's in the guides, let's see some cases that might trip you up!
Plucking jsonb fields in Rails
If you have a jsonb field in your AR object and you want some nested value you can still use pluck
!
user.metadata = {
"location": {
"city": "New York",
# ... additional data
}
# ... other metadata attributes
}
# Using 'pluck' to retrieve the 'city' values from the 'location' key in 'metadata'
cities = user.pluck("metadata -> 'location' ->> 'city'")
=> ["New York"]
Pluck's Overriding Power
Since pluck
uses select
internally. it will override any other selects you do before this. This seems obvious but when writing a complex query one might miss out on this. I'll try to illustrate with a relatively simple example
# Scenario: You want to fetch distinct cities from a user table
# Step 1: Using 'select' with 'distinct' to get unique cities
distinct_cities_query = User.select(:city).distinct
# Step 2: Attempting to use 'pluck' to retrieve the city names
cities = distinct_cities_query.pluck(:city)
# Cities might contain duplicate city names
Plucking with where
Another common issue can arise when using pluck
with where (remember, pluck fires an immediate query). Hence, it can fire some extra queries where it's not required. Thankfully there’s a PluckInWhere
Rubocop you can use to warn you.
# bad
# will trigger 2 queries
Post.where(user_id: User.active.pluck(:id))
# SELECT id FROM users WHERE /* conditions for active users */;
# SELECT * FROM posts WHERE user_id IN (1, 2, 3, ...); -- Array of IDs from the first query
# good
# triggers 1 query only
Post.where(user_id: User.active.select(:id))
# SELECT * FROM posts WHERE user_id IN (SELECT id FROM users WHERE /* conditions for active users */);
When to think twice before using pluck?
When you have already loaded the AR objects into memory there is no need to use
pluck
as it will trigger another query. Rather just usemap
orcollect
.When dealing with large datasets be careful with
pluck
as it will load everything into your memory.
# Instead of using pluck which can consume a lot of memory
# User.pluck(:email).each do |email|
# # Process email
# end
# Use find_each or find_in_batches for batch processing
User.find_each(batch_size: 1000) do |user|
email = user.email
# Process email
end
UPDATE 1:
I recently came across another pluck behavior that surprised me again. This is not per se a pluck
caveat but an ActiveRecord one. But it's very easy to slip up with this using pluck
.
Empty array interpolates as NULL in where condition.
So say you have a query like
User.where("id NOT IN (?)", [1,2,3])
=> SELECT * FROM "users" where id NOT IN (1,2,3)
# but if you pass an empty Array like
User.where("id NOT IN (?)", [])
=> SELECT * FROM "users" WHERE (id NOT IN (NULL))
This happens because where id NOT IN ()
or where id NOT IN ([])
is not a valid SQL syntax! Hence AR slams a NULL
in there, which causes the result set to be empty always because NULL represents an unknown or undefined value. When you compare any value with NULL using standard comparison operators, the result is always unknown (not true or false). This behavior extends to the IN and NOT IN clauses as well.
So what's the right way to do this? And why is this in an article on pluck
?
Let's see how we often use pluck
reported_comment_users = Comment.where(reported: true).pluck(:user_id)
=>[]
User.where('id NOT IN (?)', reported_comment_users)
# The result will be empty instead of all users as you would expect.
=> []
# Better way to handle this
User.where.not(id: reported_comment_users)
Now even if reported_comment_users
is empty we won't break our next query. This is a fairly common way of using pluck and hence I decided to put it here.
In conclusion, pluck is a versatile and powerful method in Rails for efficiently retrieving specific columns from a database, but it's crucial to understand its nuances and limitations. Happy Plucking :D