Why Every User Without a Profile Picture on GitHub Was Yehuda Katz

The bug

In 2013, GitHub experienced a bug that caused every user without a profile photo to be assigned the profile photo of a specific user, Yehuda Katz, who is a well-known software developer and member of the Ruby on Rails core team.

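This happened because of how Ruby identifies nil, the special value that represents the absence of a value. nil is an object like any other, and in the Ruby of the time its object ID was 4. In Ruby 1.8, calling id on any object, nil included, returned that object ID instead of raising an error, so code that asked nil for its id by mistake silently got 4 back. Yehuda Katz's user ID on GitHub happened to be 4, so the bug in GitHub's code ended up assigning his profile photo to any user who had not uploaded their own.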
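Here is a minimal sketch of that failure mode in modern Ruby. Object#id was removed in Ruby 1.9, so the hypothetical id_of helper imitates Ruby 1.8 by returning nil.object_id for nil. The User struct, the users table and its logins are invented for illustration; GitHub's actual code is not known here.

```ruby
# Hypothetical record type: real records answer #id with their database ID.
User = Struct.new(:id, :login)

# Stand-in for Ruby 1.8 semantics (Object#id no longer exists): calling #id
# on nil did not raise, it returned nil's object_id instead.
def id_of(record)
  record.nil? ? nil.object_id : record.id
end

# Hypothetical users table. To reproduce the collision on any Ruby build, one
# user is given whatever object_id nil has here (4 in Ruby 1.8).
NIL_ID = nil.object_id
USERS = [User.new(1, "alice"), User.new(NIL_ID, "unlucky")].freeze

def find_user(id)
  USERS.find { |user| user.id == id }
end

# Buggy: a missing owner still produces an integer, so the lookup silently
# matches whichever user holds that ID instead of failing.
def avatar_owner_buggy(owner)
  find_user(id_of(owner))
end

# Safer: treat a missing owner as "no avatar" explicitly.
def avatar_owner_safe(owner)
  owner && find_user(owner.id)
end

p avatar_owner_buggy(USERS.first).login # => "alice"
p avatar_owner_buggy(nil).login         # => "unlucky"
p avatar_owner_safe(nil)                # => nil
```

The safe version never turns "no record" into a number, so a missing photo can only ever mean "no photo".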

The bug was quickly identified and fixed by GitHub's development team, but it gained widespread attention in the software development community because it was so amusing and unexpected. Yehuda Katz himself even tweeted about the incident, saying:

"Apparently, I'm everyone on GitHub who didn't upload a profile picture. Sorry, world!"

But what exactly is nil, and why does it need an ID?

Understanding nil

Everything in Ruby is an object, and so is nil. There is only one nil object, with an object_id of 4 (or 8 on some 64-bit builds), and this is part of what makes nil special.
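Both claims can be checked in irb. The snippet below shows that nil has a class and responds to methods like any other object. It also shows that Ruby refuses to create a second nil.

```ruby
# nil is an ordinary object: it has a class and responds to methods.
p nil.class                      # => NilClass
p nil.to_a                       # => []
p nil.respond_to?(:object_id)    # => true

# There is only one nil: a nil from anywhere is the very same object.
p nil.equal?([].first)           # => true

# NilClass has no public constructor, so a second nil can never be made.
begin
  NilClass.new
rescue NoMethodError => e
  puts "NilClass.new raised #{e.class}"
end
```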


In Ruby, every object has an ID that uniquely identifies it within the Ruby process. The ID is an integer value that is assigned by the Ruby interpreter.

nil is a special Ruby object used to represent an empty or default value. It is also falsy: together with false, it is one of only two values that behave like false in a conditional statement.
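A quick way to see which values count as false in a condition is to test a handful of "empty-looking" values. The falsy_values helper is a name made up for this example.

```ruby
# Only nil and false are falsy; 0, "", [] and {} are all truthy in Ruby.
def falsy_values(values)
  values.reject { |value| value }
end

values = [nil, false, 0, "", [], {}]

values.each do |value|
  verdict = value ? "truthy" : "falsy"
  puts "#{value.inspect} is #{verdict}"
end

p falsy_values(values) # => [nil, false]
```

This often surprises people coming from languages where 0 or an empty string is false.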

Why do objects need an ID?

Objects in Ruby need an ID for several reasons. The main one is that it lets Ruby determine whether two references point to the same object: when two objects have the same ID, they are the same object in memory. This is identity, which is what equal? checks. It differs from ==, which compares values and can be true for two distinct objects.
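The difference between identity and equality shows up as soon as two separate objects hold the same contents. String.new is used so that the two strings are guaranteed to be distinct objects, whatever the frozen-string-literal settings.

```ruby
a = String.new("octocat")
b = String.new("octocat")
c = a

p a == b                     # => true   same contents
p a.equal?(b)                # => false  two distinct objects
p a.object_id == b.object_id # => false
p a.equal?(c)                # => true   c refers to the same object as a
p a.object_id == c.object_id # => true
```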

Object IDs are not what Ruby's garbage collector uses to manage memory, though. The collector finds live objects by tracing references from roots such as global variables and the stack. Any object it cannot reach is freed, whether or not anyone ever asked for its object_id.

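In older versions of MRI (before Ruby 2.7), the object_id was not stored in the object at all. It was computed from the object's location in memory. Ruby 2.7 added a compacting garbage collector that can move objects. Since then, heap objects are instead given a sequential ID the first time object_id is called, so the ID stays stable even if the object moves.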

In Ruby, nil is a singleton: it is the only instance of NilClass, and exactly one nil exists for the whole lifetime of the Ruby process. In MRI it is not even allocated on the heap; it is encoded as a fixed constant value.

Since that value is always the same, the object_id of nil is always the same too.

Therefore, every time nil is referenced in the code, it refers to the same object and has the same object_id, which was 4 in the Ruby of the time. This is why, in the GitHub bug, every user without a profile photo showed Yehuda Katz's photo: the id of nil was always 4, and Yehuda Katz's user ID on GitHub happened to be 4 as well.
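You can see the singleton at work by collecting nil from several unrelated places and comparing their object IDs. They all come back identical, whatever number your particular build uses for nil.

```ruby
# nil obtained from several unrelated places.
sources = [
  nil,
  {}[:missing],        # missing Hash key
  [].first,            # empty Array
  "abc".index("z"),    # substring not found
  (42 if false)        # if-expression with no else branch
]

p sources.all?(&:nil?)                 # => true
p sources.map(&:object_id).uniq.size   # => 1, every one is the same object
```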

So which objects have IDs 0, 1, 2, 3, ...?
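The quickest way to answer is to ask Ruby directly. This snippet prints the IDs of the immediate values. The numbers for true and nil depend on the Ruby version and platform, so treat the comments as a guide rather than exact expected output.

```ruby
# Object IDs of Ruby's immediate values. false is always 0 and a small
# Integer i has ID i * 2 + 1, so IDs 1, 3 and 5 belong to 0, 1 and 2.
# The IDs of true and nil vary by version and platform (2 and 4 in Ruby 1.8).
samples = { "false" => false, "0" => 0, "true" => true,
            "1" => 1, "nil" => nil, "2" => 2 }

samples.each do |label, value|
  puts "#{label.ljust(5)} -> object_id #{value.object_id}"
end
```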


But why is nil 4?

This Stack Overflow answer does a great job of explaining it:

In MRI the object_id of an object is the same as the VALUE that represents the object on the C level. For most kinds of objects this VALUE is a pointer to a location in memory where the actual object data is stored. Obviously this will be different during multiple runs because it only depends on where the system decided to allocate the memory, not on any property of the object itself.

However for performance reasons true, false, nil and Fixnums are handled specially. For these objects there isn't actually a struct with the object's data in memory. All of the object's data is encoded in the VALUE itself. As you already figured out the values for false, true, nil and any Fixnum i, are 0, 2, 4 and i*2+1 respectively.

The reason that this works is that on any systems that MRI runs on, 0, 2, 4 and i*2+1 are never valid addresses for an object on the heap, so there's no overlap with pointers to object data.
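The Integer part of that encoding is easy to model in Ruby. encode_fixnum and decode_fixnum are hypothetical helpers written for this illustration. MRI does the equivalent in C with its INT2FIX and FIX2LONG macros.

```ruby
# Model of the Integer tagging quoted above: the VALUE of a small Integer i
# is i * 2 + 1, i.e. the number shifted left by one bit with the low bit set.
def encode_fixnum(i)
  (i << 1) | 1
end

# Shifting right drops the tag bit and recovers the original number.
def decode_fixnum(value)
  value >> 1
end

[0, 1, 2, 42, 1000].each do |i|
  value = encode_fixnum(i)
  # In MRI, the object_id of a small Integer is exactly this VALUE.
  puts "#{i}: VALUE #{value}, object_id #{i.object_id}, decoded #{decode_fixnum(value)}"
end
```

Because the low bit is always 1, no tagged Integer can ever collide with an even value like false, true or nil in this scheme.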

Here MRI stands for "Matz's Ruby Interpreter", the reference implementation of Ruby. As for the last part, the values 0, 2, 4 and i*2+1 are never valid heap addresses because of how memory is laid out and aligned on the underlying computer architecture.

Modern computer architectures typically require that data be aligned on certain memory boundaries in order to access it efficiently. This means that data must be located at memory addresses that are divisible by a specific number (e.g. 4, 8, or 16).

0 is the null address, and 4 lies in the lowest page of memory, which operating systems leave unmapped, so no object can ever live at either address. 2 is not divisible by 4, so it fails the alignment requirement. Every value of the form i*2+1 is odd, so it can never sit on the even boundaries that heap objects are aligned to.

Because these bit patterns can never be real object pointers, the Ruby interpreter can look at a VALUE's low bits and tell immediately which kind it is. Either it holds nil, true, false or an Integer, or it points to object data on the heap. There is no overlap with real pointers, and no memory has to be read to work with these common values, which is where the performance win comes from.
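To make the tagging idea concrete, here is a toy classifier in Ruby for the layout from the quoted answer (0 = false, 2 = true, 4 = nil, odd = Integer). decode_value is a hypothetical helper, not MRI's code, which does this in C with bit masks. The 4-byte alignment check is an assumption standing in for whatever alignment the platform actually guarantees.

```ruby
# Toy model (not MRI's real code) of classifying a VALUE using the classic
# layout: 0 = false, 2 = true, 4 = nil, odd = Integer, else a heap pointer.
def decode_value(value)
  return [:integer, value >> 1] if value.odd? # low bit 1 => tagged Integer

  case value
  when 0 then [:false]
  when 2 then [:true]
  when 4 then [:nil]
  else
    # Assumption for this sketch: heap objects are at least 4-byte aligned,
    # so a genuine pointer always has its two lowest bits clear.
    raise ArgumentError, "misaligned pointer #{value}" unless (value % 4).zero?

    [:pointer, value]
  end
end

[0, 2, 4, 7, 4096].each { |value| p [value, decode_value(value)] }
```

Note that the checks only look at the number itself; nothing is ever dereferenced to find out what kind of value it is.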

Dhaval Singh's Blog