Why Do Array Indexes Start at 0?
Why is the first number accessed with 0 instead of 1? TL;DR: with 0, the index matches the offset.
Introduction
Given an array of numbers, why is the first number accessed with 0 instead of 1?
nums = [4, 5, 6]
nums[1] # 5!!
nums[0] # 4!!
Short Answer
An array is essentially a pointer, and the index is used as the offset. The first element of the array is exactly at the pointer's memory location; therefore, its offset is zero. The second element is one slot further, hence the 1.
If the index starts at 0, the index matches the offset.
History: It Was Not Obvious
Today it's easy to forget that using 0 for the first element was not always obvious or standard. For example, FORTRAN, one of the first programming languages, uses 1 as the first index.
There was a war over using 0 or 1, just like today we fight over spaces versus tabs for indentation, or React versus Vue. I guess we just like arguing.
I'm not sure what settled the war. It could have been the success of C, or the fact that many well-known computer scientists argued in favor of 0. For example, Edsger Dijkstra wrote a note defending 0, "Why numbering should start at zero". I recommend reading it; it's short and very interesting.
Long Answer
Let's review the short answer and expand a little bit on it.
"An array is essentially a pointer." What is a pointer?
A pointer is an object that stores a memory address. Instead of storing the value, it stores the location of that value.
What does it mean that an array is a pointer? It means that the variable holding the array stores the location of the beginning of the array, not all the numbers.
For example, consider how the array from the initial example is stored:
nums = [4, 5, 6]
"An array is essentially a pointer, and the index is used as the offset." What is the offset?
The values inside the array are stored one after another, starting at the location the pointer references. To access the first one, we read the value at the pointer's location. To access the second one, we read one slot further, where a slot is the size of one element. The offset is the number of slots we move away from the start.
Therefore, if nums is the array, nums holds the starting location, and *nums reads the value stored there. The first element is *nums, or *(nums + 0); the second one is *(nums + 1); the third is *(nums + 2). In general, the element at a given offset is *(nums + offset).
If we used 1 to access the first element, the compiler would need to turn the 1 into a 0, because the first element is *(nums + 0). This means that the compiler would need to do something like *(nums + index - 1), an extra subtraction on every access.
When we use 0, the compiler can use the index directly as the offset: *(nums + index).
Hence the final sentence: "If the index starts at 0, the index matches the offset."
0 Is More Elegant
I found a Quora answer explaining that if the index starts at 0, solutions to common problems are more elegant. For example, if we want to build a hash table with n slots, we can choose the bucket with key mod n. For a non-negative key, the result is always between 0 and n - 1, which are exactly the valid indexes. That's it: a hash table with a simple function. With indexes starting at 1, we would need key mod n + 1.
Is 1 the First?
This is just a curiosity, but many defenders of 1 argue that it's friendlier for developers. Programming languages are made for developers to read; therefore, it makes sense to favor the developer experience. Yet, does 1 really mean first?
"First" is the word we use in English to refer to the initial item in a list. Yet the word "first" does not come from "one." Instead, it derives from a superlative form of "fore" (meaning "before"), much like "foremost." Something similar happens in other languages: the word for "first" does not derive from the word for "one."
Long Story Short
I'd like to finish with this sentence from the second answer to a Stack Overflow question on this topic:
It makes compilation easier.
I add: "because the index matches the offset."
If you liked this post, consider sharing it with your friends on Twitter or forwarding this email to them 🙈
Don't hesitate to reach out to me if you have any questions or spot an error. I'd really appreciate it.
And thanks to Michal, Bernat and Sebastià for reviewing this article 🙏