Database Indexes
Database indexes are separate files that sort specified fields. They impose an overhead in writing data but allow faster readings. Indexes are a powerful feature of most MyISAM and InnoDB databases and should be used with caution - although they can speed up a lot of queries, they can also slow your application down if not correctly used.
If you don’t really get the concept, think of an index in a book. Instead of browsing the whole book looking for something specific, you can just have a look at it’s index. Database indexes work just like that.
Finding Rows
This is the main benefit of indexes - they allow you to find rows quickly. Since they are sorted in a separate file, you won’t have to search the whole dataset. You should index rows that you use in conditions often. If you happen to have a lot of queries of the form:
Person.find_by_name("...")
or much better, as I mentioned in an earlier post:
Person.find(:conditions => ["name = ?", "..."])
Then you should have the Person’s name indexed. That way, all those queries would be snappier than they currently are.
Some storage engines can also use indexes to read the data itself without having to read the row data itself. So if you only fetch the name, in our previous example, it will only need to access the index to retrieve your result.
Person.find(:select => "name", :conditions => ["name = ?", "..."])
This query is really fast even if your database is crowded since rows themselves are never read - only indexes.
Sorting Data
As mentioned before index files already contain sorted data. This also means that ordering those elements is unnecessary - they are already ordered. So if you happen to have a lot of queries of the form:
Person.find(:order => "name")
Then the name row on Person should probably be indexed. This way there would be no overhead in ordering the data since it would be already done.
Understanding Queries
Many databases have an explain command - use it frequently. It will help you identify problematic queries.
Let’s use MySQL for a simple example. Before making name an index, let’s see what is actually happening:
EXPLAIN SELECT name,user,size FROM downloads WHERE user = "username";
+----+-------------+-----------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | downloads | ALL | NULL | NULL | NULL | NULL | 4871 | Using where | +----+-------------+-----------+------+---------------+------+---------+------+------+-------------+
Let’s go over each field to understand what the output is telling us:
- id is the select identifier (and doesn’t really matter)
- select_type is what you’ve done - simple (select without unions or subqueries), primary, union, etc
- table is quite obvious
- type is the join type. Ordered from best to worst, the possible types are: system, const, eq_ref, ref, fulltext, ref_or_null, index_merge, unique_subquery, index_subquery, range, index, ALL
- possible_keys are which indexes MySQL can choose from use to find the rows in this table
- key is the index MySQL actually used
- key_len is the length of the key MySQL actually used
- ref shows which columns of constants are compared to the index in key
- rows indicates the number of rows MySQL thinks it has to examine to execute the query
- Extra has additional information. The fastest possible queries have
Using filesortorUsing temporary.
You should have a look at the manual for further options and explanations on this output.
Let’s add an index to the user field and execute explain on the same query again.
EXPLAIN SELECT name,user,size FROM downloads WHERE user = "username";
+----+-------------+-----------+------+---------------+------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+------+---------+-------+------+-------------+ | 1 | SIMPLE | downloads | ref | user | user | 767 | const | 1 | Using where | +----+-------------+-----------+------+---------------+------+---------+-------+------+-------------+
As you can see our query was optimized and MySQL is now using the user index to find the row we’ve asked. If you look closely, the number of examined rows went down from 4871 to 1!
Common Problems
Indexes make reading faster but they also impose an overhead in writing. Don’t use them when your read/write ration isn’t sufficiently high.
There is also an extra need for space since indexes are stored in separate files.
Be careful when ordering - when you have an index of the form (row1, row2) you can:
ORDER BY row1
ORDER BY row1, row2
ORDER BY row1 DESC, row2 DESC
However, there are a few order instructions that won’t use the index sorting functionalities:
ORDER BY row2
ORDER BY row1, row2 DESC
This happens because the order in the index file does not match with the one requested. Don’t forget that if you create an index of the form (row1, row2) then row1 will be ordered in the file and for each equal value of row1 then row2 will be ordered.
Fragmentation
Your indexes get fragmented from time to time since updates and deletes leave gaps in your index table. Somewhat frequently you should apply optimize to your indexed tables in order to keep your indexes on good health:
OPTIMIZE TABLE tablename
Creating Indexes
All that’s left to do is to create the indexes. You could use MySQL syntax directly, running:
CREATE INDEX name_of_index ON table_name (field);
This, however, is a Rails-oriented blog. When using Rails you should actually be doing this using a migration. Something like this:
class AddIndexToTable < ActiveRecord::Migration
def self.up
add_index :table_name, :field, :unique => true
end
def self.down
remove_index :table_name, :field
end
end
The unique field can obviously be absent. You can also use multi-column indexes, as I mentioned before. For that, just change the add_index line to:
add_index :table_name, [:field, :another_field]
And it’s all good to go.
Make sure you read database specific documentation and have a look at the add_index documentation.