Before writing this article I had read a lot about database optimization, mostly about indexes, transactions, selecting only the columns you need, and so on. But it all stayed at the reading level, with little practice, because I had no real project experience and everything was on paper. Practice is the sole criterion for testing truth, so I wanted to try some database optimization techniques, such as indexes, for myself. I did not want to use fake data, so I decided to fetch some data from the Internet for analysis. I later used PHP to crawl the data (a blog post about that is still to come) and ended up with about 1.1 million user records. Naturally I wanted to know the exact number, so I ran the following SQL statement (the storage engine I am using is InnoDB):
SELECT COUNT(*) FROM tbl_name;
However, the query took 14-20 s to return a result.
In a real environment such a delay would make for a very poor user experience. Imagine opening a page and waiting close to 20 s for the data; never mind 20 s, even 3 s is poor. So I wanted to optimize this.
In day-to-day MySQL development, the two most commonly used storage engines are MyISAM and InnoDB. One difference between them is how COUNT(*) computes the number of rows in a table.
MyISAM stores the exact number of rows in the table, so when this statement runs on a MyISAM table, the engine simply reads out the stored row count. Therefore, if the table does not need features such as transactions, using MyISAM is the simplest optimization. InnoDB, however, does not store the exact number of rows, so when this statement runs on an InnoDB table, InnoDB has to scan the whole table to count the rows.
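To check which engine a table uses and what row count MySQL keeps for it, a query along the following lines may help. This is only a sketch: tbl_name is the placeholder table name used above, and the table is assumed to live in the currently selected database.

```sql
-- Which storage engine does the table use, and what row count is recorded for it?
-- TABLE_ROWS is exact for MyISAM; for InnoDB it is only an approximation.
SELECT TABLE_NAME, ENGINE, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tbl_name';
```

Because the InnoDB figure is an estimate, it cannot replace COUNT(*) when an exact number is required.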
Query optimization command: EXPLAIN
To improve a query's performance, you first need to know where the slow query's bottleneck is. The core indicator in EXPLAIN output is rows, the number of rows MySQL estimates it has to scan; the vast majority of statements with a small rows value execute quickly. So optimizing a statement is basically about reducing rows.
First, run EXPLAIN on the statement above:
We can see that MySQL scans the entire table to execute this query.
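The check described above can be reproduced with a statement like the one below. It is a minimal sketch using the placeholder table name tbl_name; the actual output depends on the table and the MySQL version, so none is shown here.

```sql
-- Show the execution plan instead of running the query.
-- In the output, type = ALL means a full table scan and type = index a full
-- index scan; key is the index chosen; rows is the estimated rows examined.
EXPLAIN SELECT COUNT(*) FROM tbl_name;
```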
When I designed the table I added only a unique index. Later I needed a statement that counts rows by a certain field, so I added an ordinary index on that field. When I then ran the SQL statement above again, it took only 0.2-0.3 s to count the rows in the table.
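As a rough sketch, the change described here might look like the following. The column name some_col and the index name idx_some_col are hypothetical, since the article does not name the field that was indexed.

```sql
-- Hypothetical names: some_col is the field being counted by,
-- idx_some_col is the new ordinary (secondary) index on it.
ALTER TABLE tbl_name ADD INDEX idx_some_col (some_col);

-- Run the count again and compare the elapsed time reported by the client.
SELECT COUNT(*) FROM tbl_name;
```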
I could not help being surprised: I had stumbled on an optimization. In InnoDB, adding an ordinary index (called a secondary index) on another field, in addition to the unique index, improves the performance of COUNT(*). But why? Run EXPLAIN again:
The same number of rows is scanned, so why does adding an ordinary index improve performance so much? I started searching for information and reading the documentation to understand the problem.
How COUNT(*) is executed
COUNT(*) is executed differently by different storage engines. In MyISAM, COUNT(*) directly reads the row count stored for the table and returns it. In InnoDB, COUNT(*) first reads the table's data into the memory buffer and then scans the whole table to get the number of rows. When COUNT is used with a WHERE condition, the two storage engines behave the same way: both scan the table to count how many rows have the given field value.
Because the performance boost came after adding an index, I decided to explore the question from the perspective of indexes.
According to the definition in the official documentation, an index is a data structure that helps MySQL retrieve data efficiently. In essence, then, an index is a data structure, and the purpose of adding one is to improve query efficiency.
Querying with an index can be compared to using a dictionary. To look up the word "mysql", we first locate the letter m, then find the words under m whose second letter is y, and so on, until we find which page the word mysql is on; then we turn to that page to read more about the word. Without an index, you would have to read the dictionary page by page, which is very inefficient. An index works by continually narrowing the scope of the search until only the final result remains.
A database works the same way, although using an index in a database is obviously much more complicated.
Disk access and read-ahead
In general, an index is itself large and cannot be held entirely in memory, so it is usually stored on disk in the form of an index file. Looking something up in the index therefore means reading from disk, which incurs disk I/O. Each read consists of three parts: seek time, rotational latency, and transfer time. Seek time is the time the disk arm needs to move to the specified track, usually under 5 ms; rotational latency is determined by the disk's rotation speed; transfer time is the time needed to read data from disk into memory or write it back, which is short enough to be negligible. Compared with memory access, disk I/O is several orders of magnitude more expensive. Therefore, the most important measure of how good a data structure is as an index is the asymptotic number of disk I/O operations during a lookup. In other words, the index should be organized so as to minimize the number of disk I/Os per lookup.
As described above, disk I/O is a very expensive operation. The operating system optimizes for this based on the principle of locality:
When a piece of data is used, the data near it is likely to be used soon afterwards.
So when an I/O is performed, the operating system reads not only the data at the requested disk address into the memory buffer but also the adjacent data. As a result, far fewer I/Os are needed to read the data. The data read in one I/O is called a page, usually 4 KB or 8 KB in size; in other words, reading any data within one page costs only one I/O.
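InnoDB also reads and caches data in pages of its own, whose size is separate from the operating system's page size. The statements below are a small sketch of how one might inspect this on a running server; the values returned depend on the server's configuration and workload.

```sql
-- Page size InnoDB uses for its own reads, in bytes (16384 by default).
SELECT @@innodb_page_size;

-- Logical read requests versus reads that had to go to disk
-- because the page was not already in the buffer pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

Comparing the two counters before and after a query gives a rough idea of how much of the work was served from memory rather than from disk.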
From the description above, we can tentatively conclude that the performance gap before and after adding the index comes from the disk-reading process. But before adding the new index I had already added a unique index. I later found that in MySQL the unique index I had added serves as the clustered index, while the index added afterwards is called a secondary index. So let's look at the difference between a clustered index and a secondary index.
Clustered index and secondary index
Clustered index
Every InnoDB table has a special index, called the clustered index, that stores the row data. Normally, the clustered index is synonymous with the primary key. InnoDB chooses the clustered index as follows:
If a PRIMARY KEY is defined on the table, InnoDB uses it as the clustered index;
Otherwise, if no PRIMARY KEY is defined, InnoDB picks the first UNIQUE index whose key columns are all NOT NULL and uses it as the clustered index;
If the table has neither a PRIMARY KEY nor a suitable UNIQUE index, InnoDB automatically generates a hidden clustered index on a synthetic column that contains row ID values. The rows are ordered by the row ID that InnoDB assigns to them. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, rows ordered by row ID are physically in insertion order.
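The three cases above can be illustrated with table definitions like these. It is a sketch only: the table names demo_pk, demo_unique and demo_hidden, and the index name uk_u_id, are made up for illustration.

```sql
-- Case 1: an explicit PRIMARY KEY becomes the clustered index.
CREATE TABLE demo_pk (
    id   INT UNSIGNED NOT NULL,
    name VARCHAR(64),
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Case 2: no PRIMARY KEY, so the first UNIQUE index on NOT NULL columns is used.
CREATE TABLE demo_unique (
    u_id INT UNSIGNED NOT NULL,
    name VARCHAR(64),
    UNIQUE KEY uk_u_id (u_id)
) ENGINE = InnoDB;

-- Case 3: neither exists, so InnoDB clusters the rows on a hidden row ID.
CREATE TABLE demo_hidden (
    u_id INT UNSIGNED,
    name VARCHAR(64)
) ENGINE = InnoDB;
```

The table in this article corresponds to the second case: it has a unique index but no explicit primary key.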
How the clustered index speeds up queries
Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If the table is very large, the clustered index structure saves a lot of I/O compared with storage organizations that keep row data and index records in different places. (For example, MyISAM uses one file for the data rows and another for the index records.)
Secondary index
All indexes other than the clustered index are called secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns of the row in addition to the indexed columns, so the secondary index points to the primary key. InnoDB uses this primary key value to look up the row in the clustered index. If the primary key is long, the secondary indexes take up more space, so it is advantageous to have a short primary key.
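One way to see that secondary index records carry the primary key is to compare two execution plans, as in the sketch below. The table demo_secondary and its columns are hypothetical, and the exact EXPLAIN output may vary between MySQL versions.

```sql
-- Hypothetical table: id is the primary key, idx_name is a secondary index on name.
CREATE TABLE demo_secondary (
    id   INT UNSIGNED NOT NULL,
    name VARCHAR(64) NOT NULL,
    note TEXT,
    PRIMARY KEY (id),
    KEY idx_name (name)
) ENGINE = InnoDB;

-- Each idx_name entry holds (name, id), so this query can be answered from
-- the secondary index alone; EXPLAIN typically shows "Using index" in Extra.
EXPLAIN SELECT id FROM demo_secondary WHERE name = 'alice';

-- note is stored only in the clustered index, so here InnoDB has to follow
-- the primary key from idx_name back to the full row.
EXPLAIN SELECT note FROM demo_secondary WHERE name = 'alice';
```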
Therefore, we can conclude:
When only the unique index (u_id) existed, InnoDB used it as the table's clustered index. In InnoDB, COUNT(*) first reads the table's data into the memory buffer and then scans the whole table to get the number of rows. So with the unique index acting as the clustered index, InnoDB had to read the data of all 1.1 million rows into the buffer, which required many I/Os and accounted for most of the time. After the secondary index was added, MySQL applied an internal optimization when executing the query: it used the secondary index to count the rows. A secondary index stores only the indexed value (together with the primary key), so far less data has to be read, I/O is reduced, and performance improves. Therefore, in InnoDB, if you need to count all the rows in a table, consider adding a secondary index.
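To see how much smaller a secondary index is than the clustered index, one could compare their sizes in pages, as sketched below. This assumes InnoDB persistent statistics are enabled (the default in recent MySQL versions), that the account may read the mysql schema, and that tbl_name lives in the currently selected database.

```sql
-- Number of pages in each index of the table (stat_name = 'size'),
-- and the same figure converted to bytes.
SELECT index_name,
       stat_value AS pages,
       stat_value * @@innodb_page_size AS bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'tbl_name'
  AND stat_name = 'size';
```

The fewer pages an index occupies, the fewer pages a full scan of it has to read, which is consistent with the speedup observed above.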