Home IT Linux Windows Database Network Programming Server Mobile  
           
  Home \ Database \ Oracle 11g statistics collection - collection of multi-column statistics     - Linux the best download manager uGet (Linux)

- Scala REPL Shell call (Programming)

- Varnish achieve page jump (Server)

- Git use and interpretation of common commands (Linux)

- Large computer network security policy Experience (Linux)

- Oracle 10g relations with the constraint of column properties NULLABLE (Database)

- CentOS 6.5 set under Oracle 12c at startup (Database)

- Restrict console access to Linux servers to improve security (Linux)

- Shell scripts quickly deploy Tomcat project (Server)

- Snort build a secure Linux server (Linux)

- Linux (CentOS) SSH login without password authentication (Linux)

- RHEL6.4 x86_64 build SVN service (Server)

- MySQL innodb_table_stats table does not exist solution (Database)

- SUSE Linux install Oracle 10g and problem solving (Linux)

- Oracle Linux 6.4 (BOND) dual NIC teaming combat - Annotated (Linux)

- Use Markdown editor for document work under Linux (Linux)

- Linux NFS FTP use (Server)

- Linux installation and error under codeblocks exclude [Ubuntu 10.04] (Linux)

- Nagios (centreon) monitoring LVS (Server)

- Detailed use Zabbix monitoring Nginx (Server)

 
         
  Oracle 11g statistics collection - collection of multi-column statistics
     
  Add Date : 2018-11-21      
         
       
         
  We write SQL statements when sometimes encounter where clause followed by a plurality of conditions, that is, according to the conditions screened multiple columns of data. By default, Oracle will multicolumn selectivity (selectivity) multiplying the selectivity to give where statements such selective variations may make Oracle is not accurate enough, causing the optimizer to make wrong judgments. For example, for automobile manufacturers and car models, in fact, there is a relationship, once you know the car's model, which can determine a car manufacturer. Another example star hotels and hotel price level has a similar correspondence. In order to allow the optimizer to make accurate judgments, to generate accurate execution plans, oracle introduced the concept of multi-column statistics in the 11g database.

Optional: In this case, 1 / unique value

We have a table BOOKS, two columns hotel_id, rate_category, we look at the distribution of data in these two columns:
SQL> select hotel_id, rate_category, count (1) from books
2 group by hotel_id, rate_category
3 order by hotel_id;

HOTEL_ID RATE_CATEGORY COUNT (1)
---------- ------------- ----------
101119943
101239385
101320036
20215106
202210041
20235039

6 rows selected.

Check the data carefully: rate_category hotel_id 10 columns contain only 11, 12 and 13, and hotel_id 20 of this column contains only 21, 22 and 23 (11, 12 and 13 do not contain a). why

What? May be due to the star about the hotel. Hotel 20 is a high priced hotel, and rental grades 11, 12 and 13 is a lower class, and therefore they are not suitable for a high fee hotel. Similarly,

21, 22 and 23 are higher rent levels, so they are not available for this hotel 10 hotels. Moreover, the hotel room is more than 10 hotels in 20 predetermined number.

Create an index on two columns of the table books, and collect statistics on the table.
SQL> create index book_idx1 on books (hotel_id);
Index created.

SQL> create index book_idx2 on books (rate_category);
Index created.

SQL> analyze table books compute statistics;
Table analyzed.

If we want to find a table that meet the conditions of 20 hotels price level is a record 21, the execution plan look like?
SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 20 and rate_category = 21;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 2688610195

-------------------------------------------------- -------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- -------------------------
| 0 | SELECT STATEMENT | | 8296 | 33184 | 47 (3) | 00:00:01 |
| * 1 | TABLE ACCESS FULL | BOOKS | 8296 | 33184 | 47 (3) | 00:00:01 |
-------------------------------------------------- -------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "RATE_CATEGORY" = 21 AND "HOTEL_ID" = 20)

SQL> set autotrace off

SQL> select count (1) from books;

COUNT (1)
----------
99550

SQL> select 99550/8296 from dual;

99550/8296
----------
11.9997589

As can be seen from the above example, oracle chose to go full table scan, determines the number of records is 8296, and I watch it in the real data is 5106, for the entire table 99550 records, it should be used to index. But the oracle did not, because oracle two columns will be considered separately, and the selectivity is calculated hotel_id 1/2, rate_category 1/6, resulting in a statement selectivity is 1/12, which also

We see 8296 (1 * 99550/12) records in the execution plan causes.

In order to allow the implementation of oracle accurate records, we can take two ways
1. Use the package dbms_stats new function create_extended_stats create a virtual column, then on the table to collect statistics.
As follows:
dbms_stats.create_extended_stats ( 'SCOTT', 'BOOKS', '(HOTEL_ID, RATE_CATEGORY)')
The next time you re-collect the statistics table will automatically collect multi-column statistics for your column groups.

2. Direct method_opt specified in the package dbms_stats, when the collection of statistical information, the combination of columns as a single column uses

Here we use the second method
SQL> begin
2 dbms_stats.gather_table_stats (
3 ownname => 'SCOTT',
4 tabname => 'BOOKS',
5 estimate_percent => 100,
6 method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID, RATE_CATEGORY)',
7 cascade => TRUE
8 );
9 end;
10 /

PL / SQL procedure successfully completed.

After collecting the Plan of Implementation End column group statistics, look at the statement
SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 20 and rate_category = 21;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 1484887743

-------------------------------------------------- ---------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- ---------------------------------------
| 0 | SELECT STATEMENT | | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 1 | TABLE ACCESS BY INDEX ROWID | BOOKS | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 2 | INDEX RANGE SCAN | BOOK_IDX2 | 5106 | | 11 (0) | 00:00:01 |
-------------------------------------------------- ---------------------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "HOTEL_ID" = 20)
2 - access ( "RATE_CATEGORY" = 21)

The output clearly shows the index BOOK_IDX2 used. Why now use the index? Note that "Rows" column value (5106) below. Optimization program correctly identified an estimate of the number of combinations of values ​​of the line, rather than the estimated value of the number of separate individual values ​​row.

Of course, for other conditions, oracle can also make accurate judgments

SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 10 and rate_category = 12;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 2688610195

-------------------------------------------------- -------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- -------------------------
| 0 | SELECT STATEMENT | | 39385 | 230K | 47 (3) | 00:00:01 |
| * 1 | TABLE ACCESS FULL | BOOKS | 39385 | 230K | 47 (3) | 00:00:01 |
-------------------------------------------------- -------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "RATE_CATEGORY" = 12 AND "HOTEL_ID" = 10)

In view USER_STAT_EXTENSIONS, you can see the extended statistics defined in the database:
SQL> select extension_name, extension
2 from user_stat_extensions
3 where table_name = 'BOOKS';

EXTENSION_NAME EXTENSION
------------------------------ -------------------- -------------------------------------------------- ----------
SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ( "HOTEL_ID", "RATE_CATEGORY")
     
         
       
         
  More:      
 
- Several reasons MySQL garbled (Database)
- Spring Data JPA @EnableJpaRepositories configuration in detail (Programming)
- Linux common network tools: traceroute routing of scanned (Linux)
- Five kinds of IO Model under Linux (Programming)
- Linux Log (Linux)
- Linux System Administrator Network Security Experience (Linux)
- Use FFmpeg processing high quality GIF (Programming)
- linux raid levels and concepts introduced (Linux)
- MyEclipse configuration necessary (Linux)
- Use Makeself Create installation file (Linux)
- You know the difference between URL, URI and URN among you (Linux)
- Windows 7 hard disk to install Ubuntu 14.10 (Linux)
- The difference between vi and nano (Linux)
- Linux tar compressed exclude a folder (Linux)
- Android in the event delivery and handling mechanism (Programming)
- Fedora network set up simple (Linux)
- Under Ubuntu 15.04 installation JDK8 (Linux)
- Oracle 11g contraction table space error ORA-03297: file contains used data beyondrequested RESIZE value (Database)
- Search Linux commands and files - which, whereis, locate, find (Linux)
- Git bulk delete remote tag (Linux)
     
           
     
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.