Home IT Linux Windows Database Network Programming Server Mobile  
           
  Home \ Database \ Oracle 11g statistics collection - collection of multi-column statistics     - Use Python to perform system commands common usage (Programming)

- Camera-based face recognition OpenCV crawl and storage format (Python) (Linux)

- DB2 commonly used scripting sort out (Database)

- Fragment Android developers learning to resolve (Programming)

- RHEL7.0 configuration CD-ROM to the local YUM (Linux)

- Django Signals from practice to source code analysis (Programming)

- Install Oracle database error process of [INS-35172] (Database)

- Redhat 5 prohibit IPv6 (Linux)

- Linux --- manual release system cache (Linux)

- Graphical development environment to build Android under Ubuntu 11.04 (Linux)

- Linux Security and Optimization (Linux)

- Python function arguments * args and ** kwargs usage (Programming)

- MySQL 5.6 database code changes (Database)

- ELKstack log analysis platform (Server)

- Go powerful development server simple example (Server)

- To install the Contiki development toolchain on Ubuntu (Linux)

- Visual Studio Code experience (Linux)

- C language print various graphic (Programming)

- Oracle Linux 5.9 configuration Xmanager (Linux)

- Radius server setup under CentOS (Server)

 
         
  Oracle 11g statistics collection - collection of multi-column statistics
     
  Add Date : 2018-11-21      
         
       
         
  We write SQL statements when sometimes encounter where clause followed by a plurality of conditions, that is, according to the conditions screened multiple columns of data. By default, Oracle will multicolumn selectivity (selectivity) multiplying the selectivity to give where statements such selective variations may make Oracle is not accurate enough, causing the optimizer to make wrong judgments. For example, for automobile manufacturers and car models, in fact, there is a relationship, once you know the car's model, which can determine a car manufacturer. Another example star hotels and hotel price level has a similar correspondence. In order to allow the optimizer to make accurate judgments, to generate accurate execution plans, oracle introduced the concept of multi-column statistics in the 11g database.

Optional: In this case, 1 / unique value

We have a table BOOKS, two columns hotel_id, rate_category, we look at the distribution of data in these two columns:
SQL> select hotel_id, rate_category, count (1) from books
2 group by hotel_id, rate_category
3 order by hotel_id;

HOTEL_ID RATE_CATEGORY COUNT (1)
---------- ------------- ----------
101119943
101239385
101320036
20215106
202210041
20235039

6 rows selected.

Check the data carefully: rate_category hotel_id 10 columns contain only 11, 12 and 13, and hotel_id 20 of this column contains only 21, 22 and 23 (11, 12 and 13 do not contain a). why

What? May be due to the star about the hotel. Hotel 20 is a high priced hotel, and rental grades 11, 12 and 13 is a lower class, and therefore they are not suitable for a high fee hotel. Similarly,

21, 22 and 23 are higher rent levels, so they are not available for this hotel 10 hotels. Moreover, the hotel room is more than 10 hotels in 20 predetermined number.

Create an index on two columns of the table books, and collect statistics on the table.
SQL> create index book_idx1 on books (hotel_id);
Index created.

SQL> create index book_idx2 on books (rate_category);
Index created.

SQL> analyze table books compute statistics;
Table analyzed.

If we want to find a table that meet the conditions of 20 hotels price level is a record 21, the execution plan look like?
SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 20 and rate_category = 21;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 2688610195

-------------------------------------------------- -------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- -------------------------
| 0 | SELECT STATEMENT | | 8296 | 33184 | 47 (3) | 00:00:01 |
| * 1 | TABLE ACCESS FULL | BOOKS | 8296 | 33184 | 47 (3) | 00:00:01 |
-------------------------------------------------- -------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "RATE_CATEGORY" = 21 AND "HOTEL_ID" = 20)

SQL> set autotrace off

SQL> select count (1) from books;

COUNT (1)
----------
99550

SQL> select 99550/8296 from dual;

99550/8296
----------
11.9997589

As can be seen from the above example, oracle chose to go full table scan, determines the number of records is 8296, and I watch it in the real data is 5106, for the entire table 99550 records, it should be used to index. But the oracle did not, because oracle two columns will be considered separately, and the selectivity is calculated hotel_id 1/2, rate_category 1/6, resulting in a statement selectivity is 1/12, which also

We see 8296 (1 * 99550/12) records in the execution plan causes.

In order to allow the implementation of oracle accurate records, we can take two ways
1. Use the package dbms_stats new function create_extended_stats create a virtual column, then on the table to collect statistics.
As follows:
dbms_stats.create_extended_stats ( 'SCOTT', 'BOOKS', '(HOTEL_ID, RATE_CATEGORY)')
The next time you re-collect the statistics table will automatically collect multi-column statistics for your column groups.

2. Direct method_opt specified in the package dbms_stats, when the collection of statistical information, the combination of columns as a single column uses

Here we use the second method
SQL> begin
2 dbms_stats.gather_table_stats (
3 ownname => 'SCOTT',
4 tabname => 'BOOKS',
5 estimate_percent => 100,
6 method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY FOR COLUMNS (HOTEL_ID, RATE_CATEGORY)',
7 cascade => TRUE
8 );
9 end;
10 /

PL / SQL procedure successfully completed.

After collecting the Plan of Implementation End column group statistics, look at the statement
SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 20 and rate_category = 21;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 1484887743

-------------------------------------------------- ---------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- ---------------------------------------
| 0 | SELECT STATEMENT | | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 1 | TABLE ACCESS BY INDEX ROWID | BOOKS | 5106 | 30636 | 19 (0) | 00:00:01 |
| * 2 | INDEX RANGE SCAN | BOOK_IDX2 | 5106 | | 11 (0) | 00:00:01 |
-------------------------------------------------- ---------------------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "HOTEL_ID" = 20)
2 - access ( "RATE_CATEGORY" = 21)

The output clearly shows the index BOOK_IDX2 used. Why now use the index? Note that "Rows" column value (5106) below. Optimization program correctly identified an estimate of the number of combinations of values ​​of the line, rather than the estimated value of the number of separate individual values ​​row.

Of course, for other conditions, oracle can also make accurate judgments

SQL> set autotrace trace exp
SQL> select hotel_id, rate_category from books where hotel_id = 10 and rate_category = 12;

Execution Plan
-------------------------------------------------- --------
Plan hash value: 2688610195

-------------------------------------------------- -------------------------
| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-------------------------------------------------- -------------------------
| 0 | SELECT STATEMENT | | 39385 | 230K | 47 (3) | 00:00:01 |
| * 1 | TABLE ACCESS FULL | BOOKS | 39385 | 230K | 47 (3) | 00:00:01 |
-------------------------------------------------- -------------------------

Predicate Information (identified by operation id):
-------------------------------------------------- -

1 - filter ( "RATE_CATEGORY" = 12 AND "HOTEL_ID" = 10)

In view USER_STAT_EXTENSIONS, you can see the extended statistics defined in the database:
SQL> select extension_name, extension
2 from user_stat_extensions
3 where table_name = 'BOOKS';

EXTENSION_NAME EXTENSION
------------------------------ -------------------- -------------------------------------------------- ----------
SYS_STUW3MXAI1XLZHCHDYKJ9E4K90 ( "HOTEL_ID", "RATE_CATEGORY")
     
         
       
         
  More:      
 
- To install Internet security firewall Raiders (Linux)
- Hackers is how the invasion and control of Things devices? (Linux)
- Java object serialization (Programming)
- Linux iptables firewall settings (Linux)
- Ubuntu installation module Python rq (Linux)
- How to deploy Python Web application: Heroku deployment process complete records (Server)
- Linux System Getting Started Learning: Fix ImportError: No module named scapy.all (Linux)
- MySQL 5.7 perfectly distributed transaction support (Database)
- How Mutt mail client to use cipher text password (Linux)
- PCM audio under Linux (Linux)
- Linux system security configuration (Linux)
- AWR Report (Database)
- Java garbage collection and heap memory layout (Programming)
- printf PHP string operations () built-in function usage (Programming)
- Linux system security check notes on performance (Linux)
- Smooth upgrade to OpenSSH 6.7 Procedure (Linux)
- ThinkPad X201i installation notes Debian 7.5 (Linux)
- Search Linux commands and files - which, whereis, locate, find (Linux)
- Linux Tutorial Share: How to sudo command to define the PATH environment variable (Linux)
- Linux kernel RCU (Read Copy Update) lock Brief (Linux)
     
           
     
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.