Saturday 3 November 2012

Optimal Ordering of Table Columns in Oracle - Best Practices

Most of the time, developers and DBAs create a table with a set of columns on the fly, without much thought about the underlying concepts, the application requirements, or best practices. Recently, I came across a post in Oracle Magazine (July 2012) by Thomas Kyte about the ordering of columns when creating a table in Oracle.
(Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.)

I feel it is worth sharing his post in the mydb forum; it may help us (architects, developers, and DBAs) get some idea about the ordering of columns for a table.

Optimal Ordering
Is there an optimal order for creating columns in a table? The DBA at my shop enforces a standard of putting VARCHAR2 columns last, and the likelihood of nulls is not considered. Is there a real-world benefit to this order?
Here’s a little background first. A row is stored in a manner similar to this in the database block:
[null flag][length][data][null flag][length][data]. . . .

In order to get to the third column in a table, the database must parse through the first two columns (there are no pointers; the database must read the row byte by byte), even if the columns are numbers, dates, or anything else. The columns will be stored as a length field followed by the data, and the database must parse through them to get to the next column.
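You can see this stored representation for yourself with the built-in DUMP function, which reports the internal datatype code, the stored length in bytes, and the raw bytes of a value. A minimal sketch (the demo table and data are just for illustration):

create table row_format_demo ( id number, created date, name varchar2(30) );

insert into row_format_demo values ( 42, sysdate, 'SCOTT' );
commit;

-- DUMP shows Typ (the internal datatype code), Len (the stored byte length)
-- and the raw bytes the database walks over to reach later columns.
select dump(id) id_dump, dump(created) created_dump, dump(name) name_dump
  from row_format_demo;
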
So, in general you should put the most frequently accessed columns first for the best performance.
However, there is something to be said for putting the column(s) most likely to be null last, because they will consume zero bytes. If the database hits the end of a row before finding the Nth column, you know that that column is NULL or the default column value (for not-null columns added to the table with fast add in Oracle Database 11g).
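A quick way to see the zero-byte effect is to compare the average row length of two otherwise identical tables, one with a mostly-null column in the middle and one with it trailing. A sketch (table names and data are made up; a null sandwiched between non-null columns costs one byte, a trailing null costs nothing):

create table null_middle ( a number, filler varchar2(10), b number );
create table null_last   ( a number, b number, filler varchar2(10) );

insert into null_middle ( a, b ) select rownum, rownum from all_objects;
insert into null_last   ( a, b ) select rownum, rownum from all_objects;
commit;

exec dbms_stats.gather_table_stats( user, 'NULL_MIDDLE' );
exec dbms_stats.gather_table_stats( user, 'NULL_LAST' );

-- avg_row_len should come out about one byte smaller per row for NULL_LAST,
-- because the trailing null is simply not stored.
select table_name, avg_row_len
  from user_tables
 where table_name in ( 'NULL_MIDDLE', 'NULL_LAST' );
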
However, there is a convention to put primary key columns first in all cases.
However . . . there are too many howevers! And in general, it really isn’t going to matter too much. The bit about putting the most frequently accessed columns first would have the most weight in a data warehouse in which you are scanning over many millions of rows to build the answer (the repeated overhead of parsing over a lot of columns that are not interesting would add up). In an online transaction processing (OLTP) system, where you are processing tens of rows, though, it wouldn’t be as big a deal.
In a strange case involving dozens of columns that are almost always null, putting them last in a table with lots and lots of rows could save you quite a few bytes, but the table would have to be large and those trailing columns would have to be null most of the time.
So, my recommendation is to put the columns in the order that makes the most sense from an aesthetic and design perspective.
As for the DBA’s rule above (VARCHAR2 columns last and the likelihood of nulls not considered), it would do nothing for performance or manageability. In many cases, it could negatively affect performance. Just remember that pretty much all database data is stored in a stringlike fashion:
- A string will have a null flag and a length byte or length bytes (<=250 characters [1 byte], >250 characters [2 bytes]) followed by the data.
- A number will have a null flag and a length byte or length bytes (numbers are 0-22 bytes in length) followed by a varying amount of data.
- A binary_float will consume 5 bytes: a leading null flag and a length byte followed by 4 bytes of floating-point data.
- A binary_double will consume 9 bytes: a leading null flag and a length byte followed by 8 bytes of floating-point data.
The database must read the length byte to figure out how many bytes of that row constitute that column, read over that column’s data to get to the next length byte, and so on. You might want columns you frequently access to be first in a table, because it will take less CPU to access them. But you’d need to be accessing a lot of rows for this to be true!
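The per-datatype byte counts above are easy to verify with VSIZE (which returns the data bytes only, excluding the null flag/length byte) and DUMP; for example:

select vsize('hello')                    varchar2_bytes,  -- 5
       vsize(1234567890)                 number_bytes,    -- 0-22, varies with the value
       vsize(cast(1.5 as binary_float))  bfloat_bytes,    -- 4
       vsize(cast(1.5 as binary_double)) bdouble_bytes    -- 8
  from dual;

-- DUMP additionally shows the internal datatype code and the raw bytes.
select dump(cast(1.5 as binary_double)) from dual;
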
Suppose you have a table with 1,000 columns and then you query the first column and the last column and compare the TKPROF reports for each. Listing 4 creates the table and shows the reports for the queries.
Code Listing 4: 1,000-column table created, queried, and reported

declare
        l_create long := 'create table t ( c1 number';
begin
        for i in 2 .. 1000
        loop
                l_create := l_create || ',c'||i||' number default ' || i;
        end loop;
        execute immediate l_create || ')';
end;
/

insert into t (c1, c1000 ) select rownum, rownum from all_objects;

exec dbms_stats.gather_table_stats( user, 'T' );

SELECT C1 FROM T
call       count    cpu      elapsed       disk      query    current    rows
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
Parse         1     0.00       0.00          0          0          0        0
Execute      10     0.00       0.00          0          0          0        0
Fetch      7230     6.41      15.72     414610     420920          0   722790
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
total      7241     6.41      15.72     414610     420920          0   722790

SELECT C1000 FROM T

call       count    cpu      elapsed       disk      query    current    rows
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
Parse         1     0.00       0.00          0          0          0       0
Execute      10     0.00       0.00          0          0          0       0
Fetch      7230     8.66      17.93     421260    3304860          0  722790
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
total      7241     8.66      17.94     421260    3304860          0  722790
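
The reports above aggregate 10 executions of each query. The original column does not show the exact test harness, but something along these lines, with SQL tracing enabled and the resulting trace file run through TKPROF, would reproduce it (an assumption, not part of the original listing):

alter session set sql_trace = true;

begin
        for i in 1 .. 10
        loop
                for x in ( select c1 from t ) loop null; end loop;
                for x in ( select c1000 from t ) loop null; end loop;
        end loop;
end;
/

alter session set sql_trace = false;
-- then, from the OS prompt: tkprof <tracefile>.trc report.txt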

In this case, the CPU overhead was partially from the parsing of 1,000 columns and the chasing down of the chained-row piece (because any row with more than 254 columns will be stored in multiple pieces). You can see this row-chaining side effect in the report in the increased query column values, which resulted from the number of buffer gets processed.
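One way to confirm the chained-row-piece effect is to watch the session statistic 'table fetch continued row' before and after touching the trailing column; it climbs only when the database has to follow a pointer to another row piece. A sketch, assuming you can query v$mystat and v$statname:

select s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'table fetch continued row';

select count(c1000) from t;

select s.value          -- should have increased for the 1,000-column table
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'table fetch continued row';
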
If you change the 1,000 columns in Listing 4 to 250 columns to avoid having to chase down the chained rows, you might see the result in Listing 5.
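For reference, the only changes to the setup in Listing 4 are the loop bound and the column name in the insert; a sketch of the modified setup:

declare
        l_create long := 'create table t ( c1 number';
begin
        for i in 2 .. 250    -- was 1000 in Listing 4
        loop
                l_create := l_create || ',c'||i||' number default ' || i;
        end loop;
        execute immediate l_create || ')';
end;
/

insert into t (c1, c250 ) select rownum, rownum from all_objects;

exec dbms_stats.gather_table_stats( user, 'T' );
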
Code Listing 5: Report for table with only 250 columns

SELECT C1 FROM T

call       count    cpu      elapsed       disk      query    current    rows
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
Parse         1     0.00       0.00          0          0          0       0
Execute      10     0.00       0.00          0          0          0       0
Fetch      7230     0.62       0.62       1117      94520          0  722790
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
total      7241     0.62       0.62       1117      94520          0  722790

SELECT C250 FROM T
 
call       count    cpu      elapsed       disk      query    current    rows
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
Parse         1     0.00       0.00          0          0          0       0
Execute      10     0.00       0.00          0          0          0       0
Fetch      7230     0.96       0.97          7      94520          0  722790
————————— ———————  —————     ————————   ————————  ————————  ————————   ——————
total      7241     0.96       0.97          7      94520          0  722790

So, the farther down the CREATE list a column is, the longer it will take to retrieve that column. Having a rule that says, “The longer a column is, the farther down the create table statement it must be” doesn’t seem to make sense. There are arguments for putting nullable columns last (for space savings) and for putting infrequently accessed columns last (for performance) but none that I’m aware of for putting longer columns last.
