( Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books ) abothe
Optimal Ordering
Is there an optimal order for creating columns in a table? The DBA at my shop enforces a standard of putting VARCHAR2 columns last, and the likelihood of nulls is not considered. Is there a real-world benefit to this order?
Here’s a little background first. A row is stored in a manner similar to this in the database block:
In order to get to the third column in a table, the database must parse through the first two columns (not pointers—the database must read the row byte by byte)— even if the columns are numbers, dates, whatever. The columns will be stored as a length field followed by the data, and the database must parse through them to get to the next column.
So, in general you should put the most frequently accessed columns first for the best performance.
However, there is something to be said for putting the column(s) most likely to be null last, because they will consume zero bytes. If the database hits the end of a row before finding the Nth column, you know that that column is NULL or the default column value (for not-null columns added to the table with fast add in Oracle Database 11g).
However, there is a convention to put primary key columns first in all cases.
However . . . there are too many howevers! And in general, it really isn’t going to matter too much. The bit about putting the most frequently accessed columns first would have the most weight in a data warehouse in which you are scanning over many millions of rows to build the answer (the repeated overhead of parsing over a lot of columns that are not interesting would add up). In an online transaction processing (OLTP) system, where you are processing tens of rows, though, it wouldn’t be as big a deal.
In a strange case involving dozens of columns that are almost always null, putting them last in a table with lots and lots of rows could save you quite a few bytes, but the table would have to be large and the columns would have to be all null and frequently all null.
So, my recommendation is to put the columns in the order that makes the most sense from an aesthetic and design perspective.
As for the DBA’s rule above (VARCHAR2 columns last and the likelihood of nulls not considered), it would do nothing for performance or manageability. In many cases, it could negatively affect performance. Just remember that pretty much all database data is stored in a stringlike fashion:
· A string will have a null flag and a length byte or length bytes (<=250 characters [1 byte], >250 characters [2 bytes]) followed by the data.
· A number will have a null flag and a length byte or length bytes (numbers are 0–22 bytes in length) followed by a varying amount of data.
· A binary_float will consume 5 bytes—a leading null flag and a length byte followed by 4 bytes of floating-point data.
· A binary_double will consume 9 bytes—a leading null flag and a length byte followed by 8 bytes of floating-point data.
The database must read the length byte to figure out how many bytes of that row constitute that column, read over that column’s data to get to the next length byte, and so on. You might want columns you frequently access to be first in a table, because it will take less CPU to access them. But you’d need to be accessing a lot of rows for this to be true!
Suppose you have a table with 1,000 columns and then you query the first column and the last column and compare the TKPROF reports for each. Listing 4 creates the table and shows the reports for the queries.
Code Listing 4: 1,000-column table created, queried, and reported
l_create long := 'create table t ( c1 number';
begin
for i in 2 .. 1000
loop
l_create := l_create || ',c'||i||' number default ' || i;
end loop;
execute immediate l_create || ')';
end;
/
call count cpu elapsed disk query current rows
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
Parse 1 0.00 0.00 0 0 0 0
Execute 10 0.00 0.00 0 0 0 0
Fetch 7230 6.41 15.72 414610 420920 0 722790
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
total 7241 6.41 15.72 414610 420920 0 722790
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
Parse 1 0.00 0.00 0 0 0 0
Execute 10 0.00 0.00 0 0 0 0
Fetch 7230 8.66 17.93 421260 3304860 0 722790
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
total 7241 8.66 17.94 421260 3304860 0 722790
In this case, the CPU overhead was partially from the parsing of 1,000 columns and the chasing down of the chained-row piece (because any row with more than 254 columns will be stored in multiple pieces). You can see this row-chaining side effect in the report in the increased query column values, which resulted from the number of buffer gets processed.
If you change the 1,000 columns in Listing 4 to 250 columns to avoid having to chase down the chained rows, you might see the result in Listing 5.
Code Listing 5: Report for table with only 250 columns
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
Parse 1 0.00 0.00 0 0 0 0
Execute 10 0.00 0.00 0 0 0 0
Fetch 7230 0.62 0.62 1117 94520 0 722790
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
total 7241 0.62 0.62 1117 94520 0 722790
call count cpu elapsed disk query current rows
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
Parse 1 0.00 0.00 0 0 0 0
Execute 10 0.00 0.00 0 0 0 0
Fetch 7230 0.96 0.97 7 94520 0 722790
————————— ——————— ————— ———————— ———————— ———————— ———————— ——————
total 7241 0.96 0.97 7 94520 0 722790
So, the farther down the CREATE list a column is, the longer it will take to retrieve that column. Having a rule that says, “The longer a column is, the farther down the create table statement it must be” doesn’t seem to make sense. There are arguments for putting nullable columns last (for space savings) and for putting infrequently accessed columns last (for performance) but none that I’m aware of for putting longer columns last.
No comments:
Post a Comment