CoSORT:
Index Sequencing and Pre-Sorted Bulk Loads
Selecting,
sorting, joining, reformatting, and aggregating
data not only prepare it for database repopulation
qualitatively, but quantitatively
as well -- the amount of data back in drops, thereby
reducing load times. For example, a 100-million
row input could be reduced through filtering
down to 10 million.
It
is pre-sorting, however, that can have the most
dramatic impact on bulk database loads. Sorting
a flat file on the primary index key prior to
loading can speed a large direct path load between
2 and 10 times. By precluding the internal database
sort (for indexing), pre-sorting improves the
speed and efficiency relational database load
utilities, including: Oracle
SQL*Loader*, SQL
Server and Sybase
bulk copy (bcp), Inform ix, DB2**,
and Red Brick's PTMU. CoSORTing the data outside
of the RDBMS also takes significant processing
overhead off the database, and reduces temporary
storage requirements during the load.
CoSORT's
Sort Control Language (SortCL)
program also features a reserved output field
name for each target record called SEQUENCER which
lays out (wherever you choose to format it) a
count number on each record, starting by default
at 1. This sequencer field can be started at any
other number, however (e.g. -500, 0, 94312) in
order to customize each process or adhere to database
uniqueness constraints. Again, this index-ordered
output (with or without sequence numbers affixed)
from CoSORT can be loaded by, or piped directly
into, your load utility.
Click
here to read more about using CoSORT tools
to accelerate database loads.
Click here to read more
about how CoSORT's FAst extraCT (FACT) for Oracle
also facilitates SQL*Loader operations.
Click here to see data
warehousing and CoSORT schematics.
*Per
Oracle's Server Utilities Guide, pre-sorting improves
the performance of direct path loads
and queries and
minimizes the temporary storage requirements during
the load. Oracle's internal block management is
vastly improved by pre-sorting; the sorted sustained
rate is roughly twice that of the un-sorted sustained
rate.
**For
IBM DB2 users, the exclusive CoSORT
Load Accelerator (CLA) for DB2 directly replaces
the sort within EEE and ESE loaders on UNIX, cutting
UDB load times in half.