CoSORT: Record Sort Speed, Best for Data Warehouse Staging
"Almost all processing in the staging area is either sorting or simple sequential processing."
Ralph Kimball, "The Foundations for Modern Data Warehousing," Intelligent Enterprise Magazine, Data Warehouse Designer
CoSORT's parallel coroutine sort engine, which directly exploits multiple CPUs, is the fastest way to collate large volumes of data.
Sorting benchmarks for UNIX and Windows servers are also available.
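CoSORT's coroutine engine is not documented here, so the following is only a generic illustration of how a sort can be spread across several workers: split the input into chunks, sort each chunk concurrently into a sorted "run", then merge the runs. The `parallel_sort` name and its `workers` parameter are hypothetical. The sketch uses threads to stay self-contained; under CPython's global interpreter lock they will not actually sort in parallel, so real CPU parallelism would need processes (for example `ProcessPoolExecutor`) or a native engine.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from heapq import merge
from typing import Any, Callable, List, Optional, Sequence


def parallel_sort(items: Sequence[Any], workers: int = 4,
                  key: Optional[Callable[[Any], Any]] = None) -> List[Any]:
    """Split-sort-merge: sort up to `workers` chunks concurrently, then k-way merge them."""
    if not items:
        return []
    # Ceiling division so every item lands in exactly one contiguous chunk.
    size = -(-len(items) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Each worker turns its chunk into a sorted run (sorted() is stable).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(partial(sorted, key=key), chunks))
    # heapq.merge interleaves the sorted runs by key; ties keep run order,
    # so the overall result is stable, like a single sorted() call.
    return list(merge(*runs, key=key))


print(parallel_sort([5, 3, 9, 1, 7, 2], workers=3))
```

The merge step is the same operation CoSORT exposes directly as a merge of presorted files.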
CoSORT's record-breaking sort performance is available throughout the CoSORT suite: the same engine is central to all of CoSORT's standalone utilities (including the sort control language, or SortCL, program), its API libraries, and its plug-and-play replacements for third-party sort utilities.
CoSORT users can specify any number of fixed and/or floating key fields and collating sequences. The sorted results are immediately available for faster database reloads, cross-table joins, aggregations, and other data warehouse processing.
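To illustrate sorting on several keys, each with its own direction and collating sequence, the Python sketch below uses the classic stable multi-pass technique: sort on the least significant key first, then on each more significant key in turn. The `(field, descending, collate)` key specification, the helper names, and the sample records are hypothetical and are not SortCL syntax.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]
# Hypothetical key spec: (field name, descending?, collating function).
KeySpec = Tuple[str, bool, Callable[[Any], Any]]


def identity(value: Any) -> Any:
    """Default collating sequence: compare values as they are."""
    return value


def case_insensitive(value: str) -> str:
    """Example collating sequence: ignore letter case in text keys."""
    return value.casefold()


def multi_key_sort(records: Sequence[Record], keys: Sequence[KeySpec]) -> List[Record]:
    """Sort on several keys; the first spec in `keys` is the primary key."""
    result = list(records)
    # list.sort is stable (even with reverse=True), so sorting from the least
    # significant key to the most significant one yields the combined order.
    for field, descending, collate in reversed(keys):
        result.sort(key=lambda r, f=field, c=collate: c(r[f]), reverse=descending)
    return result


rows = [
    {"region": "east", "sales": 10},
    {"region": "West", "sales": 30},
    {"region": "East", "sales": 20},
    {"region": "west", "sales": 5},
]
out = multi_key_sort(rows, [("region", False, case_insensitive), ("sales", True, identity)])
```

Here `region` is compared case-insensitively in ascending order, and ties are broken by `sales` in descending order.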
In a CoSORT merge operation, records from two or more input files that share a common layout are folded together based on the key(s). The input files must already be ordered on those same keys. Because the inputs are presorted, the merge needs only one sequential pass over them, which makes it faster than a sort. CoSORT's join functionality (sometimes also called merge) is described separately.
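The merge described above is a k-way merge. A minimal Python sketch, assuming each input is an iterable of records already sorted on the same key: `heapq.merge` holds only one pending record per input, which is why the merge runs in a single sequential pass. The `merge_sorted` name and its order check are illustrative, not CoSORT's behavior.

```python
from heapq import merge
from typing import Any, Callable, Iterable, Iterator, List


def merge_sorted(inputs: List[Iterable[Any]], key: Callable[[Any], Any]) -> Iterator[Any]:
    """Fold presorted inputs into one ordered stream in a single pass."""

    def checked(stream: Iterable[Any], index: int) -> Iterator[Any]:
        # A merge is only correct if every input is already ordered on the key,
        # so fail loudly instead of silently emitting out-of-order output.
        previous: Any = None
        first = True
        for record in stream:
            current = key(record)
            if not first and current < previous:
                raise ValueError(f"input {index} is not sorted on the merge key")
            previous, first = current, False
            yield record

    return merge(*(checked(s, i) for i, s in enumerate(inputs)), key=key)


merged = list(merge_sorted([[("a", 1), ("c", 3)], [("b", 2), ("d", 4)]], key=lambda r: r[0]))
```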
Oracle vs. CoSORT Sort Benchmark
HP RP5450 server with four Itanium2 CPUs, HP-UX 11i, Oracle 9i
Table-to-Table
Using SQL*Plus, Oracle sorted a 50-million-row table into a new table (SELECT * FROM table ORDER BY column_name) in 1 hour 38 minutes. CoSORT performed the identical one-key sort in a data staging area outside the database, as a piped ETL operation (fact | sortcl | sqlldr) that extracted the table, sorted it with CoSORT, and reloaded the sorted flat file into a new table. The whole operation took 18 minutes, more than five times faster than Oracle.
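The piped ETL job above chains an extract, a sort, and a load. The Python sketch below mimics that shape with generators, using in-memory stand-ins for the source table and the loader; the `extract`, `sort_stage`, and `load` names are hypothetical and do not model the real fact, sortcl, or sqlldr tools. It also shows a property shared by any such pipeline: the extract and load stages stream row by row, but the sort stage cannot emit its first row until it has read every input row.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]


def extract(table: Iterable[Row]) -> Iterator[Row]:
    # Stand-in for the unload step: stream rows out one at a time.
    for row in table:
        yield row


def sort_stage(rows: Iterable[Row], key_index: int) -> Iterator[Row]:
    # A sort is a blocking stage: sorted() consumes every upstream row
    # before the first (smallest) row is passed downstream.
    yield from sorted(rows, key=lambda r: r[key_index])


def load(rows: Iterable[Row], target: List[Row]) -> int:
    # Stand-in for the bulk-load step: append rows to the new "table".
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


source = [("c", "3"), ("a", "1"), ("b", "2")]
new_table: List[Row] = []
loaded = load(sort_stage(extract(source), key_index=0), new_table)
```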
Table-to-File / File-to-File
Using SQL*Plus, Oracle sorted a 30-million-row table and wrote the output to a file. CoSORT sorted the same input and wrote the same output file more than seven times faster:
30 million 50-byte rows (1.4GB)
CoSORT: 6 min
Oracle: 44 min
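The figures can be checked with a little arithmetic. Taking the stated sizes at face value, 30 million 50-byte rows come to 1.5 billion bytes, or about 1.4 GiB (so the "1.4GB" figure appears to use binary gigabytes). The elapsed times give the speedup ratios quoted for both benchmarks. The `speedup` helper is purely illustrative.

```python
def speedup(baseline_minutes: float, cosort_minutes: float) -> float:
    """How many times faster the CoSORT run was than the baseline run."""
    return baseline_minutes / cosort_minutes


rows = 30_000_000
row_bytes = 13 + 7 + 30  # MED1 + MED2 + MED3 field sizes from the SortCL layout
total_bytes = rows * row_bytes
gib = total_bytes / 2**30

table_to_file = speedup(44, 6)         # Oracle 44 min vs. CoSORT 6 min
table_to_table = speedup(60 + 38, 18)  # Oracle 1 h 38 min vs. CoSORT 18 min

print(f"{total_bytes:,} bytes = {gib:.1f} GiB")
print(f"table-to-file speedup:  {table_to_file:.1f}x")
print(f"table-to-table speedup: {table_to_table:.1f}x")
```

The ratios work out to about 7.3x and 5.4x, which match the "more than seven times" and "more than five times" claims.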
CoSORT SortCL script:
/INFILE=medload.dat
/FIELD=(MED1,POS=1,SIZE=13)
/FIELD=(MED2,POS=14,SIZE=7)
/FIELD=(MED3,POS=21,SIZE=30)
/SORT
/KEY=MED1
/OUTFILE=medload.sorted
Oracle SQL*Plus script:
set timing on
set trimspool on
set pagesize 0
set heading off
set feedback off
set termout off
spool joinload.txt
SELECT * FROM medfixload order by med1;
spool off
set timing off
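To reproduce the file-to-file job's logic without CoSORT, the Python sketch below follows the SortCL script's layout: fixed-width records with MED1 at position 1 (13 bytes), MED2 at position 14 (7 bytes), and MED3 at position 21 (30 bytes), sorted on MED1. It assumes newline-terminated records and a plain byte-wise comparison of the key. The helper names are hypothetical. Because it reads the whole file into memory, it is only a functional illustration, not a performance comparison.

```python
from typing import List, Tuple

# Field layout from the SortCL script: (name, 1-based start position, size in bytes).
LAYOUT: List[Tuple[str, int, int]] = [
    ("MED1", 1, 13),
    ("MED2", 14, 7),
    ("MED3", 21, 30),
]


def field(record: bytes, name: str) -> bytes:
    """Slice one fixed-position field out of a record."""
    for field_name, pos, size in LAYOUT:
        if field_name == name:
            return record[pos - 1:pos - 1 + size]
    raise KeyError(name)


def sort_records(records: List[bytes], key: str = "MED1") -> List[bytes]:
    """Byte-wise sort on one key field; sorted() keeps equal keys in input order."""
    return sorted(records, key=lambda rec: field(rec, key))


def sort_fixed_width(in_path: str, out_path: str, key: str = "MED1") -> int:
    """Sort a newline-terminated fixed-width file on one key; return the record count."""
    with open(in_path, "rb") as src:
        records = src.read().splitlines()
    ordered = sort_records(records, key)
    with open(out_path, "wb") as dst:
        for rec in ordered:
            dst.write(rec + b"\n")
    return len(ordered)
```

For example, `sort_fixed_width("medload.dat", "medload.sorted")` would mirror the script's /INFILE, /KEY=MED1, and /OUTFILE settings.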
