CoSORT:
The Fastest Way to Join Two Sources
CoSORT
is four years ahead of its competition in join technology,
and can join two large tables up to 20 times faster
than a database can! Matching two pre-sorted files
at a time at I/O speed, CoSORT's sort control language
(SortCL) program uses familiar
SQL syntax to produce a merged output based on specified
conditions. In the same job script and I/O pass,
you can also manipulate and report the resulting
data using additional SortCL functionality, such
as:
Remapping
(Formatting)
Converting (Translation)
Grouping (Aggregation)
CoSORT's
join functionality is widely used in archive versus
transaction matching, pattern analyses, changed
data capture, and related delta reporting (add-update-delete)
processes. CoSORT’s SortCL supports all JOIN keywords
used in SQL, including:
SQL
Syntax |
CoSORT's
SortCL |
INNER |
INNER |
LEFT |
LEFT_OUTER |
RIGHT |
RIGHT_OUTER |
FULL |
FULL_OUTER |
In
addition, SortCL also offers an UNORDERED join,
where all possible combinations of the input files
are produced (this does not require pre-sorting).
SortCL also offers an ONLY clause which restricts
the results of OUTER joins to only the non-matches.
Using SortCL, you can also eliminate and reformat
null records.
To
show the simplicity of translating a SQL join operation
into a SortCL script, consider this example:
A
LEFT JOIN is used to return all the rows from Employees
table, whether or not matches exist in the second
table (Orders) over the Employee_ID key. The corresponding
rows from Orders are returned only when a match
exists with Employees over the Employee_ID:
SELECT
Employees.Name, Orders.Product
FROM Employees
LEFT JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID
|
A
SortCL script to perform the same operation might
be:
/INFILE=Employees
# left join file
/SPEC=Employees.ddf #
metadata for left input file
/INFILE=Orders # right
join file
/SPEC=Orders.ddf # metadata
for right input file
/JOIN
LEFT_OUTER Employees Orders WHERE \
Employees.Employee_ID == Orders.Employee_ID
/OUTFILE=join_results.out
/FIELD=(Employees.Name,POS=1,SIZE=5)
/FIELD=(Orders.Product,POS=7,SIZE=5)
|
The
.ddf (data definition file) contains the input file
field/column layout, which is produced automatically,
and based on the Oracle table description, when
using CoSORT's FAst extraCT (FACT)
software to extract the table data.
The
fastest solution for replicating two-table JOIN
operations is offered by an end-to-end FACT ->
CoSORT -> SQL*Loader operation, as described
here, where the CoSORT
transformation phase would involve a sort piped
to a join, and where the smaller join table data
could be extracted and sorted separately in anticipation
of the piped ETL of the larger table. On an ia64
hp server rx5670 with four 1GHz Itanium2 CPUs and
32GB of main memory, Oracle 9i's SQL*Plus joined
two tables, each approximately 1GB, in 48 minutes.
By unloading the same tables with CoSORT's FACT,
piping these to flat stream sorts and joins in SortCL,
and then piping the result into SQL*Loader, the
same table was built in 18 minutes, almost 1/3 the
time.

|