Produtos Soluções
Página Inicial Quem Somos Downloads Jornal CoSORT Contate a CoSORT Brasil

CoSORT: The Fastest Way to Join Two Sources

CoSORT is four years ahead of its competition in join technology, and can join two large tables up to 20 times faster than a database can! Matching two pre-sorted files at a time at I/O speed, CoSORT's sort control language (SortCL) program uses familiar SQL syntax to produce a merged output based on specified conditions. In the same job script and I/O pass, you can also manipulate and report the resulting data using additional SortCL functionality, such as:

Remapping (Formatting)
Converting (Translation)
Grouping (Aggregation)

CoSORT's join functionality is widely used in archive versus transaction matching, pattern analyses, changed data capture, and related delta reporting (add-update-delete) processes. CoSORT’s SortCL supports all JOIN keywords used in SQL, including:

SQL Syntax
CoSORT's SortCL
INNER
INNER
LEFT
LEFT_OUTER
RIGHT
RIGHT_OUTER
FULL
FULL_OUTER

In addition, SortCL also offers an UNORDERED join, where all possible combinations of the input files are produced (this does not require pre-sorting). SortCL also offers an ONLY clause which restricts the results of OUTER joins to only the non-matches. Using SortCL, you can also eliminate and reformat null records.

To show the simplicity of translating a SQL join operation into a SortCL script, consider this example:

A LEFT JOIN is used to return all the rows from Employees table, whether or not matches exist in the second table (Orders) over the Employee_ID key. The corresponding rows from Orders are returned only when a match exists with Employees over the Employee_ID:

SELECT Employees.Name, Orders.Product
FROM Employees
LEFT JOIN Orders
ON Employees.Employee_ID=Orders.Employee_ID

A SortCL script to perform the same operation might be:

/INFILE=Employees # left join file
/SPEC=Employees.ddf # metadata for left input file
/INFILE=Orders # right join file
/SPEC=Orders.ddf # metadata for right input file

/JOIN LEFT_OUTER Employees Orders WHERE \
Employees.Employee_ID == Orders.Employee_ID

/OUTFILE=join_results.out
/FIELD=(Employees.Name,POS=1,SIZE=5)
/FIELD=(Orders.Product,POS=7,SIZE=5)

 

The .ddf (data definition file) contains the input file field/column layout, which is produced automatically, and based on the Oracle table description, when using CoSORT's FAst extraCT (FACT) software to extract the table data.

The fastest solution for replicating two-table JOIN operations is offered by an end-to-end FACT ->
CoSORT -> SQL*Loader operation, as described here, where the CoSORT transformation phase would involve a sort piped to a join, and where the smaller join table data could be extracted and sorted separately in anticipation of the piped ETL of the larger table. On an ia64 hp server rx5670 with four 1GHz Itanium2 CPUs and 32GB of main memory, Oracle 9i's SQL*Plus joined two tables, each approximately 1GB, in 48 minutes. By unloading the same tables with CoSORT's FACT, piping these to flat stream sorts and joins in SortCL, and then piping the result into SQL*Loader, the same table was built in 18 minutes, almost 1/3 the time.

 


© 2007 CoSORT Brasil / IRI Innovative Routines International, Inc.
mkt@cosort.com.br | Aviso Legal