The CoSORT Sorter Tx Replaces PowerMart and PowerCenter v5-7's Sorter Transformation, Speeding Large Sorts up to 10X, plus Merge (Join) and Aggregation Transforms
Introduction and Description

CoSORT's leadership in the UNIX sort world and in data warehouse performance has its roots in parallel sorting. CoSORT became directly involved in decision support systems 12 years ago with the development of its sort control language (SortCL) -- the popular 4th generation DDL/DML for flat-file data integration and staging, mainframe sort migration, and report generation. Since then, the SortCL program has played a major role as an "ETL engine" in the world's largest operational data stores (ODS) and data warehouses, via combinatory, single-pass transformations of very large database (VLDB) extracts and mainframe flat files, and through fast pre-sorts that speed loads. CoSORT's parallel sorts and joins have also long been running alongside, but outside of, leading ETL tools such as Informatica's.

And now, through IRI's membership in Informatica's Developer Network program, IRI engineers have integrated CoSORT's award-winning sort technology directly inside the PowerMart and PowerCenter suites. IRI's optional upgrade, the CoSORT Advanced External Procedure (AEP) for the Sorter Tx, can speed sorting up to tenfold, reduce disk and RAM requirements, and indirectly accelerate downstream joins, aggregations, and bulk re-loads of all types and sizes of data.

With this AEP, data warehouse architects and consultants can register a seamlessly integrated CoSORT Sorter Tx within the ETL project environment. No parameter changes are necessary, and there is nothing new to learn or do. Post-transformation mappings are simple and documented as well.

The CoSORT AEP for Informatica speeds sorting directly, and speeds aggregation, merging (joins), and bulk re-loading indirectly, because those downstream steps run faster on pre-sorted data. For example, PowerMart/Center users should aggregate with the sorted ports option.
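
To illustrate why pre-sorted input helps a downstream aggregation, here is a minimal Python sketch. It is not Informatica or CoSORT code; the function name and sample rows are hypothetical. When rows arrive ordered by the group key, an aggregator can emit each group as soon as the key changes and needs to hold only one group at a time.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows):
    """Sum the amounts per key, assuming rows are already sorted by key."""
    # groupby only merges adjacent equal keys, so sorted input is required.
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

# Hypothetical sample data: (region, amount)
rows = [("east", 5), ("west", 2), ("east", 7), ("west", 1)]
rows.sort(key=itemgetter(0))  # stand-in for the upstream sorter
totals = list(aggregate_sorted(rows))
print(totals)
```

Without the sort step, the same loop would emit "east" and "west" twice each, which is why an unsorted aggregator has to cache every group until the input ends.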

Benchmarks and Benefits

The following tests were conducted on an IBM p650 with 4 CPUs, running Informatica PowerCenter 6.2. Only 32MB of RAM was allocated for all CoSORT operations.

Fixed-key ASCII Sorting

Input Source Size:        26,848,200 bytes    268,482,000 bytes    2,684,820,870 bytes
Sorted by:                6-byte key          6-byte key           6-byte key
Target:                   154,300 records     1,543,000 records    15,430,005 records

Informatica 'nSort' *     8s                  1m48s                20m35s
CoSORT AEP                3s                  16s                  2m1s
CoSORT SortCL             1s                  7s                   1m19s


* Best time for Informatica, using 24MB DTM and 16MB Sorter memory; jobs failed when given more memory. Modifying PowerCenter to break up the processing into separate partitions was also tried. This "improved" sort performance by 2X, but required splitting the source data into separate files, splitting up the workspaces, and creating separate targets, which then have to be brought together to create the same results. The 'merge partitioned files' feature simply concatenates the sorted files, resulting in unsorted output. Thus partitioning is not only cumbersome and time-consuming; it also does not produce comparable or usable results.
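
The difference between concatenating sorted partitions and merging them can be shown with a small Python sketch. The partition contents below are hypothetical, and `heapq.merge` is used only as a generic stand-in for a true k-way merge; it is not how either product is implemented.

```python
import heapq

def is_sorted(records):
    """True if every record is <= the one that follows it."""
    return all(a <= b for a, b in zip(records, records[1:]))

# Two partitions, each individually sorted on a 6-byte key (hypothetical data).
part_a = [b"AAA001", b"CCC003", b"EEE005"]
part_b = [b"BBB002", b"DDD004", b"FFF006"]

# Concatenation keeps each run intact but interleaves nothing.
concatenated = part_a + part_b

# A k-way merge repeatedly takes the smallest head record across all runs.
merged = list(heapq.merge(part_a, part_b))

print(is_sorted(concatenated), is_sorted(merged))
```

Concatenation only yields sorted output when the partitions happen to cover disjoint, ordered key ranges, which a plain split of the source file does not guarantee.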

The CoSORT AEP uses the same amount of temporary space as SortCL, which is about the size of the source data. Informatica's sort required 2.5 times the size of the source data. With CoSORT, there is no need to modify Informatica's Sorter memory. Over-allocating sorter memory for the native Informatica Sorter ('nSort') Tx causes the session to fail, and tuning Informatica to find the "sweet spot" configuration is time-consuming.

By contrast, the CoSORT AEP results were achieved with no tuning whatsoever. The AEP reads its resource parameters from a very basic text file ("cosortrc") -- the same file you might already have in place for external (flat-file) CoSORT (SortCL) processes. The CoSORT engine uses only the memory it needs, up to the resource 'ceiling' previously set by the system administrator, and those settings can easily be modified globally or per job.

Variable-key, ASCII Sorting with Unique and Stable

Sorted by:                          6-byte key     14-byte key          23-byte key, 2.6GB
Target:                             424 records    2,233,343 records    15,237,170 records

Informatica 'nSort' w/Aggregator    2m10s          14m37s               1h43m46s
CoSORT AEP                          1m03s          1m32s                3m24s
CoSORT SortCL                       27s            38s                  2m15s
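
As a convenience for reading the two benchmark tables, the following Python sketch converts the reported timings to seconds and computes the ratio of the native Informatica time to the CoSORT AEP time. The helper name and dictionary layout are illustrative assumptions; the timings themselves are copied from the tables.

```python
import re

def to_seconds(text):
    """Convert a timing such as '1h43m46s' or '16s' to whole seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))

# Timings as reported, smallest to largest input.
timings = {
    "fixed-key": {
        "native": ["8s", "1m48s", "20m35s"],
        "aep": ["3s", "16s", "2m1s"],
    },
    "unique/stable": {
        "native": ["2m10s", "14m37s", "1h43m46s"],
        "aep": ["1m03s", "1m32s", "3m24s"],
    },
}

# Speedup = native elapsed time / AEP elapsed time, per input size.
speedups = {
    name: [to_seconds(n) / to_seconds(a) for n, a in zip(t["native"], t["aep"])]
    for name, t in timings.items()
}

for name, ratios in speedups.items():
    print(name, [f"{r:.1f}x" for r in ratios])
```

On these figures the largest fixed-key run comes out at roughly 10 times faster with the AEP, and the largest unique/stable run at roughly 30 times faster; in both tests the advantage grows with input size.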

When UNIQUE is specified to CoSORT, records with duplicate keys are removed, not just records that are identical in their entirety. Similarly, when STABLE is specified, CoSORT outputs equal-key (duplicate) records in their input order. Informatica, however, cannot perform a truly UNIQUE and/or STABLE sort with its native Sorter Tx. PowerMart/Center users must also create an Aggregator Transformation, grouping by the sort key and getting the FIRST() of the rest of the data. This accounts for much of the timing penalty, especially as the number of UNIQUE records (or groups) grows. While aggregators benefit from pre-sorted data, pre-sorting would violate the desired test result here. And, because STABLE means you want the values of the first record encountered for each key, sorting the data in Informatica without CoSORT will produce incorrect results.
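
The UNIQUE and STABLE semantics described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not CoSORT's implementation; the function name and sample records are hypothetical. It relies on the fact that Python's built-in sort is stable.

```python
def stable_unique_sort(records, key):
    """Sort by key, keeping only the first-encountered record per key."""
    # sorted() is stable: equal-key records stay in their input order.
    ordered = sorted(records, key=key)
    result = []
    for rec in ordered:
        # Keep a record only when its key differs from the last one kept,
        # so duplicates are dropped by key, not by whole-record equality.
        if not result or key(result[-1]) != key(rec):
            result.append(rec)
    return result

# Hypothetical input: (key, payload)
records = [("K2", "first"), ("K1", "alpha"), ("K2", "second"), ("K1", "beta")]
out = stable_unique_sort(records, key=lambda r: r[0])
print(out)
```

An unstable sort could place ("K2", "second") ahead of ("K2", "first"), in which case the surviving record for K2 would no longer be the first one encountered in the input.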

Examples

[Example screen shots from the CoSORT AEP for Informatica]

Platforms, Licensing, and Support

The CoSORT AEP is now available for all Informatica PowerCenter and PowerMart users on major UNIX and Windows platforms. The full CoSORT package, including SortCL, runs on all these platforms.

The CoSORT AEP for Informatica is licensed and supported by CoSORT/IRI USA. Free evaluations of the CoSORT Sorter Tx AEP also include the full CoSORT package, for external sort/report operations and many third-party sort replacements.

The one-time price of the CoSORT AEP covers perpetual use, one full year of support and upgrades, as well as discounts on the optional full-use CoSORT package. Current CoSORT package users qualify for Informatica AEP discounts directly from IRI only.


 


© 2007 CoSORT Brasil / IRI Innovative Routines International, Inc.
mkt@cosort.com.br | Legal Notice