| Introduction and Description
CoSORT's leadership in the UNIX sort world and in data warehouse performance has its roots in parallel sorting. CoSORT became directly involved in decision support systems 12 years ago with the development of its sort control language (SortCL), the popular fourth-generation DDL/DML for flat-file data integration and staging, mainframe sort migration, and report generation. Since then, the SortCL program has played a major role as an "ETL engine" in the world's largest operational data stores (ODS) and data warehouses, through combined, single-pass transformations of very large database (VLDB) extracts and mainframe flat files, and through fast pre-sorts that speed loads. CoSORT's parallel sorts and joins have also been running outside of leading ETL tools such as Informatica's.
Now, through IRI's membership in Informatica's Developer Network program, IRI engineers have successfully integrated CoSORT's award-winning sort technology directly inside the PowerMart and PowerCenter suites. The optional upgrade to IRI's CoSORT Advanced External Procedure (AEP) for the Sorter transformation (Tx) can deliver a tenfold improvement in sorting performance, reduce disk and RAM requirements, and indirectly accelerate downstream joins, aggregations, and bulk re-loads of data of all types and sizes.
With this same AEP, data warehouse architects and consultants can register a seamlessly integrated CoSORT Sorter Tx within the ETL project environment. No parameter changes are necessary, and nothing new needs to be learned or done. Post-transformation mappings are simple and documented as well.
The CoSORT AEP for Informatica speeds sorting directly, and speeds aggregation, merging (joins), and bulk re-loading indirectly. For example, PowerMart/PowerCenter users should aggregate with the sorted ports option, so that the Aggregator can take advantage of the pre-sorted data.
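The indirect benefit to aggregation can be illustrated with a minimal Python sketch. It is not Informatica or CoSORT code, and the rows are hypothetical; it only shows the general idea that when input arrives sorted on the group key, each group can be totaled and emitted as soon as the key changes, without holding every group in memory.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (key, amount) rows, already sorted on the group key.
rows = [("east", 10), ("east", 5), ("north", 7), ("west", 2), ("west", 8)]

# groupby() relies on equal keys being adjacent, which sorted input guarantees.
# Each group is consumed and summed before the next one is read.
totals = [
    (key, sum(amount for _, amount in group))
    for key, group in groupby(rows, key=itemgetter(0))
]

print(totals)
```

If the rows were not sorted, the same key could appear in several separate groups, so an aggregator would have to cache all groups until the end of the input.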

Benchmarks and Benefits
The following tests were conducted on an IBM p650 with 4 CPUs, running Informatica PowerCenter 6.2. Only 32MB of RAM was allocated for all CoSORT operations.
Fixed-key ASCII Sorting

| Input source size | 26,848,200 bytes | 268,482,000 bytes | 2,684,820,870 bytes |
|---|---|---|---|
| Sorted by | 6-byte key | 6-byte key | 6-byte key |
| Target | 154,300 records | 1,543,000 records | 15,430,005 records |
| Informatica 'nSort' * | 8s | 1m48s | 20m35s |
| CoSORT AEP | 3s | 16s | 2m1s |
| CoSORT SortCL | 1s | 7s | 1m19s |

* Best time for Informatica, using 24MB of DTM memory and 16MB of Sorter memory; jobs failed when more memory was allocated. Modifying PowerCenter to break the processing into separate partitions was also tried. This "improved" sort performance by 2X, but it required splitting the source data into separate files, splitting up the workspaces, and creating separate targets, which then have to be brought together to produce the same results. The 'merge partitioned files' feature simply concatenates the sorted files, which results in unsorted output. Partitioning is therefore not only cumbersome and time-consuming; it also does not produce comparable or usable results.
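The difference between concatenating sorted partitions and truly merging them can be shown with a short Python sketch. It is an illustration only, not Informatica or CoSORT code, and the key values are made up.

```python
import heapq

# Hypothetical outputs of two partitions, each sorted on its own.
partition_a = ["AAA103", "CCC200", "EEE517"]
partition_b = ["BBB042", "CCC199", "DDD900"]

# Concatenation keeps each partition intact, so global order is lost.
concatenated = partition_a + partition_b

# A k-way merge interleaves the partitions and preserves global order.
merged = list(heapq.merge(partition_a, partition_b))

print(concatenated == sorted(concatenated))
print(merged == sorted(merged))
```

Only the merged list is in key order; the concatenated list would need to be sorted again before it could be compared with a single-pass sort.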
The CoSORT AEP uses the same amount of temporary space as SortCL, which is about the same size as the source data. Informatica's sort required 2.5 times the size of the source data. With CoSORT, there is no need to modify Informatica's Sorter memory. Over-allocating Sorter memory for the native Informatica Sorter ('nSort') Tx causes the session to fail, and tuning Informatica to find the "sweet spot" configuration is time-consuming.
By contrast, the CoSORT AEP results were achieved with no tuning whatsoever. The AEP reads its resource parameters from a very basic text file ("cosortrc"), the same one you might already have in place for external (flat-file) CoSORT (SortCL) processes. The CoSORT engine uses only the memory it needs, up to the resource 'ceiling' previously set by the system administrator, and that ceiling can easily be modified for global or per-job use.
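To put the fixed-key figures above in perspective, the following Python sketch converts the published timings to seconds and computes the speedup of each CoSORT option over the native sort, along with the temporary space implied by the 2.5x figure for the largest input. The timings and sizes come from this page; the function and variable names are only illustrative.

```python
import re

def to_seconds(timing):
    """Convert a timing such as '1h43m46s', '20m35s', or '8s' to seconds."""
    h, m, s = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", timing).groups()
    return int(h or 0) * 3600 + int(m or 0) * 60 + int(s or 0)

# Fixed-key ASCII sort timings, smallest to largest input.
native = ["8s", "1m48s", "20m35s"]
aep = ["3s", "16s", "2m1s"]
sortcl = ["1s", "7s", "1m19s"]

aep_speedups = [to_seconds(n) / to_seconds(a) for n, a in zip(native, aep)]
sortcl_speedups = [to_seconds(n) / to_seconds(c) for n, c in zip(native, sortcl)]

for size, a, c in zip(["27MB", "268MB", "2.7GB"], aep_speedups, sortcl_speedups):
    print(f"{size}: AEP {a:.2f}x, SortCL {c:.2f}x faster than native")

# Temporary space for the largest input: about 1x the source for CoSORT,
# 2.5x the source for the native sort.
source_bytes = 2_684_820_870
native_temp_bytes = int(source_bytes * 2.5)
print(f"native temp space: {native_temp_bytes:,} bytes")
```

On the largest input, the AEP ratio works out to roughly ten times the native sort's speed, which matches the tenfold claim made earlier.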
Variable-key ASCII Sorting with Unique and Stable

| Sorted by | 6-byte key | 14-byte key | 23-byte key, 2.6GB |
|---|---|---|---|
| Target | 424 records | 2,233,343 records | 15,237,170 records |
| Informatica 'nSort' w/Aggregator | 2m10s | 14m37s | 1h43m46s |
| CoSORT AEP | 1m03s | 1m32s | 3m24s |
| CoSORT SortCL | 27s | 38s | 2m15s |

When UNIQUE is specified to CoSORT, records with duplicate keys are removed, not just records that are identical in their entirety. Similarly, when STABLE is specified, CoSORT outputs equal-key (duplicate) records in their input order. Informatica, however, cannot perform a truly UNIQUE and/or STABLE sort with its native Sorter Tx. PowerMart/PowerCenter users must also create an Aggregator transformation that groups by the sort key and takes the FIRST() of the rest of the data. This accounts for much of the timing penalty, especially as the number of UNIQUE records (or groups) grows. Although aggregators benefit from pre-sorted data, pre-sorting would invalidate the desired result in this test. Because STABLE means you want the values from the first record encountered for each key, sorting the data in Informatica without CoSORT will produce incorrect results.
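The combined UNIQUE and STABLE semantics described above can be sketched in a few lines of Python. This is an illustration of the behavior only, not CoSORT or SortCL syntax, and the records are hypothetical.

```python
# Hypothetical (key, payload) records, listed in input order.
records = [
    ("B2", "first B2"),
    ("A1", "first A1"),
    ("B2", "second B2"),
    ("A1", "second A1"),
    ("C3", "only C3"),
]

# STABLE: Python's sorted() keeps equal-key records in their input order.
stable = sorted(records, key=lambda rec: rec[0])

# UNIQUE on the key: keep the first record of each key and drop the rest,
# even though the dropped records differ in their payload.
unique = []
for rec in stable:
    if not unique or unique[-1][0] != rec[0]:
        unique.append(rec)

print(unique)
```

Because the sort is stable, the record kept for each key is the first one that appeared in the input. With an unstable sort, either duplicate could come first, and the surviving payload would be unpredictable.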

Examples
Here are example screen shots from the CoSORT AEP for Informatica:








Platforms, Licensing, and Support
The CoSORT AEP is now available for all Informatica PowerCenter and PowerMart users on major UNIX and Windows platforms. The full CoSORT package, including SortCL, runs on all of these platforms.
The CoSORT AEP for Informatica is licensed and supported by CoSORT/IRI USA. Free evaluations of the CoSORT Sorter Tx AEP also include the full CoSORT package, which can be used for external sort/report operations and as a replacement for many third-party sorts.
The one-time price of the CoSORT AEP covers perpetual use and one full year of support and upgrades, and includes discounts on the optional full-use CoSORT package. Current CoSORT package users qualify for Informatica AEP discounts, which are available directly from IRI only.

|