HelpIT

Batch Processing

The matchIT API – plug-in dedupe and data cleansing components

Batch data cleansing

Often, companies have to deal with large quantities of name and address data in different files. Typical reasons for this include:

  • Loading batches of new data into a data warehouse
  • Merging data from different corporate systems
  • Conversion of data from old systems
  • Company acquisitions/mergers

As well as containing the usual phonetic and other keying errors, data from different sources is usually structured differently and keyed to different standards. Consequently, it is easy to load values into the wrong fields as well as to add duplicate records. This can compromise the quality of your data warehouse or marketing database to the extent that it seriously undermines its value.

The matchIT API allows you to automatically clean and convert data from a variety of sources. The same routines used by helpIT systems' award-winning standalone matchIT suite can be used to standardise new batch data and match it effectively against the main database. Your own program, which calls the matchIT API, performs the database updates, so you remain in complete control: you can discard duplicates, create subsidiary records, export them to a separate file or simply place flags within the global file.
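The division of labour described above, where a matching routine reports likely duplicates and the calling program decides what to do with them, can be sketched roughly as follows. This is an illustration only and does not use the matchIT API: `Record`, `match_key`, `process_batch` and `Action` are hypothetical names, and the match key is a crude stand-in (lowercasing and stripping punctuation) rather than the phonetic matching a real product would apply.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    INSERT = "insert"                  # new record, safe to load
    FLAG_DUPLICATE = "flag_duplicate"  # matches an existing record


@dataclass(frozen=True)
class Record:
    name: str
    postcode: str


def match_key(record: Record) -> str:
    """Hypothetical, deliberately crude key: ignores case, spacing and punctuation."""
    name = re.sub(r"[^a-z]", "", record.name.lower())
    postcode = re.sub(r"[^a-z0-9]", "", record.postcode.lower())
    return f"{name}|{postcode}"


def process_batch(main: list[Record], batch: list[Record]) -> list[tuple[Record, Action]]:
    """Compare a batch against the main file and return a decision per record.

    Nothing is written here: the caller applies the decisions, so it stays
    in control of discarding, flagging or exporting duplicates.
    """
    seen = {match_key(r) for r in main}
    decisions = []
    for record in batch:
        key = match_key(record)
        if key in seen:
            decisions.append((record, Action.FLAG_DUPLICATE))
        else:
            decisions.append((record, Action.INSERT))
            seen.add(key)  # also catches duplicates within the batch itself
    return decisions


main_file = [Record("John Smith", "AB1 2CD")]
new_batch = [
    Record("john  smith.", "ab12cd"),
    Record("Jane Doe", "XY9 8ZW"),
    Record("JANE DOE", "xy98zw"),
]
decisions = process_batch(main_file, new_batch)
```

Because `process_batch` only returns decisions, the calling program can treat a `FLAG_DUPLICATE` result however it needs to: drop the record, write it to a separate file, or store it with a flag.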