Syniti Match API includes three versions of a sample application ("HubTest") that process data using the API. Each version takes a delimited text file or database table, identifies the duplicate records within it, and outputs the results. (Alternatively, two files or tables can be passed in, and the records that overlap, i.e. appear in both, are output.)
HubTest is a command line program. To see a full list of the arguments accepted by the C++ version, execute:
hubtest /?
matchIT Hub - 3.2.2.5 - testbed application
Usage:
  hubtest /license=<licenseFile> /settings=<settingsFile>
          /input=<inputFile1> [/delimiter=<delimiter1>]
          [/input=<inputFile2> [/delimiter=<delimiter2>]]
          /output=<outputFile> [/encoding=<encoding>]
          [/stats=<statsFile>]
          [/log=<logFile>] [switches]
Where:
  <licenseFile>   The name of the file containing the activation code.
  <settingsFile>  The name of a matchIT Hub XML settings file
                  (see the Configuration Guide for details).
  <inputFile1>    The name of the input file for a matching process or
                  the name of the first input file for an overlap process.
  <delimiter1>    The field delimiter used in the first or both input file(s)
                  (default ',').
  <inputFile2>    The name of the second input file for an overlap process.
  <delimiter2>    The field delimiter used in the second input file, if
                  different from the first.
  <outputFile>    The name of the output file.
  <encoding>      The character encoding used in all files (default ANSI).
  <statsFile>     The name of a file to write statistics to (XML format).
  <logFile>       The name of a log file.
  <delimiter> can be any single non-alphanumeric character.
  <encoding> can be ANSI, UTF8 or UTF16.
Switches:
  /yes            Overwrite output file without asking.
  /header         Input files have header record.
  /write=<file>   Write the settings used to the specified file.
  /meta=<file>    Write the metadata to the specified file.
  /split          Split the output into different files by output type.
  /timeStamps     Log time stamps and elapsed time between stages.
  /maxErrors=<n>  Abort if specified error count exceeded (default 0 no limit).
  /maxBuffer=<n>  Maximum record length (default 32KiB).
  /help or /?     Display usage.
An example of using HubTest:
hubtest /settings=settings.xml /input=contacts.txt /output=results.txt /license=activation.txt "/delimiter=|"
The Syniti Match API configuration settings are supplied in an XML-formatted text file (settings.xml); please refer to the Configuration Guide for details on how to create and customize configuration settings files.
The input data is located in the file contacts.txt; this is a delimited text file that uses a pipe character ('|') as the delimiter. Results will be output to results.txt; these will similarly be delimited.
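When HubTest is launched from another program rather than typed at a prompt, the same arguments can be assembled as a list. The following is a minimal Java sketch of that idea, not part of the supplied sample code: the class name HubTestCommand and its build method are hypothetical, and the commented-out launch assumes that a hubtest executable is on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Syniti sample code) that assembles
// the documented HubTest arguments for the C++ version.
final class HubTestCommand {

    static List<String> build(String settings, String input, String output,
                              String license, char delimiter) {
        List<String> command = new ArrayList<>();
        command.add("hubtest");
        command.add("/settings=" + settings);
        command.add("/input=" + input);
        command.add("/output=" + output);
        command.add("/license=" + license);
        // Each argument is its own list element, so the pipe character does
        // not need the quoting that a command prompt requires.
        command.add("/delimiter=" + delimiter);
        return command;
    }

    public static void main(String[] args) throws Exception {
        List<String> command = build("settings.xml", "contacts.txt",
                "results.txt", "activation.txt", '|');
        System.out.println(String.join(" ", command));

        // To launch HubTest (assumes the executable is on the PATH):
        // int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as separate list elements avoids having to quote the delimiter argument as in the command-prompt example.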
Syniti Match API requires a valid license. This will usually be supplied as a text file, the contents of which must be read and passed in when a Syniti Match API engine is initialized. HubTest does this automatically, using the specified license file.
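An application built on the API has to perform the license-reading step itself. The sketch below shows only that step in Java, under stated assumptions: the class LicenseFile is hypothetical, the file is assumed to be UTF-8 (or plain ASCII) text, and trimming surrounding whitespace is an assumption rather than a documented requirement. The engine initialization call is deliberately not shown; see the supplied sample source for the actual API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: reads the text of a license file so that it can be
// passed to the engine when it is initialized.
final class LicenseFile {

    static String readActivationCode(Path licenseFile) throws IOException {
        // Assumption: the file is UTF-8 (or ASCII) text.
        String contents = new String(Files.readAllBytes(licenseFile),
                StandardCharsets.UTF_8);
        // Assumption: leading/trailing whitespace and the final line break
        // are not significant.
        return contents.trim();
    }

    public static void main(String[] args) throws IOException {
        String activationCode = readActivationCode(Paths.get("activation.txt"));
        System.out.println("Read " + activationCode.length() + " characters");
        // The code would now be passed to the engine's initialization call,
        // as done in the HubTest sample source.
    }
}
```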
C# & Java HubTest (database tables)
The C# and Java versions of HubTest are similar to the C++ version, but their inputs and outputs are database tables rather than delimited files. The C# version works with MS SQL Server; the Java version works with MS SQL Server, Oracle and MySQL.
hubtest /testconfig=<testConfigFile> /settings=<settingsFile>
        /license=<licenseFile> [/stats=<statsFile>] [switches]
Where:
  <testConfigFile>  The name of a HubTest XML configuration file.
  <settingsFile>    The name of a Match API XML settings file
                    (see the Configuration Guide for details).
  <licenseFile>     The name of the file containing the activation code.
  <statsFile>       The name of a file to write statistics to (XML format).
Switches:
  /help or /?       Display usage.
The difference from the C++ command line usage is that, instead of the names of input and output files, these versions take a /testconfig switch that names a configuration file for database access. For example:
<?xml version="1.0" encoding="utf-8" ?>
<config>
  <input>
    <!-- Define one data source for single table Matching -->
    <dataSource>
      <connectionString>database connection string</connectionString>
      <table>TABLE</table>
      <columns>UniqueRef,Prefix,Forenames,Surname,Address1,Address2,Address3,Address4,Address5,Postcode</columns>
    </dataSource>
    <!-- Define two data sources for Overlap Matching -->
    <!--
    <dataSource>
    </dataSource>
    -->
  </input>
  <!-- Output database in which to create tables for MatchingPairs,
       GroupedMatchingPairs, MatchingGroups, DedupedData, DuplicateData,
       DeletedPairs. Only the output types enabled in the Match API
       configuration file will be output. -->
  <output>
    <connectionString>database connection string</connectionString>
  </output>
</config>
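To illustrate how a configuration file of this shape can be consumed, here is a minimal Java sketch that reads the input data sources using the JDK's built-in DOM parser. It is not the parser used by the HubTest sample: the TestConfigReader and DataSource names are hypothetical, and the element names are taken from the example above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for the <input> section of a HubTest test configuration.
final class TestConfigReader {

    static final class DataSource {
        final String connectionString;
        final String table;
        final List<String> columns;

        DataSource(String connectionString, String table, List<String> columns) {
            this.connectionString = connectionString;
            this.table = table;
            this.columns = columns;
        }
    }

    // Returns one entry per <dataSource>: one for matching, two for overlap.
    static List<DataSource> readInputs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element input = (Element) doc.getElementsByTagName("input").item(0);
        NodeList sources = input.getElementsByTagName("dataSource");
        List<DataSource> result = new ArrayList<>();
        for (int i = 0; i < sources.getLength(); i++) {
            Element source = (Element) sources.item(i);
            // The column list is a single comma-separated string.
            List<String> columns =
                    Arrays.asList(text(source, "columns").split("\\s*,\\s*"));
            result.add(new DataSource(text(source, "connectionString"),
                    text(source, "table"), columns));
        }
        return result;
    }

    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
                "<config><input><dataSource>",
                "<connectionString>database connection string</connectionString>",
                "<table>TABLE</table>",
                "<columns>UniqueRef,Prefix,Forenames,Surname,Postcode</columns>",
                "</dataSource></input>",
                "<output><connectionString>database connection string</connectionString></output>",
                "</config>");
        for (DataSource source : readInputs(xml)) {
            System.out.println(source.table + ": " + source.columns);
        }
    }
}
```

Data sources that are commented out in the XML, as in the overlap placeholder above, are ignored by the parser, so the number of entries returned indicates whether a matching or an overlap run is configured.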
Syniti Match API includes full sample source code for all three versions of the HubTest program.
To access the sample code, open the Start menu and navigate to matchIT Hub -> Install Sample Files. Clicking this runs an installer that extracts the sample code to:
C:\Users\<username>\Documents\matchIT Hub\sample code
where <username> specifies the user account of the current user. Windows Explorer will automatically open this folder.
The sample code is supplied in three versions:
- The C++ version reads data from a delimited text file, and writes results to a separate delimited text file;
- The C# version (for Windows only) uses ADO.NET to read from and write to MS SQL Server tables;
- The Java version uses JDBC to read from and write to MS SQL Server, Oracle or MySQL database tables.
Note that these programs aren't intended to be used in their current form in a production environment, but they can, of course, be used as a starting point to help in the development of an application that uses Syniti Match API.