matchIT Hub includes three versions of a testbed ("HubTest") that can be used for simple testing and demonstration purposes. Each takes a delimited text file or database table, identifies the duplicate records within it, and outputs the results. Alternatively, two files or tables can be passed in, and the records that overlap (i.e. appear in both) are output.
C++ HubTest (delimited files)
HubTest is a command line program. To see a full list of available arguments, execute:
hubtest /?
hubtest /license=<licenseFile> /settings=<settingsFile> /input=<inputFile1> [/delimiter=<delimiter1>] [/input=<inputFile2> [/delimiter=<delimiter2>]] /output=<outputFile> [/encoding=<encoding>] [/stats=<statsFile>] [switches]

Where:
  <licenseFile>   The name of the file containing the activation code.
  <settingsFile>  The name of a matchIT Hub XML settings file (see the Configuration Guide for details).
  <inputFile1>    The name of the input file for a matching process or the name of the first input file for an overlap process.
  <delimiter1>    The field delimiter used in the first or both input file(s) (default ',').
  <inputFile2>    The name of the second input file for an overlap process.
  <delimiter2>    The field delimiter used in the second input file, if different from the first.
  <outputFile>    The name of the output file.
  <encoding>      The character encoding used in all files (default ANSI).
  <statsFile>     The name of a file to write statistics to (XML format).

  <delimiter> can be any single non-alphanumeric character.
  <encoding> can be ANSI, UTF8 or UTF16.

Switches:
  /yes            Overwrite output file without asking.
  /header         Input files have header record.
  /help or /?     Display usage.
An example of using HubTest:
hubtest /settings=settings.xml /input=contacts.txt /output=results.txt /license=activation.txt "/delimiter=|"
The matchIT Hub configuration settings are supplied in an XML-formatted text file (settings.xml); please refer to the Configuration Guide for details on how to create and customize configuration settings files.
The input data is located in the file contacts.txt; this is a delimited text file that uses a pipe character ('|') as the delimiter. Results will be output to results.txt; these will similarly be delimited.
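To make the file format concrete, here is a minimal Python sketch that reads a pipe-delimited file of the kind HubTest accepts. It is not part of HubTest; the function name `read_delimited` and the sample rows are illustrative assumptions, and the column names are borrowed from the database configuration example elsewhere in this article.

```python
import csv
import io


def read_delimited(stream, delimiter=",", has_header=False):
    """Return (header, rows) from a delimited text stream.

    `header` is None unless has_header is True (compare the /header switch).
    """
    reader = csv.reader(stream, delimiter=delimiter)
    rows = [row for row in reader if row]  # skip completely empty lines
    header = rows.pop(0) if has_header and rows else None
    return header, rows


# Illustrative data only; a real contacts.txt would be opened with open(...).
sample = io.StringIO(
    "UniqueRef|Forenames|Surname\n"
    "1|John|Smith\n"
    "2|Jon|Smith\n"
)
header, rows = read_delimited(sample, delimiter="|", has_header=True)
print(header)
print(len(rows), "data rows")
```

The same function could be pointed at the output file to inspect results, since the output uses the same delimiter as the input.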
matchIT Hub requires a valid license. This will usually be supplied as a text file, the contents of which must be read and passed in when a matchIT Hub engine is initialized. HubTest does this automatically, using the specified license file.
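When HubTest is driven from a script, the example command above can be assembled as an argument list. The following Python sketch uses only the arguments and switches documented in the usage text; the helper name `build_hubtest_args` and its parameter names are hypothetical, and the executable name `hubtest` is assumed to be on the PATH.

```python
def build_hubtest_args(license_file, settings_file, input_files, output_file,
                       delimiters=(), encoding=None, stats_file=None,
                       overwrite=False, header=False, exe="hubtest"):
    """Build a HubTest (C++ version) command line as a list of arguments."""
    if not 1 <= len(input_files) <= 2:
        raise ValueError("expected one input file (matching) or two (overlap)")
    if len(delimiters) > len(input_files):
        raise ValueError("more delimiters than input files")
    for d in delimiters:
        # The usage text allows any single non-alphanumeric character.
        if len(d) != 1 or d.isalnum():
            raise ValueError(f"invalid delimiter: {d!r}")
    if encoding is not None and encoding not in ("ANSI", "UTF8", "UTF16"):
        raise ValueError(f"invalid encoding: {encoding!r}")

    args = [exe, f"/license={license_file}", f"/settings={settings_file}"]
    for i, path in enumerate(input_files):
        args.append(f"/input={path}")
        if i < len(delimiters):
            args.append(f"/delimiter={delimiters[i]}")
    args.append(f"/output={output_file}")
    if encoding is not None:
        args.append(f"/encoding={encoding}")
    if stats_file is not None:
        args.append(f"/stats={stats_file}")
    if overwrite:
        args.append("/yes")
    if header:
        args.append("/header")
    return args


args = build_hubtest_args("activation.txt", "settings.xml",
                          ["contacts.txt"], "results.txt", delimiters=["|"])
print(args)
```

Passing this list to `subprocess.run(args, check=True)` bypasses the shell, so the pipe delimiter does not need the quoting used in the command-line example.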
C# & Java HubTest (database tables)
The C# and Java versions of HubTest are similar to the C++ version, but their inputs and outputs are database tables rather than delimited files. The C# version works with MS SQL Server; the Java version works with MS SQL Server, Oracle and MySQL.
hubtest /testconfig=<testConfigFile> /settings=<settingsFile> /license=<licenseFile> [/stats=<statsFile>] [switches]

Where:
  <testConfigFile>  The name of a HubTest XML configuration file.
  <settingsFile>    The name of a matchIT Hub XML settings file (see the Configuration Guide for details).
  <licenseFile>     The name of the file containing the activation code.
  <statsFile>       The name of a file to write statistics to (XML format).

Switches:
  /help or /?       Display usage.
The difference from the C++ command line usage is that, instead of the names of input and output files, these versions take a /testconfig=<testConfigFile> argument, which names an XML configuration file describing the database access. For example:
<?xml version="1.0" encoding="utf-8" ?>
<config>
  <input>
    <!-- Define one data source for single table Matching -->
    <dataSource>
      <connectionString>database connection string</connectionString>
      <table>TABLE</table>
      <columns>UniqueRef,Prefix,Forenames,Surname,Address1,Address2,Address3,Address4,Address5,Postcode</columns>
    </dataSource>
    <!-- Define two data sources for Overlap Matching -->
    <!--
    <dataSource>
    </dataSource>
    -->
  </input>
  <!-- Output database in which to create tables for MatchingPairs, GroupedMatchingPairs, MatchingGroups, DedupedData, DuplicateData, DeletedPairs. Only the output types enabled in the matchIT Hub configuration file will be output. -->
  <output>
    <connectionString>database connection string</connectionString>
  </output>
</config>
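As a rough illustration of how such a configuration file is structured, the following Python sketch extracts the data sources and the output connection string using the standard library. It is not how HubTest itself reads the file; the function name `parse_test_config` and the dictionary keys are assumptions made for this example, and only the element names come from the configuration shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE_CONFIG = """<config>
  <input>
    <dataSource>
      <connectionString>database connection string</connectionString>
      <table>TABLE</table>
      <columns>UniqueRef,Prefix,Forenames,Surname,Postcode</columns>
    </dataSource>
  </input>
  <output>
    <connectionString>database connection string</connectionString>
  </output>
</config>"""


def parse_test_config(xml_text):
    """Return (data_sources, output_connection_string) from a HubTest config."""
    root = ET.fromstring(xml_text)  # XML comments are dropped by the parser
    sources = []
    for ds in root.findall("./input/dataSource"):
        columns = ds.findtext("columns", default="")
        sources.append({
            "connection_string": ds.findtext("connectionString", default="").strip(),
            "table": ds.findtext("table", default="").strip(),
            "columns": [c.strip() for c in columns.split(",") if c.strip()],
        })
    # One data source means single-table matching, two means overlap matching.
    if not 1 <= len(sources) <= 2:
        raise ValueError("expected one or two <dataSource> elements")
    output_conn = root.findtext("./output/connectionString", default="").strip()
    return sources, output_conn


sources, output_conn = parse_test_config(SAMPLE_CONFIG)
print(sources[0]["table"], sources[0]["columns"])
```

A check like this can catch a missing or commented-out data source before HubTest is run.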
Sample Source Code
matchIT Hub includes full sample source code for all three versions of the HubTest program.
To access the sample code, open the Start menu and navigate to matchIT Hub -> Install Sample Files. Clicking this runs an installer that extracts the sample code to:
C:\Users\<username>\Documents\matchIT Hub\sample code
where <username> specifies the user account of the current user. Windows Explorer will automatically open this folder.
The sample code is supplied in three versions:
- The C++ version reads data from a delimited text file, and writes results to a separate delimited text file;
- The C# version (for Windows only) uses ADO.NET to read from and write to MS SQL Server tables;
- The Java version uses JDBC to read from and write to MS SQL Server, Oracle or MySQL database tables.
Note that these programs are not intended for production use in their current form, but they can serve as a starting point when developing an application that uses matchIT Hub.