W1.1 - Amending the Web Service Configuration XML File – Software Support

matchIT Web Index

At the heart of the findITS2 Web Service is an XML Configuration File - findITConfiguration.xml. By Default this is located in the ‘config’ sub folder root of the matchIT web directory. It is worth noting at this point that, depending on the CRM option you chose during the installation of matchIT web, the configuration file - findITConfiguration.xml will be configured differently.

For example, a Microsoft Dynamics install will come with 3 data sources already structured in the configuration file, with some default field mappings to get you started. The only parts of the file in this case that are left blank for you to fill in are the connectionstring and connectionstringadmin nodes of each data source.

Of course, you may want to map the fields a little differently depending on what fields you want to search on. The complete structure of the configuration file is described below.

Within the root node <config> there can be multiple <datasource> nodes (each of which contains the information for a single source of data, which could comprise a single table of contacts from a single database, multiple tables form a single database, or even multiple tables from multiple databases on the same server), a single <batchdatasource> node (containing settings relating to the CompareRecords() web method) and a single <outputfields> which contains the fields that will be output in the results when using the web method to find matches on a data submission. Data from separate database servers are set up as separate data sources.

Breakdown of a ‘datasource’ element

The opening tag for a datasource element has three attributes as follows

The first two are simply a reference to distinguish between different data sources in the XML file. The id attribute should be an integer and name can be an integer, string or mixture as in the above case. The type attribute specifies the type of database that the datasource will be using – in this case ‘sqlserver’.

The first node within a data source node specifies the connection string for the database that is going to be used when the service queries the data.

The value attribute will need to be updated to contain the database / credentials of your data for user level access. Note that if the data you are going to be using is contained in more than 1 database (on the same server), make sure that all databases in question allow access to the same user – in the above case for example, you would need to make sure that all databases allow access to the user ‘user’ under the password ‘pass’.

Note if you are using a SQL instance that doesn’t use the default port, you can specify the new port as follows as part of the data source property.

In the above example, 1733 is the port number assigned to the SQL Server instance.

In addition to the connection string we have a related value, the connectionstringadmin node.

As you can see this is not like the regular connection string, it is encoded in the XML file. You can generate this string in the matchIT web Admin application, and simply paste the encrypted value back into the XML file. This is covered in more detail in the section Using the Administration Program.

It is also important to note here that the main table of data for the data source in question (which is this case is the ‘contacts’ table, as described below), should exist in the database that you specify in the connection string. Any tables thereafter that are to be joined to the main table of data can have their database specified in the database attribute of the table node (as shown below) if they happen to exist in different database to the one specified in the connection string.

The following node defines the name of the database that is used to create the matchIT web specific database objects.

By default, the name of the database is set to findIT_S2 and it is important to note here that the database will be created if it does not exist, so the user account specified in the previous connectionstringadmin node will need the privileges necessary to create a database. This database will store objects such as the keys table, updates table and error record log table so that these items are not created in the source database. As well as the database, it is also possible to specify a schema attribute, which if not specified will default to dbo.

In the case of a Dynamics Installation, there is another node worth mentioning here, which is the <keyssynchronization type="date" /> node. What this does is changes the method of record update monitoring to use a modified date column (offered as standard in Dynamics) rather than using triggers, which is the default method in a standard install when this node is not specified. If using this node, it is also necessary in the table definitions (below) to specify, for each table, the name of the column to monitor for updates, by using a synccolumn attribute. It is also possible to specify an offset attribute in the keyssynchronization node, to allow for any time zone differences that may cause issues. For example, to use an offset of 5 hours, you would specify offset="-5". At the time of writing, it is only possible to use a column that contains a UTC format DateTime value. Similarly, you can specify an offsetseconds attribute if the time difference is only a matter of seconds, as opposed to hours, or you can enter a value greater than 60 for offset seconds if you need to offset by minutes instead.

The next node defines the tables that contain the data – note the keyssynchronization node has been included in the example below for reference.

</tables>

As you can see, individual tables are contained within the <tables> node. Each individual <table> node can contain various attributes which are described below:

name : The name of the table in the database (Mandatory)

schema : The name of the table schema used in SQL queries, “dbo” by default (Optional)

uniqueref : The name of the unique reference column of the table in question (Mandatory)

join : The name of the table that the table in question needs to join to (Mandatory if the table in question is not classed as the main table of data)

joincolumn : The name of the column in the join table that the table in question will be joined on (Mandatory if a ‘join’ table has been specified).

database : The name of the database that the table in question exists in (Optional – if not specified the database set as the ‘Initial Catalog’ in the connection string will be used).

keystable : A flag to specify that the table in question is the keys table (Mandatory if the table in question is the keys table. Possible values, 0 or 1.

synccolumn : If using the <keyssynchronization type="date" /> setting, you can use this attribute on each table to specify the column that contains the date field to monitor for updates. An example of this is shown in the tables snippet above.

In the code above, the main table of data would be the table called ‘Contacts’, which exists in the database ‘example1’ as no database attribute is specified so the default database in the connection string is used.

The second table that is defined, the table called ‘addresses’, exists in a database called ‘example2’, as set by the database attribute. You can see also here that the addresses table joins to the contacts table (join="contacts") by its own unique reference column (uniqueref="ID") to the ‘ID’ column in the contacts table (joincolumn="ID").

The last table is defined as the keys table. This table will be generated by the administration program. It needs the attribute to specify that it is the keys table (keystable="1") and also needs to join to the main table of data (join="Contacts" joincolumn="ID").

Basically you need to amend the XML in the <tables> node to match the structure of the data in your database. In some situations it may be necessary to specify some conditional columns in the table definitions that records would have to adhere to in order to be counted as a record. The most common of these is a deleted flag column. For exactly this purpose it is possible to define sub nodes within a table node called conditionalcolumn nodes. The following is an example of such a structure.

</table>

In the example above, the contacts table contains a conditional column called deleted for which records must not be equal to (indicated by isequalto="false") the value True. The columns data type is simply defined as either integer or non-integer, in this case being the latter by setting isintegertype="false". Any number of conditional columns can be defined within any table (apart from the keys table for which there are no conditional columns).

The next node is the <fieldmappings> node, which contains definitions for all the mappings of your datafields in the datasource to the datafields in the matchIT® record object. Below is an example of one of the mappings

Each <fieldmapping> node has 2 attributes. The matchITfield attribute is the name of the field in the matchIT record object – do not change this. The columnname attribute is the name field in your database that contains the relevant data to be mapped to the ‘FullName’ field of the matchIT record object. Any matchIT record field for which you don’t think your database contains an equivalent should be left blank, as follows

In place of the matchITfield attribute, it is also possible to use a passthroughfield attribute. Use this attribute in place of the matchITfield attribute when you want to map a column in your data source and return it in queries to the web service, but it doesn’t map specifically to any matchIT fields that are available in the matchIT API.

The next node in the XML is the <settings> node. The settings within this node are what will ultimately affect the resulting output.

The first three nodes within the <settings> node are very straight forward, and are explained below

The value attribute of this node contains an integer, which is the maximum number of records that can be returned for a query (for the datasource in question). For no limit, set this value to 0. If a particular query exceeds the limit, no results are returned and a warning is displayed.

Unlike the maxrecords setting, this will not affect the number of records that are selected by the query. Instead, it enables the user to only receive a selection of records from whatever the query returned. The selection will always start with the highest scoring records. Set this to 0 to return all records.

This attribute sets a limit, 1000 by default, to the number of key synchronization related updates that can occur per interval of the keys sync service check. This prevents lengthy queries relating to large numbers of updates consuming excess resource on the machine.

The value attribute of this node contains an integer, which specifies the length of time (in seconds) of a SQL query timeout. By default it is set to 30 seconds.

This node allows the user to force all queries to include blank values when searching for records. An example would be when submitting a first address line to search on that lacks premise information – If the premise key is specified in the match keys, but is empty, it will not be search on by default. Setting this option to false would force the same query to specifically select candidate records that have a blank premise.

This node (shortened here from the default) is used to determine what keys are output to the keys table. If you know that you are not going to make use of a particular column in searches, then you can save overhead by disabling it. Note that any changes to this setting will require a regeneration of keys to recreate the table schema, and the user will need to make sure that the keys defined in the matchkeysettings.xml file correspond to the columns available in the keys table, determined by this setting.

The matchIT API settings (the matching engine) are kept in a separate XML file, to promote readability of the configuration files. To access this file, you need to update the matchitapisettingspath node also.

These settings allow you to alter things like the constraints on address look up, the scoring system and other settings related to record matching. Any changes to these setting should be done carefully and in consultation with the support team at Syniti.

It is also possible to define a <updatestableschema> node to define a schema name to which updates tables should belong, however this shouldn’t be needed very often. In the cases where it is, the node takes a ‘value’ attribute, the value of which should be the schema name.

The <scoreoutputs> node is used to turn on or off specific score components that get returned with a match when calling FindRecords. By default, only the total, min and max scores are returned. It is possible however to return the constituent values that make up the total score, such as name and address score, simply by setting the corresponding attribute in this node to a value of 1.

Another part of the functionality that is configurable in this section is the implicit LIKE style matching that is implicitly performed by the web service when search with Name or Company data. These searches are enabled by default, and are performed regardless of which key configuration is used from the match key settings. They are configurable through adding the following node to the <settings> node section

</likematching>

The LIKE searching can be switched on or off through the <enabled> node, and the threshold word length limit for performing the searches can be set through the <minlength> node. The wildcardusage value controls whether a wildcard is used when querying the name/company keys.

1: Enabled (default if not set) - same behavior as previous versions before this setting was added, where wildcards are always used. This is the default value.
2: Restricted - wildcards are only used with key values greater than 1 character (so it wont perform a lookup on e.g. "s%").
3: Disabled - wildcards are never used

The LIKE searching can impact performance most noticeably on datasets larger than 100,000, so from a performance perspective, we suggest setting it to false in most cases.

In version 2.1.0 the match keys were moved to a separate configuration file to improve readability of the configuration XML document as well as encourage reusability between configurations, without having to set the same matchkeys in each file.

You can change the location of the new file, matchKeySettings.xml in this node

See section matchKeySettings.xml for more information on the configuration of this file.

It is possible to specify an optional node in the settings section to control whether or not the data that is returned in a query to the web service is the original source data, or a cleaned and normalised version produced by the web service process. By default, this node is omitted from the configuration, and the option is set to true (so a parsed and cleansed version of the source data is returned). If you wish to override this, you can do so by adding the following node to the settings section –

Breakdown of the ‘batchdatasource’ section

This section outlines a special kind of data source that is used purely for the CompareRecords() web method, used for batch comparison. It is a highly trimmed down version of a datasource element, with a few extra settings that are unique to the batch process.

The only sub node within the <batchdatasource> node is the <settings> node, which in turn only contains a limited number of sub nodes for batch settings.

The <matchitapisettingspath> setting, as for the standard datasource, defines the location of the matchIT API settings file that contains the settings for the matchIT API engine. The following setting, <maxclustersize>, determines the maximum number of records that can be submitted to the method. By default this value is 200 – Be aware that the number of comparisons effectively squares with this number, and so processing time and return messages can become quire bloated.

The <minimumscorethreshold> node determines the fraction of the maximum possible score that needs to be achieved in order for the comparison to be classified as a match. For example, a threshold of 0.7 against a max score of 100 will need to score a minimum of 70 to be reported as a match.

The final node, <scoreoutputs>, simply determines the score components that are returned along with the unique refs within a comparison node in the return message of a call to CompareRecords(). To enable a score component, simply set its attribute value to 1, and likewise set it to 0 to disable.

Breakdown of the ‘outputfields’ section

The <outputfields> node simply contains all the fields that are defined in a <fieldmappings> node with a data source, in the following format

Only the fields specified in a field mappings node are available as output nodes. To remove a field from the results output of the web method (namely FindRecords, which returns results in XML format, described in the ‘Web Service’ section), simply comment the node out as follows

Note that there is only one <outputfields> node in the XML file. The output fields apply to all data sources, and cannot be defined per data source. If a mapping for a particular output field is available in one data source but not in another, the one for which a mapping does not exist will simply display as blank in the results.

As with the field mapping nodes, if you wish to specify an output field that you mapped as a pass through field rather than a matchIT field in the field mappings section, use a passthroughfield attribute in place of a matchITfield attribute in the <outputfield> node.

Important Note – It is a good idea once the findITConfiguration.xml file is in the correct state to take a copy of it and back it up somewhere in case the operational one is ever deleted. If this ever happens the copy can simply be placed into the root of the matchIT web directory and loaded through the admin program. Note that the copied file’s permissions will need to allow the ‘Users’ group ‘Read’ and ‘Read & Execute‘ permissions in order for the web service to be able to load the file.

matchIT Web Index

Breakdown of a ‘datasource’ element

Breakdown of the ‘batchdatasource’ section

Breakdown of the ‘outputfields’ section

Related articles