Alteryx - Advanced Settings – Software Support

The advanced settings dialog is a popup tabbed dialog.

Matching tab

Minimum scores	Specifies a matching threshold score for each matching level enabled.
Maximum cluster size	All processed data is added to clusters. When a record is added to an existing cluster, it's then compared to each record already in the cluster, provided the maximum cluster size has not been exceeded. If the cluster has reached the maximum size, then no more comparisons will be performed on that cluster and it will be logged as a large cluster.
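
The effect of the maximum cluster size can be sketched as follows. This is a minimal illustration only, not the engine's implementation: `add_to_cluster`, the `compare` callback and the `large_clusters` log are hypothetical names, and the sketch assumes that a record is still stored in a cluster that has reached its limit.

```python
def add_to_cluster(cluster, record, max_cluster_size, compare, large_clusters):
    """Add a record to a cluster and return the pairs that were compared.

    cluster is a dict with a "key" and a list of "records" (hypothetical shape).
    """
    compared = []
    if len(cluster["records"]) >= max_cluster_size:
        # The cluster is full: perform no comparisons and log it as large.
        large_clusters.add(cluster["key"])
    else:
        # Compare the new record to every record already in the cluster.
        for existing in cluster["records"]:
            compare(existing, record)
            compared.append((existing, record))
    # Assumption: the record is stored even when no comparisons are made.
    cluster["records"].append(record)
    return compared
```

With a maximum cluster size of 2, the third record added to a cluster triggers no comparisons and the cluster key is recorded in `large_clusters`.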

Output Options

Output unique refs only	If enabled, then only unique refs are output. If disabled (default), then the output contains a copy of the input data, which can include the unique ref.
Output component scores	If enabled, then scores for mapped components are output for each matching level in addition to total scores. If disabled (the default), then only the total score for each matching level is output.
Output exact match scores	If enabled, then a total score is output for exact matches that is the sum of the sure score setting for all mapped components plus one. Otherwise the score field is blank for exact matches. Regardless of this setting the component scores for exact matches are always blank.
Output all exact matches	When disabled (the default), matching pairs are only output if a record exactly matches the first record of a cluster. If enabled, then all matching pairs are output.
Output highest scores	If enabled, highest scores are also output to the Grouped Matching Pairs and Matching Groups output types. This is the highest score achieved by any matching pair within each group. (Conversely, the base score is the lowest score achieved by the pairs within a group and is always output.)
Output duplicates count	If enabled, the number of duplicates in each group is output to the Matching Groups output type. This is one less than the number of records in the group.
Output compare results	If enabled, the matching matrices indices and acronym match flag are included in the Matching Pairs output type.
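
The arithmetic behind the exact match score, the base and highest scores, and the duplicates count can be sketched as below. The function names and the sample sure scores are hypothetical and only illustrate the definitions given in the table above.

```python
def exact_match_score(sure_scores):
    """Total score for an exact match: sum of the sure scores of all mapped components, plus one."""
    return sum(sure_scores.values()) + 1


def summarize_group(record_ids, pair_scores):
    """Summarize one matching group from its record ids and its matching-pair scores."""
    return {
        "base_score": min(pair_scores),           # lowest pair score, always output
        "highest_score": max(pair_scores),        # output only if 'Output highest scores' is enabled
        "duplicates_count": len(record_ids) - 1,  # one less than the number of records
    }
```

For example, a group of three records whose pairs scored 72, 85 and 90 has a base score of 72, a highest score of 90 and a duplicates count of 2.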

Grouping options

Name bridging prevention	Prevents records with different forenames being grouped together because they all match to a record that is missing forename.
Prefix bridging prevention	Prevents records with ‘Miss’ and ‘Mrs’ being grouped together because they match to a record with ‘Ms’.
Company bridging prevention	Prevents records with different company names being grouped together because they all match to a record with an acronym. (e.g. IBM matches “International Business Machines” and “Injection Blow Moulding”).
Aggressive splitting	If enabled, bridging records will be disassociated from all matching records. If disabled (the default), bridging records will remain matched to one sub-group of non-bridging records.
Master record identification	If enabled (default), the master record in each group is chosen according to: Master Priorities rules, then address length, then lowest UniqueRef. If disabled, the master record in each group is simply the record with the lowest UniqueRef.
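
The master record ordering can be sketched as a sort key. This is an illustration under stated assumptions: each record is assumed to carry a numeric `priority` already computed from the Master Priorities rules (higher is better), and a longer address is assumed to be preferred. Neither detail is confirmed by this page, and the field names are hypothetical.

```python
def choose_master(group, master_identification=True):
    """Pick the master record of a matching group (a list of dicts)."""
    if not master_identification:
        # Disabled: simply the record with the lowest UniqueRef.
        return min(group, key=lambda r: r["unique_ref"])
    # Enabled: priority first, then address length, then lowest UniqueRef.
    return min(
        group,
        key=lambda r: (-r["priority"], -len(r["address"]), r["unique_ref"]),
    )
```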

Match Keys

Match keys determine how records are clustered. When a new record is added to an existing cluster (containing one or more existing records) the record is compared to each of those existing records. Clusters are used to group potentially matching records.

Keys	Lists the keys that will be used to cluster records for matching.
Key types	Keys are grouped into ‘exact keys’ and ‘fuzzy keys’. All the records in a fuzzy key cluster are compared to one another. All the records in an exact cluster are automatically considered matching, without needing to compare.
Key fields	Each key is a combination of key fields, e.g. Address Key + Premise.
Key functions	Functions (such as UPPER, TRIM, etc.) can be applied to key fields. Functions are best used with raw input data (names, address lines, postcodes, etc.) rather than with the key fields generated by the Hub engine (NameKey, AddressKey, etc.).
Allow blank keys	Key fields can be marked as 'optional' by enclosing them within square brackets. Alternatively, enabling ‘allow blank keys’ makes all key fields optional. The two methods cannot be used at the same time.
Dynamic keys	In overlap mode (and lookup mode) enabling this option instructs Matching to dynamically choose which keys to use (from those defined), on a record-by-record basis, depending on which input columns are populated.
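
How keys drive clustering can be sketched as follows. This is a simplified model, not the product's key syntax or engine: key fields are given as a list of names, optional fields are written in square brackets, TRIM and UPPER stand in for key functions, and a record with a blank required field is assumed not to be clustered under that key. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_key(record, key_fields, allow_blank_keys=False):
    """Return the cluster key for a record, or None if a required field is blank."""
    bracketed = [f for f in key_fields if f.startswith("[") and f.endswith("]")]
    if bracketed and allow_blank_keys:
        # The two ways of marking fields optional cannot be combined.
        raise ValueError("use either [optional] fields or allow_blank_keys, not both")
    parts = []
    for spec in key_fields:
        optional = allow_blank_keys or spec in bracketed
        name = spec.strip("[]")
        # TRIM and UPPER applied to the raw value, as example key functions.
        value = str(record.get(name) or "").strip().upper()
        if not value and not optional:
            return None
        parts.append(value)
    return "|".join(parts)


def fuzzy_candidate_pairs(records, key_fields, allow_blank_keys=False):
    """Cluster records by key and list every pair a fuzzy key would compare."""
    clusters = defaultdict(list)
    for rec in records:
        key = build_key(rec, key_fields, allow_blank_keys)
        if key is not None:
            clusters[key].append(rec["id"])
    pairs = []
    for ids in clusters.values():
        # In a fuzzy key cluster all records are compared to one another.
        pairs.extend(combinations(ids, 2))
    return pairs
```

For an exact key the same clusters would be treated as matching groups directly, with no pairwise comparison.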

Matching Rules

Matching levels	The Matching Rules dialog has one tab for each matching level enabled (Individual, Name only, Family, Address, Business, Company only).
Weights	Weights are used when compared records are scored. Weights are configured automatically when the basic configuration settings are specified (nationality, tightness), so there is no need to configure them manually unless customizations are being made.
Thresholds	Scoring thresholds can be applied to provide further matching requirements when two records are compared. It is not recommended to change these settings.

Constraints

Must match gender	When enabled, potential matches will be disregarded if their genders differ. However, if the gender is unknown in one or both of the records, the records will potentially be classed as a match.
Must match suffix	When enabled, potential matches will be disregarded if their suffixes differ. However, if the suffix is unknown in one or both of the records, the records will potentially be classed as a match.
Must match joint names	When enabled, potential matches will be disregarded if one record has a joint name but the other doesn't. For example, normal behaviour will match "Mr and Mrs J Smith" with "Mr J Smith"; enabling must match joint names will prevent such matches.
Address constraints	The address matching constraints (must match location, premise, directional, etc) are now implemented via post-matching rules, so do not need to be configured here.
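
The "unknown value" behaviour of the gender and suffix constraints, and the joint names constraint, can be sketched as two small predicates. The function names are hypothetical; unknown values are represented here as None or an empty string.

```python
def violates_must_match(value_a, value_b):
    """True if a 'must match' constraint (gender or suffix) rules the pair out.

    An unknown value in either record never rules the pair out.
    """
    if not value_a or not value_b:
        return False
    return value_a != value_b


def violates_must_match_joint_names(a_is_joint, b_is_joint):
    """True if exactly one of the two records has a joint name."""
    return a_is_joint != b_is_joint
```

With must match joint names enabled, "Mr and Mrs J Smith" (joint) against "Mr J Smith" (not joint) is disregarded, while two joint names remain potential matches.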

Matching Matrices

Three dimensional matching matrices are used to decide the level of match records should achieve. In the name matching matrix the three dimensions represent the individual name fields: last name, first name, middle name. In the company matching matrix the three dimensions are name1, name2, and name3. The matrix maps the match type for these individual name fields (equal, both_empty, one_empty, sounds_equal, etc.) to an overall match level (sure, likely, possible, etc.).
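
A matching matrix can be pictured as a lookup keyed by the three per-field match types. The cell values below are invented purely for illustration and are not the product's defaults; `NAME_MATRIX`, `match_level` and the "none" fallback are hypothetical names.

```python
# Keys are (last name, first name, middle name) match types.
# Values are overall match levels. All entries are illustrative only.
NAME_MATRIX = {
    ("equal", "equal", "equal"): "sure",
    ("equal", "equal", "both_empty"): "sure",
    ("equal", "equal", "one_empty"): "likely",
    ("sounds_equal", "equal", "both_empty"): "likely",
    ("equal", "one_empty", "both_empty"): "possible",
}


def match_level(last, first, middle, matrix=NAME_MATRIX, default="none"):
    """Map three per-field match types to an overall match level."""
    return matrix.get((last, first, middle), default)
```

The company matching matrix would have the same shape, with name1, name2 and name3 as the three dimensions.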

Post-Matching Rules

Advanced Post-Matching rules are applied to matching pairs prior to grouping. The Advanced Post-Matching rules only apply to fuzzy compared matches. Each rule specifies a condition, written in a SQL-like syntax, and an action that determines what happens when the condition is satisfied.

Conditions	Rule conditions are logical expressions that result in a Boolean (true or false). An expression can be a function, such as “matches(city)”, or a logical operation, such as “AddressScore >= 30” or “City == ‘RALEIGH’”. Conditions can consist of a single logical expression or of multiple expressions (combined using “and” or “or”).
Actions	Rule actions are either "Keep" or "Delete". If any successful rule specifies a Keep action, then the match is kept. If any successful rule specifies a Delete action, then the match is deleted, but only if the match isn’t being kept.
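
The way Keep and Delete actions combine can be sketched as below. Conditions are modelled as plain Python callables rather than the SQL-like rule syntax, and a pair that no rule touches is assumed to be kept. `keep_match` and the sample rules are hypothetical.

```python
def keep_match(pair, rules):
    """Decide whether a fuzzy matching pair survives the post-matching rules.

    rules is a list of (condition, action) tuples, where condition is a
    callable returning True or False and action is "Keep" or "Delete".
    """
    keep = False
    delete = False
    for condition, action in rules:
        if condition(pair):
            if action == "Keep":
                keep = True
            elif action == "Delete":
                delete = True
    # A successful Keep rule always wins over any successful Delete rule.
    if keep:
        return True
    return not delete
```

For example, with a Keep rule on `AddressScore >= 30` and a Delete rule on a city other than RALEIGH, a pair with an address score of 35 is kept whatever its city.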

Master Priorities

Master priority rules are used to determine which record in a matching group should be marked as the master record (i.e. the best record).

Word Lookup

The Names and Words tables (NAMES.DAT & NAMES2.DAT) control:

the matching equivalent of words e.g. Tony = Anthony
the gender of forenames e.g. John = Male, Susan = Female, Chris = Either
casing rules e.g. PO Box, IBM, 360Science
expansion/contraction of abbreviations and correction of typing errors e.g. Svcs = Services, Finacial = Financial
attributing type to these and other words e.g. Mr = Prefix, Ltd = Business, FL = State, The = Noise.

Generate tab

Generate name options

Default gender	The Default Gender property is the gender to assume when the matchIT API can't determine whether the name is male or female e.g. Chris Smith, C Smith.
Use equivalent name	If enabled, the input first name is replaced with its equivalent from the NAMES.DAT file.
Enhanced double barrelled lookup	When enabled, this setting will cause an unrecognised middle name to be considered part of a non-hyphenated double-barrelled last name.
Process blank last name	With this setting enabled, a blank lastname will cause extra processing to be performed on other input data to help detect typographical errors.
Parse name elements	When enabled, this will cause input name elements (including prefix, firstnames, and lastname) to be parsed.
Detect inverse names	Attempt to identify addressee names that have been specified with the lastname preceding the firstnames, provided a comma delimiter follows the lastname (for example, "Smith, John" where Smith is the lastname).
Parse as normalized name	Addressee names are assumed to be in a delimited normalized format.
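
Inverse name detection relies on the comma that follows the lastname. A minimal sketch of that idea, with a hypothetical function name and none of the engine's further name parsing:

```python
def split_inverse_name(addressee):
    """Return (firstnames, lastname) for 'Lastname, Firstnames' input, else None."""
    if "," not in addressee:
        return None
    last, _, first = addressee.partition(",")
    last, first = last.strip(), first.strip()
    if not last or not first:
        return None
    return first, last
```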

Generate company options

Use equivalent name	If enabled, then the equivalents (according to the NAMES.DAT file) of words indicating a business name, such as "Motors" or "Services", are included in the normalized organization name and the corresponding phonetic keys.
Normalization truncation	If enabled (non-zero), and the organization consists of more than three words, then the third element of the normalized organization name will be truncated to the first N characters of each word after the first two (where N is the value of this setting).
Ignore parentheses	With this property enabled, any words that are enclosed with parentheses within an organization name will be excluded from the phonetic organization keys.
Ignore trailing post town	Exclude any trailing post town from the phonetic organization keys.

Generate address options

Verify postcode	If enabled, verifies and corrects the format of the postcode.
Default street line	This property is used when generating a phonetic address key, to indicate the position of the street and the town in the address.
Lines to scan	This property enables personal names to be extracted from address lines. It can be set to 1 or 2.
Premise first	Indicates whether to expect the premise or flat number to come first in address lines.

Generate options

Drop excluded words	When enabled, flag any records that contain exclusion words in any of the key fields.
Consider casing	When enabled, consider the casing of the incoming data when splitting the data up for extracting keys, proper casing, and so forth.
Variable keys max length	Specifies the maximum length of various variable-length phonetic keys.

Compare tab

Phonetic compare options

Algorithm	The phonetic algorithm used when scoring. There are five choices available: soundIT, Loose_SoundIT, Dynamic_SoundIT, Soundex, None.
(Algorithm) for first name	Optionally specify a different algorithm to compare first names (‘none’ means use the same as the main Algorithm setting).
Loose threshold	When the Dynamic_SoundIT algorithm is in use, this property controls the threshold at which soundIT is switched to Loose soundIT.
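
Of the listed algorithms, Soundex is a published standard and can be sketched directly. The version below follows the common American Soundex rules and may differ in detail from the product's implementation. The soundIT variants are not shown, as their rules are not described here.

```python
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]
```

Two names are phonetically equal under this scheme when their codes are equal, for example "Robert" and "Rupert".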

Fuzzy compare options

Algorithm	The fuzzy algorithm used when scoring. There are two choices available: matchIT_Fuzzy, Damerau_Levenshtein.
Maximum edit distance	The maximum number of differences between the two strings. (Applicable to Damerau_Levenshtein only)
Minimum score	The minimum fuzzy score. (Applicable to Damerau_Levenshtein only)
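
How a maximum edit distance and a minimum score might gate a Damerau-Levenshtein comparison is sketched below. The distance function is the optimal string alignment variant (each adjacent transposition counts as one edit). The 0 to 100 score scale and the `fuzzy_match` function are assumptions made for illustration, not the product's formula.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, transpose."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def fuzzy_match(a, b, max_edit_distance, min_score):
    """Accept the pair only if both the distance and the score limits are met."""
    dist = osa_distance(a, b)
    longest = max(len(a), len(b)) or 1
    score = 100 * (1 - dist / longest)  # hypothetical score scale
    return dist <= max_edit_distance and score >= min_score
```

Short names are penalised more heavily by the score than by the distance: "Ron" and "Roy" differ by one edit, which is a third of the string.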

Address compare options

Match box number and postcode	When enabled, two compared addresses score Sure if they contain matching postal box numbers and postcodes.
Use premise range	When enabled, this will allow addresses to contain premise ranges.
Loose fuzzy premise match	When enabled, additional fuzzy premise matching is performed.
Match delivery points	When enabled, this will prevent two addresses from matching when both contain two postal codes but different delivery point codes and the addresses score below the minimum threshold.
Match DP threshold	See match delivery points, above.
Default DPSs	See match delivery points, above.
Ignore premise suffix	When enabled, this will allow two premises to match regardless of whether one or both has an apartment- or flat-type suffix (for example, 12 and 12a).

Name compare options

Prevent mrs matching miss	When enabled, two compared names will not match if one has a title of Mrs and the other a title of Miss.
Fuzzy match non-normalized names	When enabled (the default), this will cause additional matching checks to be performed on names using the non-normalized name matching fields.
Blank name company matching	When two records contain no addressee names, this setting will allow the names to achieve a score depending on what's available in the job title and company name fields. Values: 0 = off; 1 = on if either name is blank; 2 = on when both names are blank.
Initial match equivalent	Controls how an initial matches a name that's equivalent to the given firstname. For example, when comparing Rebecca Smith and B Smith, then the B could be considered a match for Becky, which is a common abbreviation (or equivalent) of Rebecca.
Cross match initial to name	When enabled (the default), and the first letter of a firstname matches the middle initial (for example, "Richard Smith" and "John R Smith") then the names will be considered a possible match.
Fuzzy match initials	Controls how similar-sounding initials (M/N, S/F, and G/J) can be matched. When set to 'full' (the default), then one name's initial is permitted to match the first letter of the other name's firstname (for example, "M Smith" versus "Neil Smith"). When set to 'initialsOnly', then only initials are permitted ("M Smith" versus "N Smith"). A setting of 'noMatch' disables such matches.
Initial match forename	Controls the result achieved when an initial matches the first letter of a firstname. This defaults to 'equal', so that B Smith versus Bob Smith will achieve the same result as Bob Smith versus Bob Smith (i.e. 'equal' for the firstnames). Reducing this setting to 'approx' or 'contains' will reduce the resultant name score in order to distinguish such matches.
Fuzzy match forename	Used to prevent different recognized firstnames from fuzzy matching. For example, ordinarily Ron and Roy will fuzzy match.
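
The three modes of the fuzzy match initials setting can be sketched as follows. The function names are hypothetical and the sketch covers only the similar-sounding pairs named above (M/N, S/F, G/J), not the rest of the name comparison.

```python
SIMILAR_INITIALS = [{"M", "N"}, {"S", "F"}, {"G", "J"}]


def initials_sound_alike(x, y):
    """True if two different initials belong to the same similar-sounding pair."""
    return any(x != y and x in pair and y in pair for pair in SIMILAR_INITIALS)


def fuzzy_initial_match(first_a, first_b, mode="full"):
    """Apply the 'full', 'initialsOnly' or 'noMatch' mode to two firstname values."""
    if mode == "noMatch" or not first_a or not first_b:
        return False
    a_is_initial = len(first_a) == 1
    b_is_initial = len(first_b) == 1
    if not (a_is_initial or b_is_initial):
        return False  # two full firstnames are outside the scope of this setting
    if mode == "initialsOnly" and not (a_is_initial and b_is_initial):
        return False
    return initials_sound_alike(first_a[0].upper(), first_b[0].upper())
```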

Resources tab

Threads	By default - or if 0 is specified for the number of threads - each engine will use all available processor cores.
Debug log	Should a process that uses matchIT Hub unexpectedly crash, the engine can be configured to create a debug log of all data loaded and all operations performed on the data.

Memory usage

Input buffer	All data added is initially stored in the input buffer. The processing threads remove this data from the input buffer for processing.
Output buffer	Similarly, all results are written to the output buffer. The application must remove results from the buffer to prevent it from becoming full.
Block size	Every item of data - once it's been removed from the input buffer and acquired by a processing thread - is stored internally in blocks along with other items of data.
Cache limit	When a block is full, it's added to the fast cache. When the cache becomes full (if the cacheLimit is not 0) then archiving will begin.
Threshold	When the memory usage of the running process exceeds the threshold, blocks will be moved from memory to the temporary disk paging file.
Compression level	The compression level can be 0 for disabled, or 1 (fastest compression) to 9 (slowest/best compression).
Encryption	The encryption key size can be 128, 192, or 256. Encryption of memory-resident data should not normally be required, but can be enabled if necessary.

Disk usage

Location	Specifies the directory in which a temporary disk paging file will be created, should the process's memory usage exceed the threshold (refer to Memory usage, above).
Limit	A nonzero limit can be specified, in which case the process will be aborted should the disk file's size exceed the limit.
Compression level	The compression level can be 0 for disabled, or 1 (fastest compression) to 9 (slowest/best compression).
Encryption	The encryption key size can be 128, 192, or 256. A key size of 256, for maximum security, is highly recommended.