We have a situation where we're unable to correctly extract data from an old SAP system for all languages. I don't fully understand the exact reason, but it's related to the database collation/code page. Currently, the client has this problem and was told that the only way to reliably extract data such as Chinese is by logging into SAP with the correct language and extracting the data manually. Note - we're unable to use RFC data extraction due to other client issues.
I want to create profiling reports that attempt to look for 'corrupted' data. I would suggest that any value containing a non-ASCII character is worth a 'heads up.' These reports are intended to be another tool, alongside other methods, for identifying problematic data. We have a profiling process that puts all values into a table; we then query that table with the following WHERE clause:
- WHERE FIELDVALUE LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
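To make the intent of that LIKE pattern concrete, here is a rough Python sketch of the same check: flag any value containing a character outside the listed ASCII set (digits, letters, space, and common punctuation). The `needs_review` helper is hypothetical, not part of the actual profiling process, and a Python regex won't reproduce T-SQL collation semantics exactly.

```python
import re

# Characters considered "safe": digits, ASCII letters, space, and the
# punctuation listed in the LIKE pattern. Anything else triggers a flag.
ALLOWED = re.compile(r'[0-9a-zA-Z !"#$%&\'()*+,\-./:;<=>?@\[\]^_`{|}~\\]*')

def needs_review(value: str) -> bool:
    """Return True if the value contains any character outside the allowed ASCII set."""
    return ALLOWED.fullmatch(value) is None

print(needs_review("Plain ASCII text!"))  # False - all characters allowed
print(needs_review("Müller"))             # True - umlaut is outside the set
print(needs_review("北京"))                # True - Chinese characters flagged
```

This mirrors the `[^...]` negated character class: the SQL query returns rows where at least one character falls outside the allowed set.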
This seems to return a decent result; however, when I collate the values with Latin1_General_BIN, I get additional results that include umlauts and Spanish/Scandinavian characters that were extracted correctly:
- WHERE FIELDVALUE COLLATE Latin1_General_BIN LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
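A likely explanation for the difference (hedged, since I can't see the database's default collation): in a non-binary SQL Server collation, a LIKE range such as `[a-z]` matches according to the collation's sort order, where accented letters like `ü` sort among the plain letters and therefore are *not* flagged by the negated class. Under `Latin1_General_BIN`, ranges are evaluated by raw code point, so accented letters fall outside `a-z` and get flagged. A small sketch of the code-point view a binary collation applies:

```python
def in_ascii_letter_range(ch: str) -> bool:
    # Code-point comparison, as a binary collation would do: only
    # 0x30-0x39, 0x41-0x5A, and 0x61-0x7A fall inside the ranges.
    return '0' <= ch <= '9' or 'a' <= ch <= 'z' or 'A' <= ch <= 'Z'

for ch in "nñuü":
    print(ch, hex(ord(ch)), in_ascii_letter_range(ch))
# n 0x6e True / ñ 0xf1 False / u 0x75 True / ü 0xfc False
```

So the binary-collated query is the stricter "pure ASCII" test, while the default-collation query tolerates accented Latin letters. Which one is right depends on whether correctly extracted umlauts and Spanish/Scandinavian characters should count as a 'heads up' or not.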
Advice would be appreciated from anyone who is familiar with this type of problem. My goal is to better understand what these WHERE clauses are doing, and whether there is a better way to approach this. Thanks