The Exact Data Match Indexer is part of the Exact Data Match feature in Umbrella DLP. The tool indexes a customer data source (CSV file) and generates fingerprints of critical records. These fingerprints are uploaded to Umbrella for use in DLP policies.
When a large data source (CSV file) is indexed, the error below may be encountered. This article explains how to increase the memory available to the indexer so that it can process large data sources.
ERROR: Out of heap space; please rerun with an increased size (-Xmx).
Run the indexing tool with the Java -Xmx option, which sets the maximum amount of heap memory the tool may use. The value can be specified in mebibytes (m) or gibibytes (g). For example:
- -Xmx1000m = 1000 mebibytes (approximately 1049 megabytes)
- -Xmx1g = 1 gibibyte (approximately 1074 megabytes)
The required memory depends on the size of the source CSV file. We recommend allocating at least twice the size of the source CSV file. For example, if the source data is 512 MB, the memory could be allocated as follows:
java -Xmx1g -jar edm-indexer.jar -i source_file.csv -e template-id
If the tool is run in an automated way (for example, by a scheduled script), allocate more memory than the current source file requires so that the indexer keeps working as the source data grows.
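For automated runs, one option is to derive the -Xmx value from the current size of the source file instead of hard-coding it. The following is a minimal sketch only, not part of the indexer: the function names heap_mib_for_bytes and run_indexer are hypothetical, the 256 MiB minimum is an arbitrary assumption, and the calculation simply applies the "at least twice the file size" guideline above, rounded up to whole mebibytes.

```shell
#!/bin/sh
# Hypothetical helper functions (not shipped with the indexer).

# Print a heap size in MiB for a source file of the given size in bytes:
# twice the file size, rounded up to a whole MiB.
heap_mib_for_bytes() {
    bytes=$1
    mib=$(( (bytes * 2 + 1048575) / 1048576 ))
    # Assumed floor of 256 MiB for very small files (arbitrary choice).
    if [ "$mib" -lt 256 ]; then
        mib=256
    fi
    echo "$mib"
}

# Run the indexer with a heap sized for the given CSV file.
# $1 = source CSV file, $2 = template ID (same arguments as the example above).
run_indexer() {
    csv=$1
    template_id=$2
    bytes=$(wc -c < "$csv" | tr -d ' ')
    heap=$(heap_mib_for_bytes "$bytes")
    java "-Xmx${heap}m" -jar edm-indexer.jar -i "$csv" -e "$template_id"
}
```

With these functions loaded, run_indexer source_file.csv template-id would start the indexer with -Xmx1024m (equivalent to -Xmx1g) for a 512 MiB source file. The computed value should not exceed the memory actually available on the machine running the tool.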