Challenges
- The database consists mostly of addresses and entities (company names and person names).
- Separate models are being built for address matching and entity matching.
- Entity matching is tricky because entities include both person names and company names, which ideally require different sets of features.
- Handling and searching a large volume of data.
- Annotating data consistently across multiple subject-matter experts (SMEs) was a challenge.
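To illustrate why person names and company names call for different features, here is a minimal Python sketch. Everything in it is hypothetical and is not the method used in this project: the LEGAL_SUFFIXES list, the feature names, and the assumption that the entity type (person or company) is already known before two names are compared.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form suffixes; a real system would need a larger, jurisdiction-aware list.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "plc", "corp", "co", "gmbh", "limited", "corporation", "company"}


def tokens(name: str) -> list[str]:
    # Lowercase and split on any run of characters that are not letters or digits.
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]


def company_features(a: str, b: str) -> dict[str, float]:
    # Drop legal-form suffixes so "Acme Ltd" and "ACME Limited" reduce to the same core name.
    ta = [t for t in tokens(a) if t not in LEGAL_SUFFIXES]
    tb = [t for t in tokens(b) if t not in LEGAL_SUFFIXES]
    return {
        "core_exact": float(ta == tb),
        "core_ratio": SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio(),
    }


def initials_compatible(ta: list[str], tb: list[str]) -> bool:
    # True if every token of the shorter name equals, or is the initial of, some token of the longer one.
    short, long_ = sorted((ta, tb), key=len)
    return all(any(s == t or (len(s) == 1 and t.startswith(s)) for t in long_) for s in short)


def person_features(a: str, b: str) -> dict[str, float]:
    # Person names often appear reordered ("Smith, John") or abbreviated ("J. Smith").
    ta, tb = tokens(a), tokens(b)
    return {
        "token_set_equal": float(set(ta) == set(tb)),
        "initials_compatible": float(initials_compatible(ta, tb)),
    }


def entity_features(a: str, b: str, entity_type: str) -> dict[str, float]:
    # Assumption: entity_type ("person" or "company") comes from an upstream step not shown here.
    if entity_type == "person":
        return person_features(a, b)
    if entity_type == "company":
        return company_features(a, b)
    raise ValueError(f"unknown entity type: {entity_type!r}")
```

Keeping the two feature sets separate lets each model (or each branch of one model) learn from signals that actually matter for that name type: legal suffixes are noise for companies, while token order and initials are the main sources of variation for people.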
Results
Test recall (the KPI) for each model is as follows:
- Address Matching: 87.47%
- Entity Matching: 88.62%
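For reference, recall is TP / (TP + FN): the share of true matches in the annotated test set that a model recovers. The sketch below assumes the test set is a set of annotated matching record pairs and that each model outputs a set of predicted pairs; the record IDs are made up, and the exact evaluation protocol behind the figures above is not stated here.

```python
def canonical(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    # Treat (a, b) and (b, a) as the same match by sorting each pair.
    return {tuple(sorted(p)) for p in pairs}


def recall(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    pred, true = canonical(predicted), canonical(gold)
    tp = len(pred & true)  # annotated matches the model found
    fn = len(true - pred)  # annotated matches the model missed
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no annotated matches")
    return tp / (tp + fn)


gold = {("r1", "r2"), ("r3", "r4"), ("r5", "r6"), ("r7", "r8")}
predicted = {("r2", "r1"), ("r3", "r4"), ("r9", "r10")}
print(f"{100 * recall(predicted, gold):.2f}%")  # prints 50.00%
```

Because recall only counts missed true matches, the spurious prediction ("r9", "r10") does not lower it; precision would be needed to capture false positives.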
