Challenges
- The database mostly contains addresses and entities (company names and person names)
- Separate models are being built for addresses and for entities
- Entity matching can be tricky because entities include both person names and company names, which ideally require different sets of features
- Handling and searching a large amount of data
- Getting multiple SMEs (subject-matter experts) to annotate the data consistently was a challenge
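
To illustrate why person names and company names may call for different features, here is a minimal Python sketch. It is not the project's actual feature set: the suffix list, the function names, and the chosen features are all hypothetical and exist only to show the idea.

```python
import re

# Hypothetical list of legal-form suffixes; a real list would be much longer.
COMPANY_SUFFIXES = {"ltd", "limited", "inc", "llc", "corp", "gmbh", "plc"}


def tokens(name):
    """Lowercase a name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def company_features(a, b):
    """Features for company names: compare after dropping legal-form suffixes."""
    core_a = [t for t in tokens(a) if t not in COMPANY_SUFFIXES]
    core_b = [t for t in tokens(b) if t not in COMPANY_SUFFIXES]
    union = set(core_a) | set(core_b)
    jaccard = len(set(core_a) & set(core_b)) / len(union) if union else 0.0
    return {"core_name_equal": core_a == core_b, "token_jaccard": jaccard}


def person_features(a, b):
    """Features for person names: ignore token order and allow initials."""
    ta, tb = tokens(a), tokens(b)
    return {
        # "Smith, John" and "John Smith" share the same token set.
        "same_token_set": set(ta) == set(tb),
        # "J. Smith" and "John Smith" share the same initials.
        "initials_match": sorted(t[0] for t in ta) == sorted(t[0] for t in tb),
    }
```

Suffix stripping helps for companies ("Acme Ltd" vs "ACME Limited") but is meaningless for people, while order-insensitive and initial-based comparisons help for people but can be misleading for companies.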
Solution
- Solr is used to search the database for an address or entity and to retrieve the top 10, 20, or 30 results
- The top results are then passed to an ML model, which determines the best match among them
- A MariaDB (SQL) database is used
- The service is deployed as a Flask API served by Gunicorn
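
The two-stage flow described above (retrieve the top candidates, then pick the best match) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `retrieve_candidates` is a token-overlap stand-in for the Solr search, `score` uses `difflib.SequenceMatcher` as a stand-in for the ML model, and `RECORDS` is made-up sample data rather than anything from the actual database.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory records standing in for the database.
RECORDS = [
    "12 Baker Street, London",
    "21 Baker Street, London",
    "12 Bakers Road, Leeds",
    "45 High Street, Oxford",
]


def _tokens(text):
    """Lowercase and split on whitespace, treating commas as separators."""
    return set(text.lower().replace(",", " ").split())


def retrieve_candidates(query, records, top_k=10):
    """Stand-in for the Solr step: rank by shared tokens, keep the top_k hits."""
    query_tokens = _tokens(query)

    def overlap(record):
        return len(query_tokens & _tokens(record))

    hits = [r for r in records if overlap(r) > 0]
    return sorted(hits, key=overlap, reverse=True)[:top_k]


def score(query, candidate):
    """Stand-in for the ML model: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def best_match(query, records, top_k=10):
    """Retrieve up to top_k candidates, then return the highest-scoring one."""
    candidates = retrieve_candidates(query, records, top_k)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(query, c))
```

Keeping retrieval cheap and broad while reserving the more expensive scoring for a short candidate list is what lets this pattern scale to a large database. In a deployment like the one described, a function such as `best_match` would sit behind a Flask route.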
