Radix Analytics Pvt Ltd

Issues & Objectives

  • To develop an ML-based system for matching addresses and entity names:
    • The same address or entity name can be written in several different ways.
    • It may contain spelling errors, a different word order, or abbreviations.
    • Our aim is to build an ML-based system that accurately matches incoming addresses and entities against those already in our database.
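
The variations listed above (spelling, word order, abbreviations) can be illustrated with a small normalisation sketch. This is not the project's actual preprocessing: the abbreviation table, the `normalize` and `similarity` helpers, and the sample addresses are all hypothetical, and the standard-library `difflib` is used only as a simple stand-in for a similarity measure.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"rd": "road", "ln": "lane", "apt": "apartment",
                 "pvt": "private", "ltd": "limited"}

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, expand abbreviations, and sort tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded = [ABBREVIATIONS.get(token, token) for token in tokens]
    # Sorting makes the result independent of the original word order.
    return " ".join(sorted(expanded))

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between the normalised strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Abbreviation and word order no longer matter after normalisation.
same = normalize("12, MG Rd., Pune") == normalize("Pune MG Road 12")
# A spelling mistake ("Raod") lowers the score but keeps it high.
score = similarity("12 MG Raod, Pune", "12 MG Road Pune")
```

Normalisation of this kind handles abbreviations and ordering deterministically, while the remaining spelling differences are left to a fuzzy score or to learned features.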

Solution

  • Two-Stage Modelling: Solr + ML
  • Solr:
    • It is used to search the database for an address/entity and return the top 10, 20, or 30 results.
  • ML:
    • The top results are then passed to an ML model, which determines the best match among them.
  • Database: MariaDB (SQL)

  • Deployment Framework: Flask API + Gunicorn
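
The two-stage flow described above (retrieve a shortlist, then pick the best match) can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed system: stage 1 uses a plain token-overlap ranking as a stand-in for the Solr search, stage 2 uses `difflib` similarity as a stand-in for the trained ML model, and the function names and sample records are hypothetical.

```python
from difflib import SequenceMatcher

def retrieve_candidates(query, records, top_k=10):
    """Stage 1 (stand-in for Solr): rank records by Jaccard token overlap."""
    q_tokens = set(query.lower().split())

    def overlap(record):
        r_tokens = set(record.lower().split())
        union = q_tokens | r_tokens
        return len(q_tokens & r_tokens) / len(union) if union else 0.0

    # Keep only the top_k records, mirroring the "top 10, 20 or 30" shortlist.
    return sorted(records, key=overlap, reverse=True)[:top_k]

def score_pair(query, candidate):
    """Stage 2 (stand-in for the ML model): similarity score in [0, 1]."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

def best_match(query, records, top_k=10):
    """Run both stages and return the highest-scoring candidate, or None."""
    candidates = retrieve_candidates(query, records, top_k)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score_pair(query, c))

records = [
    "12 MG Road Pune",
    "45 Park Street Kolkata",
    "7 Brigade Road Bengaluru",
]
match = best_match("12 MG Raod Pune", records, top_k=2)
```

The point of the split is that the first stage only has to be cheap and high-recall over the whole database, while the more expensive scoring runs on a few dozen candidates at most.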

Project information

  • Skills: Machine Learning
  • Client: Corporate Data Aggregator
  • Domain: Text Analytics
  • Location: India

Challenges

  • The database consists mostly of addresses and entities (company names and person names).

  • Separate models are built for addresses and for entities.

  • Entity matching can be tricky because entities include both person names and company names, which would ideally require different sets of features.

  • Handling and searching a large amount of data.

  • Consistent annotation of the data by multiple SMEs was a challenge.
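
One way to approach the person-versus-company issue noted above is to tag each entity before computing features, so that each type can get its own feature set. The sketch below is only an assumption about how that could look: the marker list and the `entity_type` helper are hypothetical, and a simple keyword rule like this will misclassify companies whose names carry no legal-form token.

```python
import re

# Hypothetical legal-form tokens that suggest a company name.
COMPANY_MARKERS = {"pvt", "ltd", "limited", "private", "llp",
                   "inc", "corp", "co", "company"}

def entity_type(name: str) -> str:
    """Return "company" if the name contains a legal-form token, else "person"."""
    tokens = set(re.findall(r"[a-z]+", name.lower()))
    return "company" if tokens & COMPANY_MARKERS else "person"

kinds = [entity_type(n) for n in ("Radix Analytics Pvt Ltd", "Ramesh Kumar")]
```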

Results

Test recall (the KPI) for each model is given below:

  • Address Matching: 87.47 %
  • Entity Matching: 88.62 %
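
The exact evaluation protocol is not stated here. As a rough sketch, assuming recall is the share of test queries with a known true match for which the system returned that match, it could be computed as below. The `recall` helper and the sample labels are hypothetical.

```python
def recall(predicted, expected):
    """Share of queries with a true match whose prediction equals that match.

    `expected` holds the annotated true record per query (None if the query
    has no match in the database); `predicted` holds the system's output.
    """
    relevant = [(p, e) for p, e in zip(predicted, expected) if e is not None]
    if not relevant:
        return 0.0
    hits = sum(1 for p, e in relevant if p == e)
    return hits / len(relevant)

# Three queries have a true match; only the first one is found.
value = recall(["a", "b", None, "x"], ["a", "c", "d", None])
```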