Challenges
- Data missing for 50% of projects
- Data mismatches
- One-third of the real estate project records could not be used for modeling due to missing price or inventory data
- Project amenities were specified as free text, so the same amenity could have multiple descriptions
Solution
- Software developed on the R Shiny platform
- Word2Vec, an NLP technique, was applied to the specification and amenity data
- Micro-market clusters were formed using hierarchical clustering
- An algorithm was developed to forecast velocity of sale
- Linked to the database so that any new project can be compared against it
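
The hierarchical clustering step can be illustrated with a minimal sketch. It is not the project's implementation (which was built in R): the features, the sample projects, the average-linkage rule, and the function names `average_linkage` and `agglomerate` are all assumptions made for illustration. It uses only the Python standard library and leaves the features unscaled for brevity.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b always holds)
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical projects: (price per sq ft in thousands, km from city centre).
projects = {
    "A": (5.0, 2.0),
    "B": (5.2, 2.5),
    "C": (9.0, 12.0),
    "D": (9.3, 11.5),
    "E": (5.1, 2.2),
}
names = list(projects)
points = [projects[n] for n in names]

# Map cluster indices back to project names; each group is one micro market.
micro_markets = [[names[i] for i in c] for c in agglomerate(points, n_clusters=2)]
print(micro_markets)
```

With this made-up data, the three low-priced central projects fall into one micro market and the two high-priced outlying projects into another. On real data the features would normally be scaled first, and a library routine would replace the hand-written loop.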
