Challenges
- Very sparse data: 70,000 loads over 3 years, spread across 70 lanes and 42 equipment types, yet the solution had to cover any lane across the continent
- Apparently erratic prices: e.g. OH-TX lanes (750-1200 mi) are priced the same as IN-PA lanes (500-600 mi), for the same equipment!
- Prohibitive price of historic data from Truckstop or DAT meant that only generic third-party data, such as fuel prices and the CASS index, could be used
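
To make the "erratic prices" point concrete, the short sketch below converts one flat price into a rate per mile over each lane's distance range. The price of 1500 USD is a hypothetical placeholder, not a figure from the data, and `rate_range` is an illustrative helper name.

```python
def rate_range(price_usd, min_miles, max_miles):
    """Return (lowest, highest) rate per mile for one price over a distance range."""
    return price_usd / max_miles, price_usd / min_miles


# Hypothetical flat price charged on both lanes (illustrative only).
PRICE_USD = 1500.0

oh_tx = rate_range(PRICE_USD, 750, 1200)  # OH-TX: 750-1200 mi
in_pa = rate_range(PRICE_USD, 500, 600)   # IN-PA: 500-600 mi

print("OH-TX USD/mi:", oh_tx)
print("IN-PA USD/mi:", in_pa)
```

Under this assumption, the whole IN-PA range sits above the whole OH-TX range on a per-mile basis, which is why distance alone cannot explain the prices.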
Solution
- PDFs had to be OCR'd to extract their text
- Ensemble modelling: a very large number of models were developed and combined
- Solution deployed using MLflow and integrated with the DataLake and the transaction system
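
As a rough illustration of the ensemble idea, the sketch below averages the predictions of three very simple base models. It is a minimal example under stated assumptions, not the models actually developed: the toy history, the choice of base models, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy history of (miles, price in USD); values are illustrative only.
HISTORY = [(500, 1400.0), (600, 1500.0), (750, 1450.0), (1200, 1600.0)]


def fit_flat(history):
    """Base model 1: always predict the average historical price."""
    avg = mean(p for _, p in history)
    return lambda miles: avg


def fit_per_mile(history):
    """Base model 2: predict miles times the average rate per mile."""
    rate = mean(p / m for m, p in history)
    return lambda miles: rate * miles


def fit_linear(history):
    """Base model 3: least-squares line, price = a + b * miles."""
    xs = [m for m, _ in history]
    ys = [p for _, p in history]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return lambda miles: a + b * miles


def fit_ensemble(history, fitters):
    """Fit every base model, then predict with the unweighted mean of their outputs."""
    models = [fit(history) for fit in fitters]
    return lambda miles: mean(model(miles) for model in models)


ensemble = fit_ensemble(HISTORY, [fit_flat, fit_per_mile, fit_linear])
print(round(ensemble(900), 2))
```

An unweighted mean is the simplest way to combine models; with sparse data it tends to damp the error of any single model, at the cost of ignoring which base model is more reliable on a given lane.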
