Location Extractor: Crawling millions of business websites to find their office locations
Our client offers market research to its customers, helping them identify the demand and supply of certain businesses in a given region. Their database already offered location data for 1 Billion+ businesses, and they wanted to expand the number of records in it.
We built an Internet crawler. Much like a typical search engine crawler, it visits millions of websites every day and employs AI and NLP to identify the office locations each business operates from. These office locations are extracted from specific pages of each website, such as Contact Us, About Us, and Office Locations.
To speed up the crawler and save on crawling costs, we also implemented Focused Crawling. Instead of going through the entire content of a website, the crawler uses AI to identify just the key pages where location data is most likely to be found. Once it has identified which pages are the Contact Us, About Us, and Office Locations pages, it navigates directly to them and accesses only the information we are interested in.
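To illustrate the idea of Focused Crawling, here is a minimal Python sketch. It is not the production crawler: the AI page classifier is replaced by a simple keyword score over link URLs and anchor text, and the names LOCATION_HINTS, LinkCollector, score_link, and pick_location_pages are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: a keyword list stands in for the AI model described above.
LOCATION_HINTS = ("contact", "about", "location", "office")


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from the <a> tags of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def score_link(href, text):
    """Counts how many location hints appear in the URL or anchor text."""
    haystack = f"{href} {text}".lower()
    return sum(hint in haystack for hint in LOCATION_HINTS)


def pick_location_pages(base_url, html, limit=3):
    """Returns up to `limit` absolute URLs most likely to hold location data."""
    parser = LinkCollector()
    parser.feed(html)
    scored = [(score_link(h, t), urljoin(base_url, h)) for h, t in parser.links]
    # Keep only links with at least one hint, highest score first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    seen, result = set(), []
    for _, url in ranked:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result[:limit]
```

Given a homepage with links to Products, Contact Us, Blog, and About Us, this sketch would queue only the Contact Us and About Us pages and skip the rest, which is where the savings in crawl time and cost come from.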
Finally, we used a combination of Natural Language Processing and Computer Vision to identify the exact regions of each page where location details were mentioned.
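As a rough illustration of locating the address-bearing region of a page, the Python sketch below scores text blocks by simple address cues. It is a simplified stand-in: it assumes US-style addresses and uses regular expressions instead of the NLP and Computer Vision models described above. The names ZIP_RE, STREET_RE, HOUSE_NO_RE, address_score, and find_location_blocks are hypothetical.

```python
import re

# Assumption: US-style addresses; each pattern is one weak cue for an address.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
STREET_RE = re.compile(r"\b(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr)\b")
HOUSE_NO_RE = re.compile(r"\b\d{1,5}\s+[A-Z]")


def address_score(block):
    """Counts how many address cues appear in a block of page text."""
    cues = (ZIP_RE, STREET_RE, HOUSE_NO_RE)
    return sum(1 for cue in cues if cue.search(block))


def find_location_blocks(blocks, min_score=2):
    """Keeps the text blocks that look like they contain a postal address."""
    return [b for b in blocks if address_score(b) >= min_score]


blocks = [
    "Welcome to Acme, makers of fine widgets since 1999.",
    "Acme HQ: 123 Main Street, Springfield, IL 62701",
    "Call us or send an email any time.",
]
print(find_location_blocks(blocks))
```

Requiring at least two cues per block keeps a lone number or a stray "St" from being mistaken for an address; a trained model would replace these hand-written cues in practice.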
Our client was able to expand their database from 1 Billion business locations to 1.5 Billion business locations in a matter of 6 months. This 50% increase was attributed to the Location Extractor Internet crawler we built for them.