Cloud Migration of ETL Service
The client is a job search company that operates a large number of job boards. At one point its growth slowed, both because administering its own infrastructure had become complex and because its search engine, Elasticsearch, was inefficient for the workload.
Importing data from partner systems was slow and unstable, the search engine could not cope with the volume of data, and the consistency of search results suffered.
“The system should be able to synchronize tens of gigabytes of data every 10 minutes.”
Gaining Advantage by Adopting Cloud-Native
To address these problems, the proposed plan was to migrate the service to a cloud infrastructure based on Google Kubernetes Engine (GKE) and to switch to Google Cloud Talent Solution (CTS), a SaaS search engine for jobs. The service that uploads search data was optimized for the cloud environment and redesigned to import data more efficiently. As the number of users grew, so did the number of search queries, and the Elasticsearch cluster deployed at the client could no longer handle the load, while the growing volume of data reduced the relevance of search results. Switching to CTS, a cloud-based search service powered by machine learning on structured job data, solved both problems. The service scales automatically, so it requires no administration, and its specially trained model ensures high search relevance.
“Data is imported from multiple sources. Each source has its own format and often its own transfer protocol.”
Partners supplied search data in a number of formats, so a modular architecture was implemented to keep that data up to date. It unifies the handling of different sources and makes it quick to add new ones. In the old system, data was first downloaded to disk. In the new cloud-native version, data is processed in batches over several steps, which makes the service easy to scale and speeds up imports. The application was rewritten from Python in Golang for more efficient CPU usage and faster processing. Together, these changes made job data updates 10 to 12 times faster.
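The modular sources and batch processing described above can be illustrated with a minimal Go sketch. It is not the client's actual code: the Job type, the Source interface, the in-memory SliceSource, and the ImportInBatches function are all hypothetical names chosen for illustration, and the upload callback stands in for whatever step sends a batch to the search service.

```go
package main

import (
	"errors"
	"io"
)

// Job is a hypothetical normalized record; the real schema is not given in the text.
type Job struct {
	ID    string
	Title string
}

// Source is a hypothetical interface that each partner module would implement,
// hiding the partner's format and transfer protocol behind a common API.
type Source interface {
	Name() string
	// Next returns the next job, or io.EOF when the feed is exhausted.
	Next() (Job, error)
}

// SliceSource is an in-memory stand-in for a real partner feed.
type SliceSource struct {
	name string
	jobs []Job
	pos  int
}

func (s *SliceSource) Name() string { return s.name }

func (s *SliceSource) Next() (Job, error) {
	if s.pos >= len(s.jobs) {
		return Job{}, io.EOF
	}
	j := s.jobs[s.pos]
	s.pos++
	return j, nil
}

// ImportInBatches streams jobs from src and passes them to upload in batches
// of at most batchSize, so the whole feed is never held in memory or on disk.
// The batch slice is reused between calls, so upload must not retain it.
// It returns the number of jobs uploaded successfully.
func ImportInBatches(src Source, batchSize int, upload func([]Job) error) (int, error) {
	if batchSize <= 0 {
		return 0, errors.New("batchSize must be positive")
	}
	batch := make([]Job, 0, batchSize)
	total := 0

	// flush uploads the current batch, if any, and resets it for reuse.
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := upload(batch); err != nil {
			return err
		}
		total += len(batch)
		batch = batch[:0]
		return nil
	}

	for {
		job, err := src.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return total, err
		}
		batch = append(batch, job)
		if len(batch) == batchSize {
			if err := flush(); err != nil {
				return total, err
			}
		}
	}
	// Upload the final, possibly partial, batch.
	if err := flush(); err != nil {
		return total, err
	}
	return total, nil
}
```

Under these assumptions, adding a new partner means writing one more type that satisfies Source, and the import loop stays unchanged. Because each source is consumed independently, separate sources could also be imported by separate workers, which is one plausible way such a design scales horizontally.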
Google Kubernetes Engine was chosen as the cloud infrastructure because it is easy to administer yet offers extensive options for service and network management. This has significantly reduced the time required to maintain the infrastructure, and running the service and the database with a single provider has reduced network costs.
The resulting solution enabled the client to grow its customer base aggressively, because the new infrastructure and application architecture scale easily with minimal operational costs.