Project Tech Stack: AWS, Python, Scala, Apache Spark, Spark Streaming, Kafka, Amazon EMR (Elastic MapReduce), Redshift, Redshift Spectrum, Athena, MySQL.
About the role:
We are looking for a highly skilled Staff Data Engineer to join our innovative team. The ideal candidate will have extensive experience working on large-scale production systems and will be responsible for leading the design, development, and optimization of our client’s data platform and infrastructure.
About the project:
Our client is an online database and search engine that allows users to search for and share short, soundless looping videos resembling animated GIFs. It is the world's best and most comprehensive place to search, discover, share, and create animated graphics. The platform supports API integrations with most major messengers, including iMessage, Facebook, Instagram, Snapchat, Twitter, Tinder, Slack, and WhatsApp, enabling powerful expression across global communication platforms.
Qualifications and Skills:
- 5+ years of professional experience in a data engineering role, demonstrating a strong track record of delivering high-quality data solutions;
- 3+ years of professional experience with GitHub and version control best practices;
- Strong proficiency in both Scala and Python, with a proven ability to develop and maintain scalable data solutions using these languages;
- Deep understanding of Apache Spark and Elastic MapReduce (EMR), including experience in optimizing Spark jobs for performance, reliability, and scalability;
- Experience with Spark Streaming and Kafka;
- Experience with data orchestration systems, particularly Luigi, to manage complex data workflows (see the illustrative sketch after this list);
- Strong knowledge of Redshift, Redshift Spectrum, Athena, MySQL, and the broader AWS ecosystem;
- Strong knowledge of general best practices in data modeling, storage, and retrieval (e.g., columnar/compressed storage, data retention, materialized views);
- Experience with distributed systems at a scale beyond simple ETLs;
- Familiarity with automated data synchronization from multiple sources into a data warehouse, in particular with AWS DMS;
- Experience with CI/CD tools like Jenkins or Spinnaker;
- Experience with Docker and Kubernetes;
- Familiarity with Databricks, in particular using it to investigate data discrepancies and identify their root cause.
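To give a concrete flavor of the Spark and Luigi requirements above (not an additional requirement), a minimal sketch of a Luigi-orchestrated PySpark job might look like the following; all paths, table layouts, and column names are hypothetical.

```python
# A minimal sketch: a Luigi task that orchestrates a daily PySpark aggregation.
# Paths, schema, and column names are hypothetical.
import datetime

import luigi
from pyspark.sql import SparkSession, functions as F


class DailyViewCounts(luigi.Task):
    """Aggregate per-clip view counts for one day of (hypothetical) event logs."""

    date = luigi.DateParameter(default=datetime.date.today())

    def output(self):
        # Spark writes a _SUCCESS marker on commit; Luigi uses it to detect completion.
        return luigi.LocalTarget(f"/data/view_counts/{self.date:%Y-%m-%d}/_SUCCESS")

    def run(self):
        spark = SparkSession.builder.appName("daily-view-counts").getOrCreate()
        events = spark.read.parquet(f"/data/events/{self.date:%Y-%m-%d}")
        counts = (
            events.filter(F.col("event_type") == "view")
            .groupBy("clip_id")
            .agg(F.count("*").alias("views"))
        )
        # Columnar, compressed output keeps downstream scans cheap.
        counts.write.mode("overwrite").parquet(f"/data/view_counts/{self.date:%Y-%m-%d}")
        spark.stop()


if __name__ == "__main__":
    luigi.build([DailyViewCounts()], local_scheduler=True)
```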
Nice to have:
- Familiarity with Google Analytics/GBQ.
- Familiarity with Tableau.
Responsibilities:
- Build, optimize, and maintain scalable data pipelines using technologies like Spark and Python.
- Manage and optimize data warehouses, data lakes, and cloud-based infrastructure (AWS).
- Ensure data integrity, consistency, and quality throughout the data lifecycle (a minimal example follows this list).
- Design efficient data models and implement strategies for optimal storage and retrieval.
- Develop and manage complex data workflows using orchestration tools like Luigi.
- Identify and resolve data issues, optimize pipeline performance, and contribute to data engineering best practices.
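As an illustration of the data-integrity responsibility above, a minimal, hypothetical PySpark quality check (the table path and key column are assumptions) could look like:

```python
# A minimal sketch of a data-quality gate: primary keys must be present and unique.
# The table path and column name are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clip-metadata-quality-check").getOrCreate()

clips = spark.read.parquet("/data/clip_metadata")

null_keys = clips.filter(F.col("clip_id").isNull()).count()
duplicate_keys = clips.groupBy("clip_id").count().filter(F.col("count") > 1).count()

if null_keys or duplicate_keys:
    raise ValueError(
        f"clip_metadata failed quality checks: "
        f"{null_keys} null keys, {duplicate_keys} duplicated keys"
    )

spark.stop()
```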