Title: Data Engineer (GCP, Scala, Athena)
Type: Contract (6 months, extendable)
Location: Remote (Ukraine)
The Data Engineer will join the internal product and tooling team to plan, design and implement new solutions using the latest Cloud and Big Data technologies. You will work remotely to develop cutting-edge solutions, creating data pipelines that migrate data from on-premises systems into a cloud-hosted Enterprise Data Platform.
Responsibilities:
- Working with our internal product and tooling team to design and develop end-to-end cloud-based solutions with a heavy focus on applications and data, backed by a good understanding of the underlying cloud infrastructure
- Working on complex and varied cloud data projects, such as migrating business-critical applications to the cloud, re-platforming or re-architecting difficult data and analytics use cases, and migrating existing data warehouses from an on-premises data center or from one cloud provider to another
- Delivering highly reliable software and data pipelines using software engineering best practices such as automation, version control, continuous integration/continuous delivery, testing and security
- Defining, implementing and enforcing automated data security and data governance best practices within the solutions designed
- Helping our customers grow from a data warehouse into a true cloud-native data platform with full multi-source data ingestion and integration
- Building or moving analytic workloads to the cloud for better scalability and efficiency
- Translating complex functional and technical requirements into detailed designs
- Writing high-performance, reliable and maintainable code
- Automating and creating reusable accelerators that help get the job done faster and better
- Performing data processing requirements analysis and data flow integration with external systems
- Diagnosing and troubleshooting operational issues; performing health checks and configuration reviews
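To give candidates a flavor of the reliability practices the role emphasizes (small, composable, unit-testable pipeline steps), here is a minimal plain-Scala sketch; the names (`Record`, `Stage`) are invented for illustration, and in practice such steps would be Spark or Dataflow jobs run by a scheduler:

```scala
// Illustrative sketch only: a tiny, testable pipeline-stage abstraction.
final case class Record(id: String, value: Double)

object StageSketch {
  // A stage is a total function from a batch to a batch, which makes
  // each step unit-testable in isolation, a key reliability practice.
  type Stage = Vector[Record] => Vector[Record]

  // Drop rows with a missing id or a non-numeric value.
  val dropInvalid: Stage = _.filter(r => r.id.nonEmpty && !r.value.isNaN)

  // Scale raw values into the unit the downstream warehouse expects.
  val normalize: Stage = _.map(r => r.copy(value = r.value / 100.0))

  // Compose stages into a pipeline; Function1.andThen fixes the ordering.
  val pipeline: Stage = dropInvalid.andThen(normalize)

  def main(args: Array[String]): Unit = {
    val in = Vector(Record("a", 150.0), Record("", 10.0), Record("b", Double.NaN))
    println(pipeline(in)) // only the valid record survives, rescaled
  }
}
```

Because each stage is an ordinary function, stages can be tested independently and re-ordered without touching the others.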
Requirements:
- Proficiency in a programming language such as Python, Java, Go or Scala
- Experience with big data cloud technologies such as EMR, Athena, Glue, BigQuery, Dataproc and Dataflow
- Ideally, strong hands-on experience with Google Cloud Platform data technologies: BigQuery, Dataflow, and executing PySpark and Spark SQL code on Dataproc
- Understanding of Spark fundamentals (PySpark or Spark SQL), including the DataFrame API, and experience analysing and performance-tuning Spark queries
- Experience developing and supporting robust, automated and reliable data pipelines
- Experience developing frameworks and solutions that enable us to acquire, process, monitor and extract value from large datasets
- Strong SQL skills
- Good knowledge of popular database and data warehouse technologies and concepts from Google, Amazon or Microsoft (cloud and conventional RDBMS), such as BigQuery, Redshift, Azure SQL Data Warehouse and Snowflake
- Strong knowledge of data orchestration solutions such as Airflow, Oozie, Luigi or Talend
- Knowledge of how to design distributed systems and the trade-offs involved
- Experience with software engineering best practices for development, including source control systems, automated deployment pipelines such as Jenkins, and DevOps tools such as Terraform
- Experience in data modeling, data design and persistence (e.g. warehousing, data marts, data lakes)
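As a rough illustration of the transformation logic the Spark requirements describe, here is a plain-Scala sketch (the CSV shape and field names are invented for the example; in a real job the same parse/filter/group/aggregate steps would be expressed with Spark's DataFrame API and run on Dataproc or EMR):

```scala
// Minimal ETL-style transformation sketch using only the Scala 2.13+
// standard library, standing in for a Spark DataFrame job.
final case class Event(userId: String, country: String, amountCents: Long)

object PipelineSketch {
  // Parse one CSV line into an Event, skipping malformed rows.
  def parse(line: String): Option[Event] =
    line.split(",", -1) match {
      case Array(user, country, amount) =>
        amount.toLongOption.map(Event(user, country, _))
      case _ => None
    }

  // Aggregate total spend per country, ignoring non-positive amounts;
  // in Spark this would be a filter + groupBy("country") + sum("amountCents").
  def totalsByCountry(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(parse)
      .filter(_.amountCents > 0)
      .groupMapReduce(_.country)(_.amountCents)(_ + _)

  def main(args: Array[String]): Unit = {
    val sample = Seq("u1,UA,1200", "u2,UA,800", "u3,PL,500", "bad-row", "u4,PL,-100")
    println(totalsByCountry(sample)) // totals per country; bad rows dropped
  }
}
```

Keeping the parsing and aggregation as pure functions, as above, is what makes such logic straightforward to unit-test before it is promoted into a production pipeline.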