Gemicle is an innovative, highly technological company with broad expertise in app development, complex e-commerce projects, and B2B solutions. Its qualified teams of developers, designers, engineers, QA specialists, and animators deliver excellent products and solutions to well-known, branded companies.
August 1, 2022

Lead Big Data Engineer (vacancy inactive)

Kyiv, Vinnytsia, remote

We build Cyber Data Fusion and AI products for civilian protection.
Our mission is to empower our customers to fight crime and terror through state-of-the-art technologies that provide accurate and precise intelligence.

As a Big Data Engineering Team Lead, you will be instrumental in accelerating and scaling our data pipelines and data lakes. Your primary focus will be on researching the best solutions for these purposes, then implementing, maintaining, and monitoring them.

What You’ll Do:

  • Establish, lead, manage, and mentor the big data team
  • Own the development of an in-house Data Lake for storing structured and unstructured data
  • Research, design, and develop appropriate algorithms for Big Data collection, processing, and analysis
  • Define how data will be streamed, stored, consumed, and integrated by different data systems
  • Identify relevant Big Data tools required to support new and existing product capabilities
  • Collaborate closely with the product team to define the requirements and milestones that relate to Big Data features
  • Interact closely with the Data Scientists to provide feature-engineered datasets
  • Design, create, deploy, and manage data pipelines within the organization (see the sketch after this list)
  • Create data architecture documents, standards, and principles, and maintain knowledge of the data models
  • Collaborate and coordinate with multiple teams/departments to identify the data domains and data gaps between current-state systems and future goals
  • Communicate the data entities and their relationships within a business model clearly and effectively
  • Audit performance and advise on any necessary infrastructure changes
  • Develop key metrics for data testing and create data quality rules
  • Focus on scalability, availability, and data governance
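For illustration only, here is a minimal sketch of the kind of Spark batch pipeline this role would own, written in Scala: it reads raw events, applies a simple data quality rule, and writes the valid rows to a partitioned Parquet table. The paths, column names, and the EventCleanJob object are illustrative assumptions, not details from this posting.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// A minimal batch pipeline sketch: read raw JSON events, apply a simple
// data quality rule, and write the clean rows to a partitioned Parquet
// table. All paths and column names are illustrative placeholders.
object EventCleanJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-clean-job")
      .getOrCreate()

    val raw = spark.read.json("s3a://example-bucket/raw/events/")

    // Data quality rule: drop rows missing a user id or with a
    // non-positive amount, and report how many were rejected.
    val isValid = col("userId").isNotNull && col("amount") > 0
    val rejected = raw.filter(!isValid).count()
    println(s"Rejected $rejected rows failing quality checks")

    raw.filter(isValid)
      .withColumn("eventDate", to_date(col("eventTime")))
      .write
      .mode("overwrite")
      .partitionBy("eventDate")
      .parquet("s3a://example-bucket/clean/events/")

    spark.stop()
  }
}
```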


What Skills and Experience You’ll Bring:

  • Proficiency with Scala and Apache Spark
  • Proficiency with Hadoop ecosystem services such as MapReduce v2, HDFS, YARN, Hive, HBase
  • Experience with data lake table formats (e.g., Apache Hudi, Apache Iceberg, Delta Lake)
  • Experience building stream-processing systems with solutions such as Apache Kafka and Apache Spark Streaming (see the sketch after this list)
  • Experience with orchestration tools (e.g., Apache Airflow)
  • Experience with integrating data from multiple heterogeneous sources and various formats (Parquet, CSV, XML, JSON, Avro)
  • Experience with SQL databases and NoSQL databases, such as Elasticsearch and MongoDB
  • Hands-on experience with Kubernetes (nice to have)
  • Strong communication and teamwork skills
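As a companion to the streaming requirement above, here is a hedged sketch of a Spark Structured Streaming job in Scala that consumes JSON events from Kafka and appends them to a Parquet sink. The broker address, topic name, schema, and the KafkaToLake object are assumptions for illustration, not part of the role description.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// A minimal Structured Streaming sketch: consume JSON events from a
// Kafka topic and append them to a Parquet sink with checkpointing.
// Requires the spark-sql-kafka connector on the classpath. The broker,
// topic, schema, and paths are illustrative assumptions.
object KafkaToLake {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-lake")
      .getOrCreate()

    val schema = StructType(Seq(
      StructField("userId", StringType),
      StructField("amount", DoubleType),
      StructField("eventTime", TimestampType)
    ))

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      // Kafka delivers bytes; decode the value column and parse the JSON.
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    events.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/stream/events/")
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
      .start()
      .awaitTermination()
  }
}
```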

We offer:

  • Competitive salary based on skills and experience
  • Yearly performance bonus
  • Work-from-home policy and flexible hours
  • Udemy unlimited membership
  • Happy hours/events and many more team bonding activities
  • Engagement with new technologies and innovative products
  • Internal hackathons and technical seminars