Proficient understanding of distributed computing principles
Good understanding of traditional and modern Data Warehouse concepts (Data Lake, DWH, Data Mart, Operational Data Store)
Experience with Apache Spark
Ability to resolve ongoing issues with operating Big Data clusters
Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
Experience with integration of data from multiple data sources
Knowledge of various workflow management tools such as Apache Airflow
Good understanding of Lambda Architecture, along with its advantages and drawbacks
Good presentation and communication skills; ability to work across cultures and in mixed teams
B1 or higher written and spoken English.
Experience with Cloudera/MapR/Hortonworks
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
Experience with various messaging systems, such as Kafka or RabbitMQ
Competitive salary depending on experience and skills
Flexible working hours
Partial remote work is available
Office in the heart of Kharkiv downtown
Team building events
20 days of vacation, plus Ukrainian public holidays
Company covers English courses, books, and participation in conferences, both local and abroad
Selecting and integrating the Big Data tools and frameworks required to deliver requested capabilities
Implementing ETL processes
Monitoring performance and advising on any necessary infrastructure changes
Defining data retention policies
Founded in 2014, GreenM is an engineering and professional services provider for data-centric solutions. We start by understanding our partners' goals and then leverage a wide variety of top-notch technologies to co-build analytics products. Our senior-level engineering team brings diverse backgrounds and expertise across numerous technology areas. This enables us to create easy-to-use analytics platforms that are insightful, scalable, and responsive.
At GreenM we focus on unleashing the full potential of each engineer and invest broadly in strengthening the team's skill set and building a company culture of continuous improvement.
We are looking for a Big Data Engineer to join our team and develop and grow a product that matches our core company profile: gathering, processing, and analyzing Big Data.
Your primary focus will be choosing optimal solutions for collecting, storing, processing, and analyzing huge data sets, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.
The solution helps hospitals:
• Identify root causes and prioritize efforts to improve patient experiences
• Act on emerging trends before they escalate into larger problems
• Perform real-time service recovery for at-risk patients
Data stores: Vertica, MS SQL Server, DynamoDB, S3, Elasticsearch
ETL: Amazon services (Lambda, EMR, Athena, etc.), Apache Spark, Apache Airflow, Pentaho DI
Web: .Net, Angular, Ember
Visualization: Tableau, Kibana, Splunk