Basic theoretical knowledge of Big Data and related technologies: RDBMS, NoSQL, consistency models (ACID, BASE), OLAP and OLTP, massively parallel processing, data warehousing
Novice level in Scala
Intermediate level in data platforms (RDBMS). Understanding of the relational model and of basic database concepts and components
Experience with at least one relational database
Novice level in Big Data Platform (Apache Spark)
Experience with Hadoop Ecosystem
• Professional Development:
— Experienced colleagues who are ready to share knowledge;
— The ability to switch projects and technology stacks and to try out different roles;
— More than 150 workplaces for advanced training;
— Study and practice of English: courses and communication with colleagues and clients from different countries;
— Support for speakers who present at conferences and meetings of technology communities.
• The ability to focus on your work: a lack of bureaucracy and micromanagement, and convenient corporate services;
• Friendly atmosphere, concern for the comfort of specialists;
• Flexible schedule (there are core mandatory hours), the ability to work remotely upon agreement with colleagues;
• The ability to work in any of our development centers.
Our client is a US-based healthcare company that is building a solution to process drug review information. The Big Data Engineer will work on an ETL framework for building configurable, extensible, easy-to-use ETL processes. The framework is a Spark application configured from a set of metadata tables stored in Apache Cassandra. It loads data from different source servers, saves the obtained files to HDFS, then reads the data from those files, transforms it, and saves it into Cassandra target tables. The framework also allows loading data into a CRM. All of these processes are orchestrated with Apache Oozie.
The specialist will develop and support an ETL platform based on Apache Spark and the Hadoop ecosystem.
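
For candidates who want a feel for the metadata-driven approach described above, here is a minimal sketch in plain Python. It is not the client's framework: the in-memory dictionaries stand in for the source servers, HDFS staging, and Cassandra tables, and every name in it (JobConfig, TRANSFORMS, run_job, the field names) is a hypothetical chosen for illustration. The point it tries to show is that each job is described by a configuration row, so adding a new load means adding metadata rather than writing new pipeline code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class JobConfig:
    """One hypothetical metadata row describing a single ETL job."""
    source: str        # key of the source system to read from
    staging_path: str  # where the raw extract is staged (HDFS in the real system)
    target_table: str  # where transformed rows go (Cassandra in the real system)
    transform: str     # name of a registered transformation


# Transformations are registered by name so that a job can select one
# through configuration alone. Both entries are illustrative assumptions.
TRANSFORMS: Dict[str, Callable[[Row], Row]] = {
    "identity": lambda row: dict(row),
    "upper_drug_name": lambda row: {**row, "drug": row["drug"].upper()},
}


def run_job(
    cfg: JobConfig,
    sources: Dict[str, List[Row]],
    staging: Dict[str, List[Row]],
    targets: Dict[str, List[Row]],
) -> int:
    """Run one configured job and return the number of rows written."""
    # 1. Extract: read the raw rows from the configured source.
    raw = sources[cfg.source]
    # 2. Stage: keep an untouched copy of the extract at the staging path.
    staging[cfg.staging_path] = [dict(r) for r in raw]
    # 3. Transform: apply the transformation named in the metadata.
    transform = TRANSFORMS[cfg.transform]
    transformed = [transform(r) for r in staging[cfg.staging_path]]
    # 4. Load: append the transformed rows to the configured target table.
    targets.setdefault(cfg.target_table, []).extend(transformed)
    return len(transformed)


if __name__ == "__main__":
    sources = {"reviews_server": [{"drug": "aspirin", "rating": "4"}]}
    staging: Dict[str, List[Row]] = {}
    targets: Dict[str, List[Row]] = {}
    cfg = JobConfig("reviews_server", "/staging/reviews", "drug_reviews", "upper_drug_name")
    print(run_job(cfg, sources, staging, targets), targets)
```

In the actual project the same four steps would be carried out by Spark jobs reading and writing HDFS and Cassandra, with Oozie scheduling the runs; the sketch only illustrates the configuration-driven control flow.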