● 3+ years of experience in software development.
● Strong knowledge of Java 8 or higher.
● Strong knowledge of SQL.
● Understanding of Big Data technologies.
● Knowledge of AWS services (EC2, Lambda, SQS, Kinesis, S3, CLI, CloudWatch).
● Basic experience working with Spark.
● Experience working with data: databases, data processing, and data formats.
● Experience in performance optimization.
● Passion for proactively learning new technologies and sharing that knowledge with the team.
● Self-management and self-discipline, with the ability to work effectively in a distributed team and communicate remotely.
● English level: Upper-Intermediate or higher.
● Experience in building real-time ETL processes.
● Experience developing in Scala, Node.js, or Python.
● Experience in automating various processes and operations.
● Experience working with columnar databases such as Redshift.
● DevOps experience: Linux, Jenkins, Docker, Kubernetes.
● Experience in administration of AWS services: EC2, Lambda, EMR, Kinesis, Redshift.
● Competitive compensation for your skills, experience, input, and results.
● Opportunities to attend conferences and master classes and to earn certifications.
● English classes and the opportunity to learn from a native speaker.
● Full compensation package.
● Regular team events and activities.
● Building a data catalog / metadata management platform from scratch.
● Building high-scale data pipelines on AWS cloud services.
● Building an automated system for configuring and maintaining data pipelines.
● Building audit and monitoring tools for data pipelines.
● Creating algorithms for data processing, validation, and reconciliation.
● Documenting your own services, algorithms, and functionality on the wiki.
● Participating in all technical decisions made by the team.
● Researching new big data technologies.
● Business trips to California, US, to consult for and support our partners (once in
You will be a member of a cross-functional team building a Data Lake on Amazon AWS services for the Business Intelligence of a large international company. The team is also building tools for automated pipeline configuration, monitoring, metadata management, automatic scaling, access control, automatic data reconciliation, etc.