We are a US-based startup working in the biotech domain. Our product, a web-based data analytics platform, has several unique capabilities: it lets users interactively analyze millions of data points in the browser, uses AI to learn from data and user actions, and allows new functionality to be developed and deployed as plugins.
Our tech stack:
• Containers and orchestration: Docker, Kubernetes, AWS ECS
• IaC and CI/CD: Jenkins (Groovy), GitHub Actions, Ansible, Terraform, CloudFormation
• Languages: Dart, JS, Java, Python, Bash
Responsibilities:
• Build services to ingest data into our database and ensure it’s clean, consistent, and highly performant;
• Automate, improve and evolve the deployment process, continuous integration, and testing;
• Design, build, and optimize data pipelines and infrastructure;
• Combine raw information from different sources;
• Investigate and troubleshoot issues;
• Improve infrastructure as code;
• Collaborate with developers;
• Optional: develop cost-effective, scalable ML systems and innovative algorithmic solutions.
You will also deploy and integrate Datagrok with customer systems in AWS-heavy environments and ensure they function well under load.
While we’re considering candidates with various skill sets and at different levels, this role will be ideal for you if you have a degree in a quantitative field (e.g., mathematics or statistics), strong programming skills (Python, SQL), and excellent knowledge of data structures, algorithms, and design patterns. You should be up to date with the diverse technologies in the big data ecosystem and able to objectively assess different approaches to the problem at hand.
Requirements:
— 4+ years of experience as a Data Engineer and/or DevOps engineer (rare exceptions for highly skilled developers);
— Experience working with large data sets, simulation/optimization, and distributed computing tools (e.g., Spark, Airflow, Dash);
— Strong working knowledge of running containerized services in production (e.g., a Go or Flask server running in Docker or Kubernetes), and of DevOps and CI/CD principles;
— Hands-on experience with:
◦ managing Linux-based systems;
◦ tools such as Terraform, Ansible;
◦ writing code in one or more languages, such as Python, Bash, Groovy;
◦ the AWS stack and building infrastructure as code (Terraform, AWS CloudFormation);
◦ building continuous integration and continuous delivery (CI/CD) pipelines with Jenkins, GitHub Actions, or other CI/build servers;
◦ Git, Docker;
— Experience with data warehouse technologies: MapReduce, HDFS, Hive, Tez, Spark, Sqoop;
— Fluent in English to communicate effectively with native English speakers.
Nice to have:
— Hands-on experience with a cloud provider other than AWS;
— Experience with workflow tools: Airflow, Composer;
— Experience developing and deploying machine learning algorithms;
— Extensive knowledge of ML frameworks and libraries;
Position details:
— Flexible working hours;
— Unlimited vacation / PTO;
— Excellent growth opportunities in a highly dynamic, unique product company;
— Zero bureaucracy;
— Paid training and certifications;
— Competitive compensation.