DataRobot is a Boston-based tech company with offices in New York, London, Kyiv, Singapore, Tokyo, and Sydney.
April 13, 2021

Cloud Reliability Engineer (Cloud Operations) (position inactive)

Kyiv, Kharkiv, Lviv, Odesa

Required skills

5+ years of experience with AWS
3+ years of experience with Terraform or CloudFormation
5+ years of experience with Linux (Ubuntu, RedHat, or similar)
MongoDB, Mongo MMS, node.js/IIS on AWS/GCP/Azure
Demonstrable experience in one or more languages (Python, Perl, PHP) is a plus
Strong knowledge of TCP/IP networking, SMTP, HTTP, load-balancers, highly available network servers
GitHub/Artifactory/RabbitMQ, Application Performance Monitoring principles, CDN, DNS
Knowledge of IP networking and experience analyzing network, performance, and application issues using tools like Fiddler and Wireshark
Experience maintaining large scale infrastructure, 100+ servers minimum


Must be familiar with AWS, GCP, and Azure architecture patterns and capabilities
Well versed in Software Defined Network definitions, capabilities, and limitations
Handle high-pressure situations in a calm and professional manner
Lead resolution effort of complex service problems from the network layer to the application at scale
Motivate, encourage, and provide technical leadership to team members
Work hand-in-hand with software developers to facilitate the adoption of “Paved Road” solutions
Build and support large-scale services across multiple platforms (Azure, AWS, and GCP)
Diagnose and repair issues by editing Node.js code, modifying MongoDB, Postgres, and Redis, and changing configuration in cloud service providers
Create, edit, and maintain ad hoc scripts to resolve issues quickly with minimal user impact
Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention
Support periodic on-call duty