April 24, 2019

Site Reliability Engineer (vacancy inactive)

Required skills

— 7+ years of experience with increasing responsibility in technical support and data center operations roles, including team and process management. Experience with Cloud data centers is a must
— Practical programming experience with some of the widespread programming languages (Python, Golang or similar) is a must

Deep technical roots in data center technologies:
— Large-scale Linux production environments, preferably as part of a Cloud service provider environment
— Understanding of data center networking fabric topologies and commonly deployed architectures
— Virtualization technologies; experience with the VMware product suite (vCenter, ESXi) in particular is required
— Deep understanding of cluster management systems like Kubernetes and of Docker-based container deployments is required
— Experience with configuration management (CM) tools like Chef/Puppet/Ansible

— Experience with CI/CD pipelines
— Knowledge of Web Services (REST API) and/or SDK integrations
— Knowledge of core infrastructure components like LDAP, DNS, DHCP, etc.
— Basic knowledge of security tools and best practices
— Prior successful experience working in an innovative, fast-paced startup with a high rate of flux. The candidate must demonstrate strong entrepreneurial spirit and vigor
— Demonstrated proficiency in creating detailed technical design documents, facilitating design reviews, and executing design implementation projects
— Strong English written and verbal communication skills to work with the CloudSimple global team
— BS/MS degree in Computer Science or equivalent experience

Nice to have

— Practical experience working with Logstash, Elasticsearch, Cassandra, and Kafka is a plus

We offer

— International environment with great people to work with
— Unique project with modern technologies
— Opportunities to make a difference and grow professionally
— Competitive compensation
— Long-term employment with paid vacation
— Sports and healthcare package (medical insurance, paid gym membership)

Responsibilities

As a member of the SRE team, the candidate will face the challenges that arise from building and evolving a large SaaS/PaaS system, including but not limited to:

— Building and evolving a monitoring stack spanning multiple heterogeneous components and regions
— Setting up CI/CD pipelines and filling in their missing pieces
— Feeding post-mortem results back into monitoring tooling

Key responsibilities of Site Reliability Engineer include:

— You will be responsible for systems deployment, operations, and monitoring for our infrastructure, including the design and development of infrastructure automation
— You will get your hands dirty troubleshooting infrastructure and architectural challenges using your existing knowledge and toolkits
— You will drive the reliability and supportability of the Cloud service by creating a knowledge base and, working with DevOps, coordinating change management policies, deploying a ticket/incident management system, and setting up service request queue triaging and auto-remediation
— You will use your advanced system architecture and administration skills to collaborate with engineering, product management, test, and automation teams to architect and develop strategic and tactical solutions
— You will help develop requirements for customer onboarding processes, target environment sizing, and migration automation

About the project

CloudSimple is an international product company backed by leading investors (RedPoint, Mayfield, and Microsoft Ventures), with its HQ in the San Francisco Bay Area. We are building a large-scale distributed platform that manages private software-defined data centers fully integrated with world-leading public clouds.
