1. MUST HAVE — 5+ years of system administration experience for a high-usage, web-based software service, ideally built using open-source software components
2. MUST HAVE — Knowledge of AWS services and APIs including EC2, S3, VPC, IAM
3. MUST HAVE — Knowledge of and familiarity with alerting and monitoring tools and system management tools for Linux environments (including DataDog, Nginx, NewRelic, CloudFlare, MySQL/PostgreSQL, Apache, IPTables, ELK stack)
4. NICE TO HAVE — Knowledge of and familiarity with configuration management tools including Ansible, Chef or Puppet
5. NICE TO HAVE — Knowledge of deploying / troubleshooting / tuning Ruby on Rails applications (Passenger, Capistrano, Sidekiq, Bundler)
6. NICE TO HAVE — Knowledge of type-1 hypervisor virtualization (Xen, vSphere)
1. MUST HAVE — Strong communication skills and the ability to coordinate incident response with urgency
2. MUST HAVE — Proper remote presence & etiquette (acknowledging requests in a timely fashion over Slack, not leaving requests unacknowledged at all)
3. MUST HAVE — Tagging the appropriate person and persistently reminding them every 24 hours until a full resolution is achieved (not having things fall through the cracks)
4. MUST HAVE — Effective adherence to operating procedures (organizing day-to-day work and large-scale tasks in a calm manner with priority-driven sequencing)
— Competitive salary.
— Career and professional growth.
— Cozy fully-equipped office in Ivano-Frankivsk.
— Great work-life balance with flexible working hours and free office lunches, remote-friendly.
— Paid vacation and stipend for language classes, gym, IT events, etc.
— Dedicated AWS account (or bare metal servers, per your choice) for infrastructure automation testing, development and general learning
— Retina MacBook Pro or another laptop of your specification, peripherals and displays included
— Books, library & conference budget
— Reliably automate the server provisioning process to reduce the labour of our R&D team
— Build scalable infrastructure to manage high-load, concurrent sessions to support ~50 million monthly page views and 500k+ active users
— Drive the company through “Disaster Recovery Tests”, where we manually take down pieces of infrastructure to test the product's overall resilience to failures
— Implement the systems and processes that Product Developers use to deploy their software into production
— Build an auto-remediation system to automatically resolve production incidents before escalating them to on-call Developers
— Because of the nature of SRE work, you should also be prepared for on-call shifts and potential “all-hands-on-deck” situations at any hour of the day or night. Minimizing those situations is part of your job!
ManageBac is a world leader in international education SaaS systems and services, serving over 600,000 students across 2,300 schools in over 120 countries through our flagship service ManageBac, a curriculum-first learning platform developed by our core product team in Ivano-Frankivsk, Ukraine.
We develop on Ruby on Rails with excellent tools, full technical control, and clear product demand. As one of the leading product teams in western Ukraine, we invite Senior Engineers to apply for our open R&D positions.
Site Reliability Engineers are hybrid systems and software engineers who are responsible for and take ownership of reliability, automation, and other issues related to “keeping the lights on” across ManageBac’s multi-product SaaS systems stack.
SREs are integrated within the Technical Operations team and work under the Head of Technical Operations and with the CTO and Principal Developers. We are looking for engineers who want to be a part of developing infrastructure software, maintaining it and scaling it.