Ahrefs is looking for an on-call/DevOps engineer to help take care of its distributed backend systems, powered by 2500+ servers, and to ensure all systems are up and running 24/7. We require a deep understanding of operating system and networking fundamentals, practical knowledge of Linux, and a healthy desire to automate everything while being able to quickly resolve urgent issues manually.
Who we are
Ahrefs runs an internet-scale bot that crawls the whole Web, storing huge volumes of information to be indexed and structured in a timely fashion. The backend system is powered by a custom petabyte-scale distributed key-value storage that accommodates all the data coming in at high speed. On top of that, Ahrefs is building various analytical services for end users.
We are a small team and strongly believe in better technology leading to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and are constantly seeking to automate repetitive tasks and eliminate boilerplate, guided by DRY and following KISS. If there is any new technology that will make our life easier, no doubt we’ll give it a try. We rely heavily on open-source code (as the only viable way to build a maintainable system) and contribute back, see e.g. github.com/ahrefs. Occasionally we track down CPU bugs: tech.ahrefs.com/...ective-story-ab1ad2beddcd
Our motto is “first do it, then do it right, then do it better”.
Basic responsibilities include:
— first-aid response to infrastructure failures
— monitoring the health of live production systems
— dealing with hardware problems and interacting with the datacenter
— developing internal automation: monitoring, setup, statistics
— helping developers with deployment and integration
— participating in the on-call rotation
On a daily basis you will be dealing with:
— 20PB storage cluster
— 2500+ Linux servers
— experimental large-scale deployments
— all kinds of software bugs and hardware deviations
Our system is in large part custom OCaml code and also employs the following third-party technologies:
The ideal candidate is expected to:
— Independently deal with and investigate infrastructure issues on live production systems
— Foresee problems and prevent them from happening
— Make well-reasoned technical choices and take responsibility for them
— Understand the whole technology stack at all levels: from network and userspace code to OS internals and hardware
— Approach problems with a practical mindset and suppress perfectionism when time is a priority
— Automate everything and then some
— Have a healthy detestation for complex shell scripts
We provide:
— Competitive salary
— Cutting-edge technologies
— Informal and thriving atmosphere
— International team