Experience in analyzing and troubleshooting production systems
Experience with modern software development, preferably in Java
Deep understanding of Linux and UNIX-based systems
Familiarity with Agile software development practices
Understanding of TDD principles
Solid knowledge of SQL and modern databases
Experience with CI/CD systems
Experience with networking (TCP/UDP, ICMP, DNS, etc.), OSI layers, infrastructure services, and security
Experience with software monitoring and alerting systems
Good English communication and problem-solving skills
Familiarity with cloud technologies
Experience with Docker and Kubernetes
Experience with NoSQL databases
— Experienced colleagues who are ready to share knowledge;
— The opportunity to switch projects and try different roles;
— More than 150 workplaces for advanced training;
— Study and practice of English: courses and communication with colleagues and clients from different countries;
— Support for speakers who present at conferences and technology community meetups.
The ability to focus on your work: no bureaucracy or micromanagement, and convenient corporate services;
No dress code, a friendly atmosphere, and care for the comfort of specialists;
Flexible schedule and the ability to work remotely;
The ability to work in any of our development centers.
Analyze and improve the availability, latency, performance, and efficiency of the applications
Proactively support production applications (both during and outside office hours) across a range of domains; these are mainly written in Java and use Oracle databases.
Improve the monitoring and alerting of the applications
Plan and provision capacity
Improve and standardize build pipelines; identify areas of manual toil and reduce them through automation.
Consult in areas of reliability and scalability for the development of new applications.
Work together with teams in other departments to find solutions
Conduct periodic on-call duties
Our client is one of the biggest online retailers worldwide, with an annual revenue of £1 billion. Over the years, we have helped the client develop web portals, mobile apps, delivery control systems, staff management tools, data storage, and much more. The systems we’ve built together are in operation 24/7, contributing to the client’s success.
Site Reliability Engineering is a new role, first introduced by Google, that combines the skills of developers and operations engineers to deliver more reliable, scalable software. The goal is to analyze a diverse set of applications (primarily built using Java, Oracle, AWS, Google Cloud services, and a number of other technologies) and bind them into a reliable, self-healing suite that works within defined reliability requirements. This requires proactive work to ensure observability, analyze potential bottlenecks, and suggest fixes before they turn into production incidents.
This position may be of interest to DevOps engineers who would like to get closer to the code or gain a valuable specialization focused on the JVM stack. The position may also appeal to developers who are interested in how large-scale systems operate and what happens to the code after it goes live.