We are ITernal Group, a reliable and reputable IT company specializing in complex software solutions. The company was established in 2019 by merging three companies into a single organization, each with a background in different industries and technologies. The oldest of them was founded in 2004.
5 January 2024

Lead Site Reliability Engineer (SRE) (vacancy inactive)


Position Overview: We are seeking a highly skilled and experienced Site Reliability Engineering (SRE) Lead to join our team. As SRE Lead, you will be responsible for designing and implementing robust, scalable, and highly available systems that ensure the reliability, performance, and security of our clients’ infrastructure. You will collaborate with cross-functional teams to identify and address operational and architectural challenges, driving improvements and innovation in client systems and processes.

Responsibilities:

  • Enable clients to navigate and scale the adoption of new IT methodologies and operating models that drive business agility, such as SRE, Agile frameworks, and DevOps.
  • Define clients’ delivery, operational, and governance transformation strategies and roadmaps.
  • Gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Collaborate with team members to build tools and strategies for problem prevention, detection, and chaos testing.
  • Participate in system design consulting, platform management, and capacity planning.
  • Create sustainable systems and services through automation and uplift.
  • Balance feature development speed and reliability with well-defined service-level objectives.
  • Design scalable, reliable, and secure SRE systems.
  • Develop automated infrastructure provisioning and configuration management systems.
  • Implement monitoring, logging, and alerting solutions for issue identification and resolution.
  • Create and maintain infrastructure, process, and procedure documentation.
  • Optimize performance, ensure fault tolerance, and conduct capacity planning.
  • Troubleshoot production incidents in real time.
  • Perform root cause analysis and implement preventive measures for system reliability.
  • Collaborate with cross-functional teams to align SRE objectives with business goals.
  • Improve service reliability through blameless post-incident reviews and by using code to prevent or respond to problem recurrence.
  • Provide technical guidance and mentorship to SRE teams.
  • Stay updated on industry trends and emerging technologies in SRE and infrastructure management.

Requirements:

  • Proven experience as an SRE or in a similar role.
  • Strong understanding of system design principles, distributed systems, and cloud infrastructure.
  • Proficiency in programming and scripting languages (e.g., Python, JavaScript, PowerShell, Bash).
  • Proficiency in cloud platforms (e.g., AWS, Azure, GCP; AWS preferred).
  • Experience deploying and managing container orchestration, service mesh, serverless, API gateway, and observability stacks.
  • Experience with infrastructure automation tools (e.g., Terraform, Ansible) and containerization technologies (e.g., Docker, Kubernetes).
  • Deep knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, Datadog, AppDynamics, New Relic).
  • Hands-on experience with alerting tools (e.g., PagerDuty, Zenduty).
  • Experience with source control management (SCM) tools (e.g., Git, Bitbucket).
  • Experience with CI/CD and orchestration tools (e.g., Jenkins, OpenShift, CircleCI, Harness).
  • Solid understanding of application servers, web servers, firewalls, networks, and databases.
  • Familiarity with security best practices and experience in designing secure systems.
  • Excellent understanding of scalability processes and techniques.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and interpersonal skills to collaborate effectively with diverse teams.

We offer:

  • The opportunity to influence the project’s development
  • A friendly, professional team and a warm atmosphere
  • Support for professional development through mentoring and coaching
  • An environment where you can implement your ideas
  • Growth plans and performance reviews
  • A flexible schedule and the option to work remotely (8-hour workday)
  • Paid vacation and sick leave
  • Participation in educational activities and thematic conferences
  • Medical insurance
  • Team parties and corporate events