Patrianna is a fast-growing product development company headquartered in Gibraltar with colleagues around the world. We are looking for exceptional, smart people who strive to be number one: motivated and capable of scaling up business functions at pace through domain expertise and a desire to continuously improve.
26 June 2025

Monitoring Engineer (24/7)

Remote

Dive into the pulse of cutting-edge solutions with Patrianna LTD! 🚀

Are you ready to dive into the dynamic world of social gaming and be part of a rapidly expanding team? We’re on the lookout for a talented Monitoring Specialist (support) to join our Patrianna LTD team on a full-time basis.

🌟 What You Gain

Dynamic Environment: Step into the heart of a super fast-growing social gaming company, where innovation and creativity thrive.
Global Impact: Be at the forefront of crafting a global social entertainment platform, with a primary focus on captivating the North American market.
Limitless Growth: Take your career to new heights with opportunities for advancement and personal development. Join us in the exhilarating journey of continuous growth.
Massive Reach: Contribute to the development of client web and mobile apps that engage with up to 150 million customers worldwide.
Commitment to Excellence: We’re dedicated to delivering high-quality code, ensuring predictable behavior in production, seamless scaling, and automation every step of the way.

We are looking for a skilled Monitoring Specialist to join our 24×7 SRE team. The ideal candidate will work non-business hours aligned with European time to ensure seamless operations and system reliability. This role focuses on monitoring and diagnostics across a multi-site production environment, primarily for Java-based applications on Google Cloud Platform (GCP). Using modern monitoring tools, the Monitoring Specialist will proactively identify, analyze, and resolve issues to maintain high service performance and reliability.

Key Responsibilities:

  • Monitor the health of infrastructure and applications using monitoring systems (Prometheus, Grafana, Zabbix, etc.)
  • Analyze metrics, logs, and alerts to detect issues early
  • Perform Root Cause Analysis (RCA) of incidents
  • Contribute to improvements in monitoring and alerting (optimizing metrics, reducing noisy alerts)
  • Document common issues and response scenarios
  • Collaborate with development and operations teams to resolve incidents

Required Skills & Qualifications:

  • 1–2 years of experience in a technical IT position (system administrator, DevOps, technical support, monitoring, etc.)
  • Understanding of how servers, networks, applications, and services work
  • Ability to read and interpret system and application metrics (CPU load, memory, disk usage, latency, RPS, etc.)
  • Experience working with logs (ideally familiarity with Elasticsearch, Kibana, Loki, Splunk, or similar)
  • Strong skills in data-based problem analysis and diagnostics
  • Knowledge of basic monitoring and alerting principles
  • Experience configuring monitoring and alerting tools (Prometheus, Alertmanager, Zabbix, Datadog, etc.)
  • Familiarity with visualization tools (Grafana, Kibana)
  • Understanding of CI/CD processes and SRE principles
  • Experience working with Linux and basic command-line tools
  • Experience participating in incident reviews (postmortems)
  • Willingness to work on a shift-based schedule

Schedule Requirements:

  • This role operates during non-business hours aligned with European time to provide continuous coverage and support for our production environments.