SITE RELIABILITY ENGINEER
The Oracle Cloud Operations team is seeking a motivated Site Reliability Engineer who thrives in a fast-paced, rapidly evolving technology environment.
This individual will be a member of the Oracle Field Service (OFS) SRE team.
DUTIES AND RESPONSIBILITIES
We are looking for a specialist who is ready to play a key role in ensuring OFS availability and performance 24×7×365. In practice, this means:
- Monitoring the reliability of our services using existing monitoring and telemetry tools, as well as your own enhancements to them
- Continuously learning OFS topologies, their internal dependencies, and their behavior
- Effectively preventing service degradation, and detecting, troubleshooting, and resolving service issues in a timely manner based on proven procedures and shared team knowledge and experience
- Automating routine, well-understood maintenance tasks and operations
- Improving monitoring, telemetry, and practical maintenance procedures
- Proposing product enhancements in partnership with the responsible development teams to keep services and data available and secure
- Participating directly in event and incident root cause analysis
QUALIFICATIONS
Required:
- 3+ years of experience in IT services, network, storage, database, and application troubleshooting and maintenance
- Bachelor’s or Master’s degree in Computer Science
- Experience working with fault-tolerant, highly available, high-throughput, high-load, distributed, scalable systems
- Unix/Linux OS background
- Understanding of web technologies (web, HTTP protocol, etc.)
- Scripting experience (PHP, Perl, Shell, Python)
- Intermediate English
Desired (will be a plus):
- Basic experience with, and a clear understanding of, Kubernetes and Docker
- Familiarity with RESTful and SOAP APIs
- Understanding of in-memory cache and ESB technologies (Redis, Kafka, etc.)
- Knowledge and experience working with CI/CD technologies (Git, Jenkins, etc.)
- A clear understanding of the principles of, and basic practical experience with, the infrastructure-as-code approach (Chef, Ansible, etc.)
- Familiarity and hands-on experience with the Prometheus, Kibana, and Grafana monitoring tools
WORK CONDITIONS
Working hours and place:
- Standard local business hours
- On-call duty for emergencies according to the team schedule
- Overtime compensation
- Remote work during quarantine (equipment will be provided), and a new, modern office once quarantine ends
Employment:
- Official employment in accordance with local legislation