Project Description:
Our client is a Fortune 500 market leader in insurance and financial services. We are assembling a global SRE team to provide round-the-clock (24/7) system reliability coverage for the client’s application and data infrastructure hosted on Microsoft Azure.
Experience: 8+ years
Key Skills: Couchbase, Azure, Kubernetes, Terraform, Ansible, GitHub, Bitbucket and Python
Responsibilities:
• Provide Level 2.5/3 support and monitor applications.
• Troubleshoot production incidents in real time and proactively identify system anomalies.
• Lead root cause investigations, work with application teams and Microsoft, and bring observed application issues to closure.
• Monitor the database and manage disk persistence, data replication, live cluster reconfiguration, rebalancing, and multitenancy with data partitioning.
• Work with senior engineering and testing team members to build tools and testing strategies for problem prevention, detection, and chaos testing.
• Manage applications deployed in Kubernetes environments, making use of the scalability, replication, modularity, and other benefits the platform offers.
• Monitor application performance and ensure that the required security standards are implemented and not breached.
• Design, develop, ship, and drive the creation of software and systems that increase product reliability and organizational efficiency.
• Guide reliability practices across the entire software development lifecycle through activities such as architecture reviews, code reviews, creating platforms and frameworks, and capacity planning.
• Improve service reliability through blameless post-incident reviews and by using code to prevent or respond to problem recurrence.
• Recognize automation opportunities.
• Work on a rotating basis and provide weekend support as scheduled.
• Plug into the software release cycle. Work closely with developers to ensure software releases are well designed, planned, implemented, released, and monitored.
Must have:
• Hands-on experience with Couchbase (NoSQL database).
• Couchbase installation and cluster setup in a production environment.
• Couchbase Administration.
• Couchbase Database Backup and Recovery.
• Couchbase index creation, management, and tuning.
• Couchbase Architecture and concepts.
• Couchbase security and application of security patches.
• Troubleshooting Couchbase clusters for common configuration issues.
• Managing Cross Datacenter Replication (XDCR).
• Performance Monitoring.
• Ownership of Couchbase architecture design, including recommending best practices.
Tools:
• Hands-on experience with Terraform, Ansible, Azure, GitHub, Bitbucket, and Python is good to have.
We Offer:
• Possibility to influence the development of the project.
• Friendly professional staff and warm atmosphere.
• Support for your professional development through mentoring and coaching.
• An environment where you can implement your ideas.
• Growth plans and performance reviews every 6 months.
• Flexible schedule and the opportunity to work remotely (8-hour workday).
• Paid vacation and sick leave.
• Medical insurance, gym.
• Participation in educational activities and thematic conferences.
• Team parties and corporate events.