Brainstack_ is a team of intelligent, fun-loving people working on their own products that truly have value. Some of our products have already made a name for themselves in the US, Europe, and Central and South America, and we keep launching new products and entering new markets.
— Maintain a highly available Kubernetes cluster on bare metal
— Ensure high uptime (99.995%+) and low response times for the Kubernetes cluster and production environment
— Implement configuration management for new and existing services using industry best practices and tools
— Provide release and application support for developers and QA
— Ensure accessibility, integration, performance and security for all tools used in the product life cycle
— Provide day-to-day operational support of mission-critical systems and services
— Perform root cause analysis for all production outages
— Improve monitoring and alerting systems
— Proven experience as a DevOps engineer, Linux system administrator, or SRE in a Linux-based production environment.
— Strong network knowledge, complete understanding of TCP/IP, UDP, BGP. Ability to troubleshoot, profile and resolve network issues.
— Experience installing and managing highly available Kubernetes clusters in production across multiple colocation facilities on physical hardware (bare metal).
— Knowledge of Kubernetes core components: kube-apiserver, kube-scheduler, kube-controller-manager, kube-proxy, etc.
— Experience with containerization and virtualization using Docker, LXC, KVM/QEMU.
— Experience with databases: MySQL/Percona, PostgreSQL, MongoDB, including performance tuning, replication, and high availability practices.
— Deep knowledge of Nginx, PHP-FPM, RabbitMQ, Redis, Elasticsearch/Logstash/Kibana.
— Strong experience administering Linux distributions (Debian/CentOS/Ubuntu/etc.), including hardening and performance tuning practices.
— Deep knowledge of Unix-like systems, including kernel subsystems, system calls, inter-process communication, process and resource management, file systems, networking, protocols, sockets, etc.
— Knowledge of security: firewalls, IDS/IPS, SELinux.
— Experience with network and distributed file systems such as GlusterFS, NFS, Ceph.
— Knowledge of local tools for monitoring and troubleshooting: tcpdump, ss/netstat, mtr, vmstat, iostat, top, sar, free, pmap, ps, lsof, strace, iproute2 utilities.
— Experience with monitoring and log collection systems: Zabbix, Graphite, Grafana, Sentry, ELK stack, Prometheus
— Ability to write complex scripts (shell/Python/Perl)
— Experience with IT automation tools such as Ansible, Puppet, or Chef
— Sufficient English to understand the documentation
— Experience with incident management and on-call rotation
— Familiarity with continuous deployment strategies (blue/green, canary)
— Opportunity to be part of a creative environment and contribute to it;
— Variety of social and professional activities;
— Friendly and warm culture;
— PE employment;
— 24 calendar days of paid vacation per year;
— Medical insurance paid after successful completion of the trial period;
— English courses (partly compensated by the company);
— Documented sick days paid;
— 3 non-documented sick days paid;
— Breakfasts/fruit/yummies in the kitchen.