Strong knowledge of Java 8+
Experience with at least one message queue (JMS, RabbitMQ, Amazon SQS, etc.)
Basic relational database skills
Basic knowledge of the Spring framework
Understanding of the principles of parallel data processing
Understanding of the principles of developing high-load applications
Basic Linux skills
Willingness and desire to learn Scala and Apache Spark
Experience with Hazelcast, Elasticsearch, Redis
Experience with NoSQL databases
Experience with AWS
Experience developing message stream-processing systems on Apache Spark, Apache Flink, or Apache Storm
Knowledge of Scala, Big Data
Work in a well-organized professional team (Odessa city center or remote)
Opportunities for career growth and for technical learning
English language courses, QA and programming language courses
Flexible work schedule, with the possibility to work and get paid remotely
Optional participation in events, conferences, business trips, and sports activities
Senior Java Engineer (for NetIQ — Data processing system)
Our project is a fault-detection and alerting system that serves the network of one of the biggest cable internet providers in the United States.
We collect various data (logs, metrics, statistics) from a huge number of network routers and process it in real-time streaming mode.
Currently our input flow is more than 350 thousand messages per second.
The whole project is logically split into two parts:
1) A set of Apache Spark applications that ingest the initial flow of data and perform filtering and enrichment. The code is written in Scala (we strive for a functional programming approach). We use Amazon Kinesis and Apache Kafka as input message buses, and we run on Amazon EMR.
2) A set of Java-based applications that take messages from the Spark part, analyze them to detect network anomalies, and automatically create incidents for network engineers. These Java applications run on Spring Boot and use Spring Integration to build message processing chains.
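To give a feel for the kind of chain the second part builds, here is a minimal sketch in plain Java of a filter, enrich, detect flow. It is illustrative only: the `Metric` class, the region lookup, the error-rate threshold, and all method names are hypothetical, and the real applications assemble their chains with Spring Integration rather than the stream pipeline shown here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class ProcessingChainSketch {

    /** Hypothetical message shape; the real payload is not described in this posting. */
    static final class Metric {
        final String routerId;
        final double errorRate;
        final String region; // null until the enrichment step fills it in

        Metric(String routerId, double errorRate, String region) {
            this.routerId = routerId;
            this.errorRate = errorRate;
            this.region = region;
        }
    }

    /** Filter step: drop messages that carry no router ID. */
    static boolean isValid(Metric m) {
        return m.routerId != null && !m.routerId.isEmpty();
    }

    /** Enrichment step: attach a region from an assumed lookup table. */
    static Metric enrich(Metric m, Map<String, String> regions) {
        String region = regions.getOrDefault(m.routerId, "unknown");
        return new Metric(m.routerId, m.errorRate, region);
    }

    /** Detection step: report an incident when the error rate exceeds the threshold. */
    static Optional<String> detect(Metric m, double threshold) {
        if (m.errorRate > threshold) {
            return Optional.of("Incident: router " + m.routerId
                    + " (" + m.region + ") error rate " + m.errorRate);
        }
        return Optional.empty();
    }

    /** Chains the three steps and collects the resulting incident descriptions. */
    static List<String> process(List<Metric> input, Map<String, String> regions, double threshold) {
        return input.stream()
                .filter(ProcessingChainSketch::isValid)
                .map(m -> enrich(m, regions))
                .map(m -> detect(m, threshold))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> regions = new HashMap<>();
        regions.put("r1", "east");
        List<Metric> input = Arrays.asList(
                new Metric("r1", 0.2, null),   // valid and above the threshold
                new Metric(null, 0.9, null),   // dropped by the filter step
                new Metric("r2", 0.01, null)); // valid but below the threshold
        process(input, regions, 0.05).forEach(System.out::println);
    }
}
```

Keeping each step as a small, side-effect-free function makes it easy to test in isolation, which is the same property a message processing chain aims for.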
We are very flexible in choosing technologies, frameworks and libraries for our solutions.
Currently we use:
Amazon Kinesis and Apache Kafka for transporting the massive flow of messages
Amazon SQS for transactional messaging
InfluxDB for storing time series data and calculating metrics on it
Amazon RDS (PostgreSQL and MySQL) for transactional data
Apache Spark for both batch and streaming processing (we use Structured Streaming)
Elasticsearch for information that requires text-based search
Hazelcast as a distributed cache
Grafana + Elasticsearch and Amazon CloudWatch for metrics
Amazon EMR to run our Spark jobs
Docker, ECS, EC2, and Ansible to run our Java applications, etc.