Big Data & Data Engineer Requirements:
• BE/B.Tech/B.Sc. in Computer Science, Statistics, or Econometrics from an accredited college or university.
• Minimum 3 years of extensive experience designing, building, and deploying PySpark-based applications (a minimal batch sketch follows this list).
• Expertise in handling complex, large-scale Big Data environments (preferably 20 TB+).
• Solid implementation experience with OOP concepts.
• Hands-on experience writing complex SQL queries and exporting/importing large volumes of data using standard utilities.
• Ability to build abstracted, modularized reusable code components.
• Hands-on experience generating and parsing XML and JSON documents and REST API requests/responses.
• Able to quickly adapt and learn.
• Able to jump into an ambiguous situation and take the lead on resolution.
• Able to communicate and coordinate across various teams.
• Comfortable tackling new challenges and new ways of working.
• Ready to move away from traditional methods and adapt to agile ways of working.
• Comfortable challenging your peers and leadership team.
• Can prove yourself quickly and decisively.
• Excellent communication skills and strong customer centricity.
• Strong target and solution orientation.
— Postgraduate qualification in a quantitative field
— Software engineering & continuous integration experience
— Knowledge of the broader Hadoop ecosystem
— Experience with Docker and Kubernetes
— IBM Watson
— Flexible working hours
— Work with global customers on bleeding edge technologies
— Have fun and build cool stuff
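The PySpark experience called for above amounts to building self-contained batch applications around Spark SQL. The following is a minimal, illustrative sketch only, assuming a JSON input file and hypothetical column names (events.json, user_id, amount); it is not a prescribed implementation for this role.

# Minimal PySpark batch sketch: read JSON, run a Spark SQL aggregation, write Parquet.
# All paths and column names here are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("example-batch-job")  # hypothetical application name
    .getOrCreate()
)

# Parse semi-structured JSON into a DataFrame; a production job would normally
# supply an explicit schema instead of relying on inference.
events = spark.read.json("events.json")
events.createOrReplaceTempView("events")

# Express the business rule as SQL; Spark SQL and the DataFrame API are interchangeable here.
totals = spark.sql("""
    SELECT user_id, SUM(amount) AS total_amount
    FROM events
    GROUP BY user_id
""")

totals.write.mode("overwrite").parquet("totals.parquet")
spark.stop()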
Big Data & Data Engineer Key Responsibilities:
• Design, build, and unit test applications on the Spark framework using Python.
• Build PySpark-based applications for both batch and streaming requirements, which requires in-depth knowledge of much of the Hadoop ecosystem as well as NoSQL databases (Elasticsearch, Cassandra, etc.); a streaming sketch follows this list.
• Develop and execute data pipeline testing processes and validate business rules and policies.
• Optimize performance of the built Spark applications by tuning configurations around SparkContext, Spark SQL, DataFrames, and pair RDDs.
• Ability to design & build real-time applications using Spark Streaming.
• Build integrated solutions leveraging Unix shell scripting, RDBMSs, Hive, HDFS, HDFS file formats, and HDFS compression codecs.
• Build data tokenization libraries and integrate them with Spark for column-level obfuscation; a hashing-based sketch follows this list.
• Process large amounts of structured and unstructured data, including integrating data from multiple sources.
• Create and maintain integration and regression testing frameworks in Git repositories.
• Participate in the agile development process, and document and communicate issues and bugs relative to data standards in scrum meetings.
• Work collaboratively with both the onsite and offshore teams.
• Develop & review technical documentation for artifacts delivered.
• Solve complex data-driven scenarios and triage defects and production issues.
• Learn, unlearn, and relearn concepts with an open and analytical mindset.
• Participate in code release and production deployment.
• Challenge and inspire team members to achieve business results in a fast-paced and rapidly changing environment.
• Deeply integrate Spark, Hadoop, and related components in a Kubernetes environment.
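For the batch-and-streaming responsibility above, the following is a minimal Structured Streaming sketch, assuming JSON files arrive in an input directory; the schema, path, window and watermark durations, and the console sink are illustrative assumptions (a real deployment would typically read from Kafka and write to Elasticsearch or Cassandra through their connectors).

# Minimal Structured Streaming sketch: windowed aggregation over a file source.
# Schema, paths, and durations are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window
from pyspark.sql.types import DoubleType, StringType, StructType, TimestampType

spark = SparkSession.builder.appName("example-streaming-job").getOrCreate()

schema = (
    StructType()
    .add("user_id", StringType())
    .add("amount", DoubleType())
    .add("event_time", TimestampType())
)

# File source: Spark picks up new JSON files as they land in the directory.
stream = spark.readStream.schema(schema).json("/data/incoming")

# Windowed aggregation with a watermark so state kept for late data stays bounded.
agg = (
    stream
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("user_id"))
    .sum("amount")
)

query = (
    agg.writeStream
    .outputMode("update")
    .format("console")  # swap for an Elasticsearch/Cassandra sink via its connector
    .start()
)
query.awaitTermination()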
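For the column-level obfuscation and performance-tuning responsibilities above, here is a minimal sketch that stands in a salted SHA-256 hash for a tokenization library and sets two common tuning knobs; the salt, column names, and config values are illustrative assumptions, and a real tokenization library would usually provide reversible, format-preserving tokens.

# Minimal column-obfuscation sketch: salted SHA-256 hashing of a sensitive column,
# plus two common Spark tuning settings. All values here are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, lit, sha2

spark = (
    SparkSession.builder
    .appName("example-tokenization-job")
    .config("spark.sql.shuffle.partitions", "200")  # tune to data volume and cluster size
    .config("spark.sql.adaptive.enabled", "true")   # adaptive query execution (Spark 3+)
    .getOrCreate()
)

df = spark.createDataFrame(
    [("alice@example.com", 42.0), ("bob@example.com", 7.5)],
    ["email", "amount"],
)

# Replace the sensitive column with a salted SHA-256 digest; the mapping is
# deterministic, so joins on the obfuscated value still line up across datasets.
tokenized = df.withColumn("email", sha2(concat(lit("static-salt"), col("email")), 256))
tokenized.show(truncate=False)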
We are developing a sophisticated financial analytics platform for Fortune 500 companies. The project is well funded, and we are building an amazing team of cross-disciplinary engineers from the ground up to deliver on the client's vision, so this is an excellent and exciting time to get involved.