Required skills:
• Production experience with Spark at scale
• Building data pipelines, CI/CD pipelines, and fit-for-purpose data stores
• Dimensional data modeling
• Building data pipelines that process more than 1 TB of data in both streaming and batch modes
• Working with data consumption patterns
• Working with automated build and continuous integration systems
Technologies:
• Microservices development in two of these languages: Python, Java, Scala
• Big Data Technologies: Apache Spark, Hadoop
• Relational Databases: Postgres, MySQL
• NoSQL: MongoDB, DynamoDB
• Data-warehousing products: Snowflake or Redshift
• Cloud technologies: AWS (S3, EMR, EC2, Glue, Athena), Terraform
• Orchestration: Apache Airflow and MLflow
Nice to have:
• Kubeflow
• Spark on Kubernetes
We need a hero:
• someone who has worked at a Lead/Architect level and has hands-on experience dealing with data at scale in production
• who is able to analyze, assess, document, and design scalable and sustainable data architecture and data transformation processes within a large analytics pipeline
• who will review the implementation of improvements to the data architecture, providing feedback and guidance to product developers, data scientists, and product owners
• who is comfortable collaborating with teams across regions to drive and facilitate data design, identify architectural risks, and develop and refine data models and architecture frameworks.
Product:
A cloud-based, AI-driven SaaS solution that uses advanced predictive analytics and proactive device insights to monitor your device fleet's performance. The solution gives IT teams the tools to predict, diagnose, and prevent common PC health and performance problems at scale, improving employee uptime and productivity.
The platform collects analytics data from devices and uses data science to predict when they need maintenance.