October 20, 2021

Data Engineer (Python)


Our partner’s Core AI team is in charge of building models for the next generation of AI-powered TV products. They are responsible for the end-to-end development of their models, including:

  • dataset collection using geographically distributed television labs;
  • model training in the cloud using serverless GPU clusters;
  • model optimization for constrained computation on the edge;
  • model testing using both virtual and real televisions; and
  • creation of the data pipelines and tooling that make the above possible.

As a member of their team, you will work at the intersection of engineering, science, and entertainment. As a Data Engineer, you will enable the team's work by building robust data pipelines and tooling that accelerate the Model Development Life Cycle.

Responsibilities

  • Design, build, and maintain scalable, robust, and secure data pipelines (for training/test video from television labs, model artifacts, model evaluation results, summaries, etc.)
  • Promote great software engineering practices, help improve existing processes, and establish new ones
  • Enable Machine Learning engineers to succeed in the end-to-end model development process by designing tools and processes that simplify working with labeled data, features, models, and relevant metrics

Required Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • 4+ years of professional software development experience building high-performance, large-scale applications or pipelines, including solid experience with Python
  • Solid foundation in computer science, with strong competencies in data structures, algorithms, and software design
  • Experience with at least one distributed computation framework (Spark, Hive, Dask, Metaflow, etc.)
  • Experience with at least one job orchestration framework (Airflow, Luigi, etc.)
  • Strong command of Linux and version control systems
  • Strong verbal and written communication skills

Preferred Qualifications

  • Experience with components of modern Machine Learning architectures, such as feature stores, model stores, and evaluation stores
  • Familiarity with cloud providers (Amazon Web Services, Google Cloud, etc.) and serverless architectures
  • Familiarity with container orchestration tools (Kubernetes, ECS, etc.) in a production setting
  • Experience working with video (capture, processing, transcoding, frame analysis, ffmpeg)