DataArt is a global software engineering firm that takes a uniquely human approach to solving problems.
November 29, 2021

Senior Data Engineer, Online Genealogy Service (vacancy inactive)

Kyiv, Kharkiv, Lviv, Dnipro, Odesa, Kherson, remote

The client is an international company that provides an online genealogy service that helps its clients understand their past and family history.

We are looking for a Data Engineer who will join a team working on the maintenance of the data workflow and ingestion of scanned newspaper image data. This involves handling a lot of data throughput in a reliable and consistent way.

The specialist will help the existing team manage the file systems, databases, and data ingestion into Solr, and maintain the internal, web-based tools that the client’s Quality Control team uses to validate images before they are published.

There is also an element of DevOps and Systems Administration — the team works with a significant number of physical and virtual servers, handling deployment pipelines, etc.

In the coming months, the client will investigate using Machine Learning techniques to improve the quality of their OCR. Some ML techniques are likely to be applied over the course of this project, but this work is expected to make up only part of the role.

There are multiple teams consisting of 5-7 people. The teams include DataArt engineers and stakeholders from the client side working in a mature Agile environment.

We hire people not for a project but for the company. If the project (or your work on it) is over, you move to another project or to paid “Idle” time.


Responsibilities

  • Managing file systems; managing databases; managing data ingest into Solr and managing Solr at scale
  • Handling large amounts of XML
  • Management of internal, web-based tools
  • Potentially applying ML techniques to improve the quality of the client’s OCR, possibly after a few months

Must have

  • Experience with SQL (MySQL) databases and handling large amounts of data
  • Comfortable working from the terminal in Linux/Unix (Ubuntu)
  • Good knowledge of at least one programming language (Ruby, Python, etc.)
  • A hands-on approach to getting stuff done
  • A curiosity to learn and widen your skillset
  • Rails (for internal web-based tools)
  • Experience with ZFS, XML
  • TensorFlow (used for ML work, though not extensively so far)
  • AWS/Azure (used from time to time)
  • Experience with Apache Solr

Would be a plus

  • Focus on quality, with testing experience and a willingness to pair collaboratively
  • Background in DevOps/Systems Administration
  • Experience with Docker, Git, Kubernetes
  • Experience with XML processing
  • Working knowledge of, or an interest in, image data processing

We offer

• Professional Development:
— Experienced colleagues who are ready to share knowledge;
— The ability to switch projects and technology stacks and to try out different roles;
— More than 150 workplaces for advanced training;
— Study and practice of English: courses and communication with colleagues and clients from different countries;
— Support of speakers who make presentations at conferences and meetings of technology communities.
• The ability to focus on your work: a lack of bureaucracy and micromanagement, and convenient corporate services;
• Friendly atmosphere, concern for the comfort of specialists;
• Flexible schedule (there are core mandatory hours), the ability to work remotely upon agreement with colleagues;
• The ability to work in any of our development centers.