DataArt is a global software engineering firm that takes a uniquely human approach to solving problems.
April 13, 2021

Data Engineer, Online Genealogy Service (vacancy inactive)

Kyiv, Kharkiv, Lviv, Dnipro, Odesa, Kherson, remote

Required Skills

Experience with SQL (MySQL) databases and handling large amounts of data
Comfortable working from the terminal in Linux/Unix (Ubuntu)
Good knowledge of at least one programming language (Ruby, Python, etc.)
A hands-on approach to getting stuff done
A curiosity to learn and widen your skillset
Rails (used for internal web-based tools)
Experience with ZFS and XML
TensorFlow (used for ML work, though not extensively so far)
AWS/Azure (used from time to time)
Experience with Apache Solr

Nice to Have

A focus on quality, with testing experience and a willingness to pair-program
Background in DevOps/Systems Administration
Experience with Docker, Git, Kubernetes
Experience with XML processing
A working knowledge of, or an interest in, image data processing

We Offer

• Professional Development:
— Experienced colleagues who are ready to share knowledge;
— The ability to switch projects and technology stacks and to try out different roles;
— More than 150 workplaces for advanced training;
— English study and practice: courses and communication with colleagues and clients from different countries;
— Support for speakers presenting at conferences and technology community meetups.
• The ability to focus on your work: a lack of bureaucracy and micromanagement, and convenient corporate services;
• A friendly atmosphere and care for employees' comfort;
• A flexible schedule (with mandatory core hours) and the option to work remotely by agreement with colleagues;
• The ability to work in any of our development centers.

Responsibilities

Managing file systems and databases; managing data ingestion into Solr and running Solr at scale
Handling large amounts of XML
Managing internal web-based tools
Possibly, after a few months, applying ML techniques as part of the effort to improve the client's OCR quality

About the Project

The client is an international company providing an online genealogy service that helps people explore their past and family history.

We are looking for a Data Engineer to join a team that maintains the data workflow and ingests scanned newspaper image data. This involves handling high data throughput reliably and consistently.

The specialist will help the existing team manage the file systems, databases, and data ingestion into Solr, as well as the internal web-based tools that the client's Quality Control team uses to validate images before they are published.

The role also includes an element of DevOps and Systems Administration: the team works with a significant number of physical and virtual servers and handles deployment pipelines, among other tasks.

In the coming months, the client will investigate using Machine Learning techniques to improve the quality of their OCR. Some ML techniques will likely be applied over the course of this project, but this is expected to make up only part of the role.

There are multiple teams of 5–7 people each, made up of DataArt engineers and client-side stakeholders working in a mature Agile environment.