Experience with SQL (MySQL) databases and handling large amounts of data
Comfortable working from the terminal in Linux/Unix (Ubuntu)
Good knowledge of at least one programming language (Ruby, Python, etc.)
A hands-on approach to getting stuff done
A curiosity to learn and widen your skillset
Rails (for internal web-based tools)
Experience with ZFS, XML
TensorFlow (used for ML work, though not extensively so far)
AWS/Azure (used from time to time)
Experience with Apache Solr
Focus on quality, with testing experience and a willingness to pair collaboratively
Background in DevOps/Systems Administration
Experience with Docker, Git, Kubernetes
Experience with XML processing
A working knowledge of, or an interest in image data processing
• Professional Development:
— Experienced colleagues who are ready to share knowledge;
— The ability to switch projects and technology stacks and to try yourself in different roles;
— More than 150 opportunities for advanced training;
— Study and practice of English: courses and communication with colleagues and clients from different countries;
— Support for specialists who speak at conferences and technology community meetups.
• The ability to focus on your work: a lack of bureaucracy and micromanagement, and convenient corporate services;
• A friendly atmosphere and concern for the comfort of specialists;
• A flexible schedule (with core mandatory hours) and the ability to work remotely by agreement with colleagues;
• The ability to work in any of our development centers.
Managing file systems, databases, data ingestion into Solr, and Solr at scale
Handling large amounts of XML
Management of internal, web-based tools
Potential to apply ML techniques to improve the quality of the client's OCR, possibly after a few months
The client is an international company that provides an online genealogy service that helps its clients understand their past and family history.
We are looking for a Data Engineer to join a team that maintains the data workflow and the ingestion of scanned newspaper image data. This involves handling high data throughput reliably and consistently.
The specialist will help the existing team to manage the file systems, databases, and data ingestion into Solr, as well as managing internal, web-based tools that the client’s Quality Control team uses to validate images before they are published.
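As a rough illustration of this kind of ingestion work (a minimal sketch only; the field names, schema, and Solr collection here are assumptions, not the client's actual setup), one step might convert a page of newspaper XML metadata into a Solr-style JSON document:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical newspaper-page metadata; real element and field names will differ.
SAMPLE_XML = """
<page id="p-001">
  <title>The Daily Example</title>
  <date>1901-05-12</date>
  <ocr_text>Example OCR output for the scanned page.</ocr_text>
</page>
"""

def xml_to_solr_doc(xml_string):
    """Convert one page of XML metadata into a flat Solr-style document."""
    root = ET.fromstring(xml_string)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "date": root.findtext("date"),
        "ocr_text": root.findtext("ocr_text"),
    }

doc = xml_to_solr_doc(SAMPLE_XML)
print(json.dumps(doc))
# A real pipeline would then POST a batch of such documents to Solr's
# update endpoint, e.g. http://localhost:8983/solr/<collection>/update
```

The actual workflow operates at scale, so batching, error handling, and schema validation would matter far more than this toy conversion suggests.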
There is also an element of DevOps and Systems Administration — the team works with a significant number of physical and virtual servers, handling deployment pipelines, etc.
In the coming months the client will investigate adding Machine Learning techniques to improve the quality of their OCR. Some ML work is likely over the course of this project, but it is expected to constitute only part of the role.
There are multiple teams consisting of