H2O.ai builds an open source, parallel, distributed, in-memory machine learning platform that allows customers to quickly build high-performance, sophisticated ML models on terabytes of data.
The server is implemented in Java with the expertise of our technical founder, who wrote the HotSpot server JIT for JavaSoft. The ML algorithms are developed in consultation with our three Stanford advisors, Trevor Hastie, Rob Tibshirani and Stephen Boyd.
The platform is accessible via R, Python, Scala, Java, a REST API and a notebook-style web interface. Our paying customer list includes many of the largest insurance companies, banks and healthcare companies, many of the big name unicorns in tech, startups, and on and on.
We are looking for hardcore software developers to work both on the distributed compute platform and on implementing and improving Machine Learning algorithms for it. We support clusters of at least 3200 cores and tens of terabytes of RAM.
We are looking for members of our Quality Engineering team at both the lead and the individual contributor levels for Machine Learning and Distributed Platform QE. You will work very closely with the teams creating H2O, our open source machine learning platform, as well as on other products built atop and complementing H2O. This job involves the actual design, implementation, and running of black-box and white-box tests that exercise the functionality, performance, scalability and behavior under stress of our distributed solution. This is an excellent opportunity to learn about machine learning as a key member of our world-class team. We are looking for hardcore developers, not just testers. The boundary between QE and development is very permeable at H2O.ai.
Machine learning quality engineers will work with the algorithm engineering and data science teams to test the correctness of the ML algorithms by writing self-checking tests in R and/or Python that verify that H2O gets the math right. This includes comparing against the math in published papers, checking the handling of missing values, identifying and testing edge cases, testing for numerical stability, and so on. A strong math/statistics background is essential, as is good working knowledge of Python, R or both.
Distributed systems quality engineers will work with the distributed systems platform engineers on inspecting, testing and improving the core platform Java code for correctness and performance, and with the algorithm engineers on testing and improving performance of the ML algorithms. This includes trying to break the distributed in-memory data storage and compute layers and verifying their performance, as well as performance regression benchmarking and competitive benchmarking. For these roles, a distributed/parallel systems background is essential. BS-level knowledge of distributed systems (multithreading, locking and races, high performance network I/O) is necessary and MS-level is highly desired. Experience with at least one of the following is required: Java/Scala/C/C++/Haskell/similar.
Education and Experience
- Proven programming education, ability, and experience.
- 2-5 years of previous experience in quality engineering/development.
- Desirable: Master’s Degree in Computer Science or related field.
- Desirable: Experience with machine learning algorithms and/or math/statistics and/or distributed/parallel systems.
- (Lead) Proven leadership ability in a previous role.
Skills and Abilities
- Excellent programming ability.
- (Machine Learning) Experience with Python and/or R.
- (Distributed Systems) Experience with at least one of: Java/Scala/C/C++/Haskell/similar.
- Proven success shipping code which solves difficult systems-level problems.
- Experience with test automation and CI (Jenkins).
- Experience with any/all of Machine Learning, Hadoop, or Spark is desired.