Calling all Bay Area engineers passionate about data!

We are looking for an experienced developer who works mostly in AWS (EMR), Spark, Python, and Airflow.  This engineer will take part in developing and testing various ETL applications.  Building these applications requires teamwork: to deliver these solutions, the engineer will collaborate with an interdisciplinary team of experts in machine learning, data visualization & design, business process optimization, and software engineering.  Candidates for this role should have extensive knowledge of, and experience with, Spark, Airflow, Python, Jinja templating, AWS EMR, AWS S3, and the AWS CLI.  Ideally, this person can also tune Spark-based ETL applications under various conditions.

As an early member of the Data Engineering team, this engineer will work closely with senior leadership and will have an important impact on shaping the future of the product, the culture, the company, and the many industries that will be reshaped by the emergence of Enterprise AI.  We have an adventure ahead, and we are going to have a lot of fun along the way!

The results we are working towards:

  • Write custom ETL applications using Spark in Python/Java that follow a standard architecture.
  • Success will be defined by the ability to meet requirements/acceptance criteria, on-time delivery, number of defects, and clear documentation.
  • Perform functional testing, end-to-end testing, performance testing, and UAT of these applications and of code written by other members of the team.
  • Proper documentation of the test cases used during QA will be important for success.
  • Other important responsibilities include clear communication with team members as well as timely and thorough code reviews.
  • As you grow in the role, you will have the opportunity to contribute to the design of new applications, to setting and changing standards and architecture, and to decisions on adopting new technologies.

What you need to bring to the table:


  • Linux – common working knowledge, including navigating the file system and simple bash scripting.
  • Hadoop – common working knowledge, including the basic ideas behind HDFS and MapReduce, and hadoop fs commands.
  • Spark – how to work with RDDs and DataFrames (with emphasis on DataFrames) to query and manipulate data.
  • Python/Java – Python would be ideal, but a solid knowledge of Java is also acceptable.
  • SQL
  • Source control management tool – we use Bitbucket.


  • Worked/developed in a Linux or Unix environment.
  • Worked in AWS (particularly EMR).
  • Has real hands-on experience developing applications or scripts for a Hadoop environment (Cloudera, Hortonworks, MapR, Apache Hadoop). By that, we mean someone who has written significant code for at least one of these Hadoop distributions.
  • Has experience with an ANSI SQL relational database (Oracle, SQL Server, Postgres, MySQL).


  • Intellectual curiosity!  We are always noodling on new problems.  If you see yourself as a life-long learner who enjoys tackling new challenges, learning about new approaches and tools in your area of expertise, and learning from an interdisciplinary team that encourages you to stretch outside your comfort zone... you will find a home here :)
  • Passion.  (What does this mean?  To us, this means you care deeply about making an impact.  You take ownership of your projects and bring a "founder's mentality" to your work.)

Cool stuff you get to do in this role:

  • Build applications that process hundreds of gigabytes to terabytes of data, some in real time or near real time.
  • Opportunities to POC new techniques and tools/technologies.
  • Work in an open, collaborative environment in a really cool office in downtown San Francisco or Palo Alto.


Want to help shape the future of Enterprise Artificial Intelligence? 

Let’s noodle.


Confirmed 7 hours ago. Posted 30+ days ago.
