Jake Mannix

Applied Machine Learning Engineer

Greater Seattle Area
Computer Software

Jake Mannix's Overview

  • University of California, Santa Cruz

500+ connections


Jake Mannix's Summary

Experienced search / recommender-systems / distributed systems architect and team lead with more than a decade of software engineering experience, currently specializing in distributed machine learning and search-relevance algorithm development as applied to personalized search and content recommendations.

Specialties: distributed faceted real-time search, distributed computing, parallelizing algorithms (but hopefully not paralyzing them!), matrix decomposition, lsi, machine learning, mahout, classification as a service, social network search, lucene, solr, hadoop, nosql, ec2, aws, java, algebraic topology, quantum field theory, inflationary cosmology

Jake Mannix's Skills & Expertise

  1. Large Scale Systems
  2. Lucene
  3. Hadoop
  4. Search
  5. Java
  6. Machine Learning
  7. Information Retrieval
  8. Distributed Systems
  9. Distributed Algorithms
  10. Recommender Systems
  11. Zookeeper
  12. Graph Algorithms
  13. Topic Modeling
  14. Interest Modeling
  15. Scalability
  16. Thrift
  17. Service Oriented Architecture Design
  18. Pig
  19. Mahout
  20. Statistical Modeling
  21. Text Mining
  22. MapReduce
  23. Algorithms
  24. Apache Pig
  25. Open Source
  26. Linux
  27. Natural Language Processing
  28. Apache
  29. Big Data
  30. Ruby
  31. Software Engineering
  32. Spring
  33. Amazon Web Services (AWS)
  34. Solr
  35. NoSQL

Jake Mannix's Experience

Principal Machine Learning Engineer

Allen Institute for Artificial Intelligence (AI2)

Public Company; 11-50 employees; Research industry

September 2014 to Present (1 month), Greater Seattle Area



Applied Machine Learning Engineer

Twitter

Public Company; 1001-5000 employees; TWTR; Internet industry

August 2010 to September 2014 (4 years 2 months), Greater Seattle Area

Multiple roles, variety of teams. Fun stuff. See specific role positions below for more details.

Tech Lead, User Modeling:

Built and led a team of distributed-systems and machine learning engineers responsible for understanding Twitter's "interest graph": taking tweet text, links, and the social graph, and using a variety of classification and topic modeling techniques to build better personalized relevance for various products at the company.

Integrated topic-based personalization systems into the #discover page at Twitter using a mixture of (SGD / logistic-regression based) text classification, graph label propagation, and learning to rank on implicit user feedback engagement/impression data (see MLConf talk here[1]).

Tech Lead, User Search:

Designed and built the self-healing distributed search system for finding user accounts relevant to name / topical queries on twitter.com and the Twitter API. Technology we used to build this includes Lucene, Hadoop, Pig, ZooKeeper, and Thrift, and much of it is already open-sourced[2], with more to come at some point. Co-authored a paper [3] on the distributed systems side of this work. Led the team responsible for the maintenance and improvement of this aspect of Search at Twitter.

IC, Ads ROI:

Working in the Seattle contingent of the Ads ROI team, on a variety of ad products designed to help both large brands and small/medium businesses engage with everyone on the planet.

Lots of Scalding, Storm, and Summingbird.


VP, Apache Mahout

The Apache Software Foundation

Nonprofit; 1001-5000 employees; Computer Software industry

2012 to 2013 (1 year)

Served as Chair of Mahout's Project Management Committee, helping make sure that ASF processes are followed by the large and varied Mahout community. This involved not only technical advice and help for new users (as all contributors do), but also advising users regarding proper use of Apache copyrights and trademarks, and advocating for Mahout adoption within my "day job" at Twitter.

Looking for open source scalable machine learning on your vanilla Hadoop cluster?

For more details, visit us at http://mahout.apache.org

Committer and PMC Member, Giraph Graph Processing Project

The Apache Software Foundation

Nonprofit; 1001-5000 employees; Computer Software industry

September 2011 to 2013 (2 years), at large

The Apache Giraph project is a fault-tolerant in-memory distributed graph processing system which runs on top of a standard Hadoop installation, and is capable of running any standard Bulk Synchronous Parallel (BSP) operation over any large generic data set which can be represented as a graph, and is a loose implementation of Google's Pregel.
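The vertex-centric BSP model described above can be illustrated with a toy, single-process sketch. This is not Giraph's API (Giraph is a Java system that runs on Hadoop); the function name `bsp_max_value` and the sample graph are hypothetical, chosen only to show supersteps, message passing, and the barrier between supersteps, using the maximum-value propagation example from the Pregel paper.

```python
from collections import defaultdict


def bsp_max_value(graph, values):
    """Propagate the maximum value to every vertex, Pregel-style.

    graph:  dict mapping vertex -> list of out-neighbors (toy input)
    values: dict mapping vertex -> initial numeric value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)  # do not mutate the caller's dict

    # Superstep 0: every vertex sends its value to its neighbors.
    inbox = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])
    supersteps = 1

    # Later supersteps: only vertices that received messages do any work.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                # Value changed, so notify neighbors in the next superstep.
                values[vertex] = best
                for neighbor in graph.get(vertex, []):
                    outbox[neighbor].append(best)
            # Otherwise the vertex sends nothing (it "votes to halt").
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    toy_values = {"a": 3, "b": 6, "c": 2}
    print(bsp_max_value(toy_graph, toy_values))
```

The computation stops once a superstep produces no messages, which is the same termination rule a real BSP engine applies across many machines.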

Committer and PMC Member, Mahout Machine Learning Project

The Apache Software Foundation

Nonprofit; 1001-5000 employees; Computer Software industry

2009 to 2013 (4 years)

Helping build an open-source, scalable machine learning library under a commercial-friendly license.

Have focused on improving our Bayesian topic modeling work (e.g. LDA) and linear algebra primitives, as well as adding dimensionality reduction components for NLP and recommender systems.

Principal Software Engineer - Search and Recommender Systems

LinkedIn

Public Company; 5001-10,000 employees; LNKD; Internet industry

July 2008 to August 2010 (2 years 2 months)

One of the primary architects of the distributed, real-time, faceted people search platform you most likely used to find this profile.

Helped justify, scope out, and build a new engineering team inside of LinkedIn's analytics-engineering organization: from two initial engineers, we built a Recommendation Engine team of roughly (depending on how you look at the org chart, at the time I left) ten engineers, data scientists, and managers.

While interviewing and hiring for this team, created the infrastructure for a generalized entity-to-entity realtime (i.e. results computed online) recommendation system, using a variety of content and usage-based matching techniques with machine learning models trained on our Hadoop cluster.

Implemented and launched a handful of recommendation-based products on top of this infrastructure, including Talent Match ("People for your Job posting") and Jobs You Might Be Interested In, and helped guide a half-dozen more into production as we scaled up the team.

I spent some of my work-time on a couple of extension projects built on top of Apache Lucene:

* Core committer on the high-performance open-source faceted search library, BoboBrowse (http://bobo-browse.googlecode.com).

* Committer on the open-source real-time search and indexing system Zoie (http://zoie.googlecode.com).

Creator/maintainer of the nascent open-source NLP / graph-theoretic matrix library Decomposer (http://decomposer.googlecode.com) (since absorbed into Apache Mahout)

Search Architecture Development Lead


Privately Held; 11-50 employees; Internet industry

May 2007 to July 2008 (1 year 3 months)

Directed architectural vision for all things Search-related. This ranged from redesigning, developing, and maintaining a high-availability, query-expanding job-post (full-text) search engine (as a replicated Tomcat+Lucene+Spring+MySQL-based web service) to R&D of next-generation conceptual applicant/position matching technology. The latter used an n-gram-based, partially parallelized (single-box, not ready for Hadoop without algorithm modification) eigen-decomposition algorithm and user feedback for assisted machine learning to provide a personalized search experience.

Designed and implemented a JRuby-based Rails plugin as glue to transparently allow vanilla-seeming ActiveRecord models to say they "acts_as_conceptual" while being coded into simple RoR apps which can scale and perform as J2EE apps.

Led and mentored junior developers along their technical career path, and participated in business-space technical decision making (buy/build/partner) with CxO-level management.

Sr. Software Developer


Privately Held; 11-50 employees; Internet industry

January 2007 to May 2007 (5 months)

Implemented and maintained a high-performance multithreaded RSS parser/crawler with java.util.concurrent thread pooling, a Hibernate/MySQL ORM layer, Jakarta Commons Digester, and Spring/XFire as the web services interface.

Designed a prototype Latent Semantic Analysis framework (using a custom-implemented asymmetric generalized Hebbian algorithm for non-memory-limited SVD) for content clustering and "related words" concept modeling / query expansion. Also developed other natural language processing algorithms for finding synonyms and related content using Bayesian and other probabilistic techniques.

Wrote cross-browser safe Ruby on Rails code to display searchable / taggable / commentable podcast RSS content.

Sr. Software Development Engineer

CDG (Boeing)

Public Company; 10,001+ employees; BA; Aviation & Aerospace industry

June 2006 to December 2006 (7 months)

Designed and developed a "Generic Document Viewer Framework" for document viewing (XML/SGML to XHTML), content management, and full-text search for maintenance manuals of Boeing's 787 fleet, as a Java Web Application, from back-end Oracle / Tomcat / SpringFramework / iBatis.

Spec'ed out and wrote XML schema for specifying compartmentalized intra-organizational webapp layout framework.

Wrote AJAX components (with the Dojo JavaScript library) for a client-side model-view-controller architecture for the above-mentioned webapp framework, specifically including a Smart (AJAX + SEO friendly) Session History manager.

Software Development Engineer

Centeris Corporation

Privately Held; 11-50 employees; Computer Software industry

June 2005 to January 2006 (8 months)

General startup technical jack-of-all-trades: developed Java utility code for cross-platform (including Windows-based) remote administration of Linux systems, and maintained the company-wide multi-architecture and multi-language (Java/C/C#) build system (using JUnit, Ant, CruiseControl). Built custom Samba RPMs for a variety of Linux flavors.

Government Agency; 1001-5000 employees; Research industry

June 2004 to June 2005 (1 year 1 month)

Performed theoretical high energy physics and cosmology research for Stanford University's Institute for Theoretical Physics: developed and tested inflationary cosmology simulation software, with C++/optimized ASM for the computational back-end, Hibernate / MySQL datastore, and Jakarta Struts MVC (Servlet + JSP) for front-end web presentation. Computed numerical differential equation integration (and analytic approximations) for cutting-edge theoretical dark energy models.

Java Developer (remote contract)


July 2003 to June 2004 (1 year)

Designed and developed, from scratch, an object-oriented secure distributed licensing Java library for intellectual property rights control, on remote contract for a real estate management company. The library supported both networked client/server use (Commons HttpClient/Servlet) and file-based stand-alone use (with DSA for local permissions-store signatures). Carried the work from initial specifications, to UML diagrammatics, to API, working code library, and test harness.

Lead Software Development Engineer in Test

Aventail Corporation

Privately Held; 51-200 employees; Computer & Network Security industry

2002 to 2003 (1 year)

Managed a team of internal operations, QA tools developers, and SDETs in support of automated testing of the company's SOCKS5 VPN/firewall-traversal proxy software, using Perl 5 ported to Java (yikes!).

Software Developer

RealNetworks

Public Company; 501-1000 employees; RNWK; Computer Software industry

June 1999 to August 2001 (2 years 3 months)

C++ software development on the RealServer product line. Integrated OpenSSL into the RealMedia architecture in the RealServer core. Designed, spec'd out, and implemented a Twofish/Diffie-Hellman based cryptographic C++ library for use with stream encryption (UDP+multicast) in RealSystem.

Jake Mannix's Publications

  • Automatic Management of Partitioned, Replicated Search Services

    • Proceedings of the 2nd ACM Symposium on Cloud Computing
    • October 1, 2011
    Authors: Jake Mannix, Jimmy Lin, Florian Leibert, Babak Hamadani

    Low-latency, high-throughput web services are typically achieved through partitioning, replication, and caching. Although these strategies and the general design of large-scale distributed search systems are well known, the academic literature provides surprisingly few details on deployment and operational considerations in production environments. In this paper, we address this gap by sharing the distributed search architecture that underlies Twitter user search, a service for discovering relevant accounts on the popular microblogging service. Our design makes use of the principle that eliminates the distinction between failure and other anticipated service disruptions: as a result, most operational scenarios share exactly the same code path. This simplicity leads to greater robustness and fault-tolerance. Another salient feature of our architecture is its exclusive reliance on open-source software components, which makes it easier for the community to learn from our experiences and replicate our findings.

Jake Mannix's Projects

  • LinkedIn Search

    • August 2007 to January 2011

    LinkedIn Search Products

  • LinkedIn Recommender Systems

  • Lucene In Action (2nd edition)

    • 2010

    Contributed a chapter on search technology behind LinkedIn.com

  • Similar Profiles

    • October 2010 to Present

    As a recruiter looking for candidates, if you’ve ever seen a profile and thought ‘I want more like this’ or you have a superstar employee that you wish you could just clone, then Similar Profiles helps you get that in one click.

Jake Mannix's Education

University of Washington


University of Washington


Stanford University


University of California, Santa Cruz

