Peter Skomoroch

Principal Data Scientist

San Francisco, California (San Francisco Bay Area)

Peter Skomoroch's Overview

  • Equity Partner at Data Collective

500+ connections


Peter Skomoroch's Summary

I'm a data scientist and entrepreneur focused on building intelligent systems to automate tasks and enable better decisions. I specialize in solving hard algorithmic problems, leading cross-functional teams, and developing engaging products powered by data and machine learning.

Most recently, I applied my skills to the consumer internet space at LinkedIn, the world's largest professional network, where I was an early member of the data science team. As Principal Data Scientist, I led data science teams focused on reputation, search, inferred identity and building data products. I was also the creator of LinkedIn Skills & LinkedIn Endorsements. Endorsements was one of the fastest growing new product features in LinkedIn's history with over 3 billion endorsements of more than 70 million members within the first year after launch.

Before joining LinkedIn, I was Director of Analytics at Juice Analytics and a Senior Research Engineer at AOL Search. In a previous life, I developed price optimization models for Fortune 500 retailers, studied machine learning at MIT, and worked on biodefense projects for DARPA and the Department of Defense. I have a B.S. in Mathematics and Physics from Brandeis University and research experience in biology and neuroscience.

Peter Skomoroch's Skills & Expertise

  1. Machine Learning
  2. Data Science
  3. Big Data
  4. Information Retrieval
  5. Product Management
  6. Text Mining
  7. Recommender Systems
  8. Data Mining
  9. Algorithms
  10. Statistical Learning
  11. Collaborative Filtering
  12. Hadoop
  13. MapReduce
  14. SciPy
  15. Natural Language Processing
  16. Artificial Intelligence
  17. Python
  18. Statistics
  19. Data Analysis
  20. Apache Pig
  21. Ruby
  22. Amazon Web Services (AWS)
  23. Public Speaking
  24. Data Visualization
  25. Hive
  26. Ruby on Rails
  27. Analytics
  28. Text Classification
  29. Information Extraction
  30. Amazon EC2
  31. Cloud Computing
  32. Datasets
  33. Amazon Mechanical Turk
  34. R
  35. LinkedIn Endorsements
  36. Rapid Prototyping
  37. Sinatra
  38. Django
  39. Voldemort
  40. Neural Networks
  41. Mashups
  42. Nutch
  43. Lucene
  44. NoSQL
  45. Putting Out Fires
  46. Open Source
  47. Soul Retrieval
  48. Distributed Systems
  49. Yacht Racing

Peter Skomoroch's Experience

Equity Partner

Data Collective

August 2011 to Present (3 years 2 months)

Helping Matt Ocko and Zachary Bogue with due diligence and advising Data Collective portfolio companies. Data Collective (DCVC) invests in data startups focused on infrastructure, analytics, and applications that leverage data, including verticals like lending, travel, customer service, and medical research. DCVC has invested in companies such as Kaggle, Parse, Trifacta, MemSQL, Interana, PlanetLabs, Platfora, ZenPayroll, FlipTop, Freshplum, MongoHQ, LendUp, and Moleculo.

Principal Data Scientist

LinkedIn

Public Company; 5001-10,000 employees; LNKD; Internet industry

September 2009 to October 2013 (4 years 2 months) Mountain View, CA

Led teams of Data Scientists focused on Reputation, Inferred Identity, and Data Products. Was lead Data Scientist and creator of LinkedIn Skills & Endorsements, one of the fastest growing new products in LinkedIn's history. We reached over 3 billion member endorsements by October 2013, one year after launch, adding rich skill data and reputation signals to over 60 million member profiles.

Our projects included features like LinkedIn Skills, Suggested Skills, PeopleRank, Endorsements, and InMaps. Our team's specialties included entity extraction & discovery, recommendation algorithms, economic insights, and network intelligence & dynamics.

In late 2009, as Sr. Data Scientist, I built the original prototype of LinkedIn Skills using Hadoop & Rails, then worked with a talented team of engineers & designers to build and ship Skills. I served a dual role as Product Manager and Sr. Data Scientist for 6 months following the launch of Skills before moving into management roles.

We worked on a number of other efforts that mined information from LinkedIn profile content, the social graph, and external data sources to build data driven products and surface actionable insights for members. Our tool set included things like Hadoop, Pig, Hive, Voldemort, Mechanical Turk, Java, Python, NLTK, along with various machine learning and numerical libraries.


Common Crawl

Nonprofit; 1-10 employees; Internet industry

November 2011 to September 2013 (1 year 11 months) San Francisco Bay Area

Common Crawl is a non-profit foundation dedicated to building and maintaining an open crawl of the web, thereby enabling a new wave of innovation, education, and research.


Data Wrangling

2006 to October 2009 (3 years) Mountain View, CA

Lead consultant at Data Wrangling offering software development services for clients in need of scalable data mining or search applications.

Built an open source Rails application that identifies trends on the web by using Hadoop, Hive, and Python to process Wikipedia log files on Amazon EC2.

Wrote articles and documentation for companies such as Cloudera and Amazon Web Services to demonstrate scalable processing of Netflix ratings, Last.FM listening data, and Wikipedia logs.

Designed and built the backend of an on-demand proteomic search system for a bioinformatics client. Released core code as the “ec2cluster” project on GitHub: a Rails web console, including a REST API, that launches temporary MPI clusters on Amazon EC2 for scalable parallel processing.

Provided basic consulting services for clients running MPI on EC2. Released Elasticwulf on Google Code: Python command-line tools to launch and configure a distributed cluster on Amazon.

Machine learning consultant to a small investment fund. Mined commercial financial data, SEC filings, and alternative information sources on the web using machine learning and Mechanical Turk.

Director of Analytics

Juice Analytics

Privately Held; 11-50 employees; Information Technology and Services industry

2008 to 2009 (1 year)

Developed a Django-based web analytics application called Concentrate that discovers and visualizes patterns in search query data. Built backend infrastructure for text mining on Amazon EC2, SQS, and S3 using boto. Data processing was mainly implemented with SciPy, C, and the Python Natural Language Toolkit. Automated continuous integration on EC2 with Selenium, Hudson, and PyUnit. The payment system used Satchmo; deployment was done via Capistrano and Puppet.

Developed a scalable pattern clustering algorithm for Concentrate that automatically discovers patterns in large amounts of search data and clusters long tail queries into manageable groups.

Represented Juice at several conferences, including a talk at PyCon 2008 on processing data with Amazon EC2.

Consulted on several client projects, including processing marketing survey data for a media company and analyzing spatial vehicle usage patterns in customer data for FlexCar.

Sr. Research Engineer

AOL

Public Company; 5001-10,000 employees; AOL; Internet industry

2006 to 2007 (1 year)

Member of the Search Analytics team at AOL

Developed search referral prediction system that applied machine learning techniques to query logs, web crawl data, and internal server logs to recommend site improvements and measure external competition in multiple content areas. Implemented using Nutch and Hadoop, along with Python NLTK, NumPy, and SciPy.

Lead engineer on a project building a web-based search analytics tool used to track the timing of bot activity in web logs, identify uncrawled sections of web properties, and improve the crawlability of large websites. The system included a Ruby on Rails front end and a REST API to serve graph data and metrics. The backend used Python logfile parsers and a Hadoop cluster to build link graphs and summarize page content.

Research Staff

MIT Lincoln Laboratory

Educational Institution; 1001-5000 employees; Defense & Space industry

April 2004 to July 2006 (2 years 4 months)

Designed and implemented a prototype web-based decision support system and sensor data warehouse using Python & Oracle. Role included direction and training of junior staff members, design of underlying data models, system interfaces, & data visualization components.

Principal software & algorithm engineer developing dense, low-cost chemical and biological sensing networks using wireless sensor motes. Wrote sensor network detection algorithms, designed network data warehouse, constructed web-based front end for the system, and wrote embedded nesC code for Mica2 Crossbow wireless sensor boards. Performed simulation & analysis of the system in Matlab.

Applied machine learning techniques to significantly improve the accuracy of a prototype sensor that detects pathogens in time-resolved environmental measurements.

Implemented prototype system to collect health information via cell phones and display population data in real-time via the web. Java architecture included a Quartz job scheduler, Sprint Location SOAP Web Services, PKCS12 security, request throttling logic, Oracle Spatial, & AJAX to display live results on Google maps.

Developed embedded TCP/IP socket layer code in C for a TI DSP-based biosensor. Implemented an embedded web server in C on the sensor with SOAP access for automatic sensor discovery.

Designed data warehouse and web service infrastructure for the integration of streaming real-time sensor data. Wrote object-oriented C++ hardware drivers to process and upload large amounts of streaming data to Oracle in real time.

Developed Matlab simulations analyzing U.S. Census data in combination with environmental spatial datasets to study the effects of air particulate deposition under multiple weather conditions.

Constructed performance models of an indoor biological sensor system for the protection of buildings. The models evaluated technical performance, cost, and simulated operations of the system to optimize sensor layout.

Database Consultant

Fidelity Investments

Privately Held; 10,001+ employees; Financial Services industry

October 2003 to April 2004 (7 months)

Built Oracle PL/SQL logic for brokerage applications to analyze campaign effectiveness, report trends, and track customer interactions. Constructed Java servlets and SOAP web services to process XML database requests. Performance tuned slow running applications and optimized SQL statements.

Worked with offshore development teams in India and Ireland to develop Oracle and Siebel applications. Prototyped new database error handling and debugging approaches along with an automated build/test/deployment process for database code using Ant, JUnit, and DbUnit.

Software Engineer

ProfitLogic

Privately Held; 201-500 employees; Computer Software industry

November 2002 to October 2003 (1 year)

Part of the Calc Engine team: backend system processed historical time series of retail transaction data to estimate prediction model parameters. Designed and loaded database schemas containing these parameters for use by the price optimization engine. Wrote and tuned SQL queries used by the forecast engine. Ported Oracle PL/SQL code to Java Stored Procedures for Oracle/DB2 dual platform product release.

Worked with ProfitLogic clients (Fortune 500 retailers) to refine business requirements for our products and rapidly fix performance issues / bugs. Often obtained performance improvements of 5-10x in slow SQL queries. Commended by clients and management for immediate resolution of issues.

Assisted R&D group with projects involving maximum likelihood estimation, Bayesian parameter estimation, genetic algorithms, seasonality, and clustering.



Privately Held; 201-500 employees; Computer Software industry

June 2000 to November 2002 (2 years 6 months)

Responsible for running the weekly forecast and price optimization model of our first major client (JCPenney). On call to resolve issues with the client and algorithm recommendations, ensuring that we met service level agreements.

Surfaced model accuracy issues with senior management and was allocated resources to construct an out-of-sample forecast testing system using Oracle, Mathematica, and Python. Worked with R&D to develop an improved model that became part of the standard software release. Developed empirical methodology for results measurement that was used to demonstrate up to 15% improvement in profits for clients.

Analyzed retail transaction data stored in Oracle and Teradata using Mathematica & Python to characterize the influence of climate, price, promotional events, holidays, store-performance, and other demand drivers on sales of a wide range of merchandise types. Developed production forecast model parameter estimation code in PL/SQL.

Ran forecast tests on data from prospective clients, came up with ROI and value propositions, and developed compelling information visualizations for PowerPoint decks and sales pitches.

Peter Skomoroch's Languages

  • Spanish

Peter Skomoroch's Patents

  • Methods and Systems for Exploring Career Options

    • United States Patent US20120226623 A1
  • Skills Endorsements

    • United States Patent Application 13/672,377
    • Filed November 8, 2012
  • Skill Extraction System

    • United States Patent US8650177
    • Issued February 11, 2014

    Machine automated method of identifying a set of skills

  • Skill Ranking System

    • United States Patent Application 13/357,302
    • Filed January 24, 2012
  • Skill Customization System

    • United States Patent Application 13/357,360
    • Filed January 24, 2012
  • Inferring and Suggesting Attribute Values For a Social Network Service

    • United States Patent Application 13/629,241
    • Filed September 27, 2012
  • Methods & Systems for Recommending Decision Makers in an Organization

    • United States Patent Application 3080.132PRV
    • Filed September 30, 2013
  • Inferred Identity

    • United States Patent Application 14/292,779

Peter Skomoroch's Education

Massachusetts Institute of Technology

Nondegree Student, Machine Learning


Brandeis University

B.S., Mathematics, Physics


Operations Intern, Cignal Global Communications - Cambridge, MA, 1999-2000
Physics Research Assistant, Bucknell University - Lewisburg, PA, Summer 1999
Anatomy & Cell Biology Research Assistant, SUNY Health Science Center - Syracuse, NY, 1997-1998
Biophysics Research Assistant, SUNY Health Science Center - Syracuse, NY, 1995-1996
Neuroscience Research Assistant, Institute For Sensory Research, Syracuse, NY, 1994-1995

Campus jobs included: Undergraduate Physics TA, Electronics Technician for Physics Department, Calculus Grader, Physics Tutor, Calculus Tutor

Peter Skomoroch's Courses

  • Nondegree Student, Machine Learning

    Massachusetts Institute of Technology

    • Machine Learning (6.867)
    • Neural Networks (9.641J)
    • Real Analysis (18.100B)

Peter Skomoroch's Projects

  • LinkedIn Skills

    • October 2009 to Present

    LinkedIn Skills & Expertise is a set of tens of thousands of topic pages automatically constructed from LinkedIn profiles and external data sources. Using a variety of signals, we identify the most relevant people, places, and companies for each topic, track trends, and suggest skills users may want to add to their profiles.

  • Veterans Hackday 2011

    • November 2011

    Organized LinkedIn's first Veterans Hackday in conjunction with the White House to encourage hackers all over the country to build projects that benefit veterans. We had 44 projects submitted from around the country, 11 awesome finalists, and the celebrity judges (Tim O'Reilly, Sumit Agarwal, Jeff Weiner, Chris Vein) picked 3 amazing winners.

  • DataFu

    • September 2011 to Present

    DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like “People You May Know” and “Skills”.

  • Skill & Expertise Endorsements

    • June 2012 to Present

    Interface design that added social proof and a lightweight endorsement action to Profile Skills. This feature leveraged earlier work on Profile Guided Editing and used the same guided UI to suggest skill endorsements to profile viewers. Endorsement recipients receive an email and an on-site notification, with a landing experience that suggests they endorse people they know, creating a feel-good viral loop.

Peter Skomoroch's Volunteer Experience & Causes

  • Volunteer Interests

    • Causes I care about:

      • Economic Empowerment
      • Education
      • Science and Technology
    • Organizations I support:

      • Code for America

Peter Skomoroch's Additional Information


Machine learning, Information Retrieval, Search, Data Mining, Physics, Embedded Systems, Wireless Sensor Networks, Computational Neuroscience, Mathematical Finance, Optimization Algorithms, Prediction Markets, Collaborative Filtering, Parallel Programming and Cluster Computing, Python, Ruby, Web Frameworks, Mashups, General software engineering, Analytics, Data Visualization

Groups and Associations:

Data Drinking Group

Honors and Awards:

Westinghouse Science Talent Search Semifinalist, Brandeis University Presidential Scholarship, ...

Contact Peter for:

  • expertise requests
  • getting back in touch
