Chief Architect at ClipMine, Inc.
I'm a hands-on software architect with deep experience building and scaling data infrastructure at high-traffic websites.
I'm currently the Chief Architect at ClipMine, a video mining and search company. Prior to joining ClipMine, I held technical and leadership positions at LinkedIn, Netflix, Etsy, eBay, and Siebel Systems.
Outside of work, I help venture-backed startups build scalable systems.
I also enjoy speaking at a few conferences every year, including OSCON, GigaOM Structure, Hadoop Summit, and QCon (SF/London/NYC).
Building a video mining, search, and watch-experience startup!
Member of the Program Committee for QCon SF
I build search systems at scale at LinkedIn
• Helped create LinkedIn's new search infrastructure (Galene)
• Led the search Indexing team, responsible for building offline (Hadoop) and near-realtime (Databus, Samza, Kafka) indexes
• Developed search typeahead (autocomplete) services. Typeahead comes in 2 flavors: Graph typeahead (e.g. members) and regular typeahead (e.g. companies, groups, schools, skills)
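At its core, a typeahead service maps every prefix of an entity's name to that entity so suggestions can be returned as the user types. The sketch below is a deliberately minimal illustration of that idea, not LinkedIn's actual implementation; the sample entries and the in-memory prefix map are assumptions for the example.

```python
from collections import defaultdict

def build_prefix_index(entries):
    """Toy typeahead index: map every lowercase prefix to matching names."""
    index = defaultdict(list)
    for name in entries:
        lowered = name.lower()
        for i in range(1, len(lowered) + 1):
            index[lowered[:i]].append(name)
    return index

# Hypothetical "regular typeahead" entities (e.g. companies).
companies = ["LinkedIn", "Netflix", "Etsy", "eBay"]
index = build_prefix_index(companies)
suggestions = index["li"]   # completions for the partial query "li"
```

A production system would rank the candidates (e.g. by popularity or, for graph typeahead, by network distance) rather than return them in insertion order.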
The Analytics Platform (AP) team at LinkedIn is responsible for making online and offline data available for analysis. The data typically originates from OLTP databases (e.g. RDBMS, NoSQL), application event streams (e.g. Click Tracking), and machine learning algorithms. AP makes this data available for Hive queries, Pig jobs, and plain MapReduce jobs, as well as for ETL engineers to load into a traditional data warehouse.
• Led LISTT (LinkedIn Segmentation & Targeting Tool). LISTT is a self-service tool that marketing operations can use to target LinkedIn members for online marketing needs. It leverages both Hive and Pig to materialize a large table in Hadoop, which is then converted into multiple formats: a Teradata load-ready format and Lucene indexes for a custom search application
• Presented LISTT at Hadoop Summit 2013
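The LISTT pipeline described above — materialize one wide table, then fan it out into multiple output formats — can be sketched as follows. The segment rows, the tab-delimited "load-ready" writer, and the toy inverted index standing in for Lucene are all illustrative assumptions, not LISTT's actual code.

```python
from collections import defaultdict

# Hypothetical member-segment rows, standing in for the large table
# materialized in Hadoop via Hive and Pig.
rows = [
    {"member_id": 1, "industry": "software", "region": "us"},
    {"member_id": 2, "industry": "finance",  "region": "uk"},
    {"member_id": 3, "industry": "software", "region": "uk"},
]

def to_load_ready(rows, columns):
    """Render rows as tab-delimited lines, akin to a Teradata load-ready feed."""
    return ["\t".join(str(r[c]) for c in columns) for r in rows]

def build_inverted_index(rows, fields):
    """Toy inverted index (field:value -> member ids), standing in for Lucene."""
    index = defaultdict(set)
    for r in rows:
        for f in fields:
            index[f"{f}:{r[f]}"].add(r["member_id"])
    return index

lines = to_load_ready(rows, ["member_id", "industry", "region"])
index = build_inverted_index(rows, ["industry", "region"])
# Targeting query: software-industry members in the UK.
hits = index["industry:software"] & index["region:uk"]
```

The point of the design is that both outputs derive from the same materialized table, so the warehouse feed and the search application always describe the same member population.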
In January 2009, the Cloud Systems team was formed to pioneer Netflix’s migration to a new cloud-based architecture, thereby extending availability and scalability. By EOY 2010, 90% of our traffic was served out of AWS.
• Part of a 5-person team to design and build Netflix’s cloud-based architecture
• Responsible for Netflix’s Cloud-based Database Architecture (e.g. Cassandra, SimpleDB, S3, etc.)
• Worked closely with AWS to align their product roadmap with our needs
• Evangelized NoSQL, Data Replication, & Cloud Best Practices internally and externally
• Authored a white paper titled “Netflix’s Transition to High-Availability Storage Systems”
• Acted as a Netflix Crisis Manager – periodically led the resolution of critical company-wide production emergencies
• Created a multi-master, hybrid Oracle-NoSQL system to manage our largest data sets – my replication framework dealt with billions of records and copied incremental changes with latencies averaging 5 seconds – patent: United States 2011/103537
As a member of the Software Infrastructure team, I helped define, design, and implement new architectures – all Netflix systems run on our architecture. I also solved critical performance problems, resulting in cost reduction and service improvement.
• Led the identification and resolution of various Denial-of-Service exploits. Evangelized DoS prevention.
• Found and eliminated a critical performance bug in the streaming PC player – this fix reduced DB traffic by 50% to 2 key tables, reducing our need to vertically scale the database
• Increased farm-wide memory headroom by 10% by eliminating the use of a Java Finalizer
• Created Netflix’s Session Manager to deliver consistently fast user response times under high traffic
• Created Netflix’s web request processing framework to improve developer productivity and code robustness
• Created a deadlock recovery system to detect production deadlocks early and take preemptive action before end users could be affected
• Invented 2 internal performance optimization tools (i.e. Tracer Central & Tracer Regression Central) – Netflix Engineering relies on these tools to understand traffic growth and code/site performance
• Managed a direct budget of $1.5M and built an engineering team of 16 engineers to create the site
• Designed and wrote a real-time application logging and analytic application
As a member of eBay’s Search Engine team, I was tasked with building new search services.
• With help from a co-worker, I built eBay’s first search engine for buyer behavior, central to eBay’s default search sort (a.k.a. the Best Match sorting algorithm)
• Implemented a search service to find (fuzzy) near matches by user id, first name, last name, or full name
As a member of the eBay Research Labs, I had the opportunity to work on various early-stage prototypes:
• Designed and implemented a novel P2P version of eBay (Confidential)
• Worked with a colleague to build a WYSIWYG eBay Store builder leveraging browser-side technologies
My career at eBay started as a founding member of the Stores & Merchandising team. During my tenure with this team, I led projects touching functionality across all areas of the site.
• Led projects that touched Search, MyeBay, Selling, Buying, Sign-on, API, etc.
• Proposed & led a project to re-architect the eBay subscriptions framework
• Submitted 5 innovation ideas centered around uses of AJAX on the site – all 5 ideas adopted
• Analyzed system scalability and performance bottlenecks
• Solved critical customer performance issues during a 2-month loan to Siebel Expert Services
Methods, systems, and articles for simultaneously maintaining copies of data in a data center and a cloud computing environment providing network based services. Synchronizing applications monitor modifications to data records made in the data center and the cloud computing environment. The synchronizing applications are also configured to convert modified records from the data center into a format compatible with databases in the cloud computing environment prior to updating the databases in the cloud computing environment, and vice versa.
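The mechanism this abstract describes — watch for modified records on either side, convert them to the other side's format, then apply them — can be sketched in miniature. The record schemas, converter functions, and in-memory stores below are illustrative assumptions, not the patented implementation.

```python
# Two stores with different record formats: a data-center store keyed by
# "id"/"full_name" and a cloud store keyed by "key"/"name".
dc_store = {}
cloud_store = {}

def dc_to_cloud(rec):
    """Convert a data-center record into the cloud database's format."""
    return {"key": rec["id"], "name": rec["full_name"]}

def cloud_to_dc(rec):
    """Convert a cloud record into the data-center database's format."""
    return {"id": rec["key"], "full_name": rec["name"]}

def sync(change, origin):
    """Apply a modified record locally, then mirror it to the other store,
    converting its format first (the patent's bidirectional sync step)."""
    if origin == "dc":
        dc_store[change["id"]] = change
        converted = dc_to_cloud(change)
        cloud_store[converted["key"]] = converted
    else:
        cloud_store[change["key"]] = change
        converted = cloud_to_dc(change)
        dc_store[converted["id"]] = converted

# A write in the data center propagates to the cloud, and vice versa.
sync({"id": 42, "full_name": "Ada Lovelace"}, origin="dc")
sync({"key": 7, "name": "Alan Turing"}, origin="cloud")
```

In practice the "monitoring" half would tail a change log rather than receive explicit calls, but the convert-then-apply flow is the same in both directions.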
A method and system for building a point-in-time snapshot of an eventually-consistent data store. The data store includes key-value pairs stored on a plurality of storage nodes. In one embodiment, the data store is implemented as an Apache® Cassandra database running in the “cloud.” The data store includes a journaling mechanism that stores journals (i.e., inconsistent snapshots) of the data store on each node at various intervals. In Cassandra, these snapshots are sorted string tables that may be copied to a back-up storage location. A cluster of processing nodes may retrieve and resolve the inconsistent snapshots to generate a point-in-time snapshot of the data store corresponding to a lagging consistency point. In addition, the point-in-time snapshot may be updated as any new inconsistent snapshots are generated by the data store such that the lagging consistency point associated with the updated point-in-time snapshot is more recent.
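The resolution step described above — merging per-node, mutually inconsistent snapshots into one consistent point-in-time view — can be sketched with a last-write-wins merge, mirroring how Cassandra reconciles replicas by write timestamp. The snapshot contents and timestamps below are made up for illustration.

```python
def resolve(snapshots):
    """Merge inconsistent snapshots (key -> (write_timestamp, value)) into a
    single point-in-time view by keeping the latest write for each key."""
    merged = {}
    for snap in snapshots:
        for key, (ts, value) in snap.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Drop the timestamps once the winners are chosen.
    return {key: value for key, (ts, value) in merged.items()}

# Per-node journals taken at different moments, hence inconsistent.
node_a = {"user:1": (100, "alice"), "user:2": (90, "bob")}
node_b = {"user:1": (120, "alice_v2")}               # newer write for user:1
node_c = {"user:2": (95, "bob_v2"), "user:3": (80, "carol")}

view = resolve([node_a, node_b, node_c])
```

Because the merge is associative, newly arriving snapshots can be folded into an existing view incrementally, which is how the lagging consistency point advances over time.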