
Manager, Site Reliability Engineering (SRE) at Facebook
San Francisco Bay Area

Technical Manager with broad internet hardware and operating systems experience. Experienced with high-availability and load-balancing environments on Linux systems, integrating application server technology to support high-demand web services.
Network Operations, Linux/Unix server farm management, High Availability environments, Network Engineering, Technical Management, PHP, Memcache, High-performance MySQL
(Privately Held; Internet industry)
August 2009 — Present (5 months)
(Privately Held; Online Media industry)
March 2008 — Present (1 year 10 months)
Technical Leader providing guidance, support and hands-on management of staff that manages the LAMP server network for a top 10 social networking site focused on music, celebrities and pop culture, with over 40 million unique visitors and 200 million page views per month.
* Manage and scale infrastructure for a high-availability clustered Linux network running Apache, PHP, MySQL, memcached, Sphinx, Movable Type, vBulletin and WordPress
* Conduct performance analysis on sites and make changes to improve stability and performance. For example, analysis on one site yielded a 20% pageview increase and an 800% SEO increase.
* Work closely with developers to create scalable, cacheable solutions.
* Evaluate technology vendors to select best of breed hardware solutions while meeting budgetary constraints.
* Implemented memcache solutions for high-traffic blogs that increased performance by 5x, decreased database queries by 75% and increased server scaling by 3x
(Privately Held; Online Media industry)
November 2007 — Present (2 years 2 months)
Provide technical management to a group of Systems Administrators, PHP developers and Database Administrators that handle the back-end systems and architecture for a social networking application.
* Manage and scale infrastructure for a high-availability clustered Linux network running Apache, PHP, MySQL, memcached and Sphinx
* Designed and implemented the database, system and hardware architecture that sustained 700% growth in 3 months, meeting product requirements and scaling to user demand
* Responsible for the entire spectrum of operations: from high-level design of sophisticated systems to fixing bugs in code.
* Designed infrastructure using shards, remote caches, clustered databases and proxied web servers
(Privately Held; 51-200 employees; Information Technology and Services industry)
March 2004 — November 2007 (3 years 9 months)
Provide technical and non-technical management to a geographically dispersed team of sysadmins, systems engineers, and database administrators in three datacenter locations and four hosted locations with 300+ servers in a mixed Linux/Windows, high-availability (99.97%, as measured by Gomez, Inc.), internet-facing environment.
• Manage and scale HA clustered Linux servers running Apache, Java and PHP frameworks that handle over 250 million hits and over 4.5 million visitors daily.
• Maintain high-availability replicated MySQL and MS SQL Server infrastructure. Replication includes MySQL one-way, chained and circular replication, and transactional replication for MS SQL Server. Define strategy to provide best-of-breed database practices within development group.
• Manage capacity planning; evaluate, order and deploy systems and technologies to support horizontal and vertical growth as defined by business units. Draft yearly capital expenditure budget ($X.X MM for 2007).
(Self-Employed; Myself Only; Information Technology and Services industry)
November 2001 — March 2004 (2 years 5 months)
(Privately Held; 201-500 employees; Information Technology and Services industry)
November 2000 — September 2001 (11 months)
Planned and managed the corporate launch and timely openings of 11 Internet Data Centers (IDCs) including network deployment and build-out; IDC security installation; phone system installation/configuration; and physical installation of Fiber/Cat3/5e/6 and associated Cisco and Juniper equipment at IDCs.
Developed day-to-day operations policies and procedures for 143 nationwide IDC staff members and trained staff on topics including basic routing, identifying and fixing cabling issues, courteous customer communication, security procedures and activities needed to perform customer installations.
Employed knowledge of Data Centers and LAN/WAN networking technology to define support procedures for Customer Care, Technical Support and Network Operations Center
Allocated budgeted resources ($2.75 million yearly) to 11 IDCs
Provided technical guidance on product development effort for Managed Server Hosting services
(Public Company; 501-1000 employees; Information Technology and Services industry)
July 1998 — November 2000 (2 years 5 months)
Built up Customer Engineering staff from 12 to 30, approving hires with mid-level managers.
Redesigned operational processes from point-of-sale to customer premise equipment (CPE) installation, lowering installation times by two weeks, a 45% increase in productivity.
Restructured nationwide order management systems and installation procedures which directly resulted in over $3.5 million increase in 2000 revenues.
Redesigned nationwide customer service procedures used by 150 Helpdesk employees that provided a higher level of customer service, shortened call times, and reduced employee turnover.
Managed 30 engineering employees who installed leased lines (ISDN, DSL, Frame-Relay, T1, T3, OC-3) at seven offices in the Northeast and Central regions and performed country-wide upgrades to Cisco IOS on CPE routers
Identified new technologies and provided sales engineering support to define network solutions for customer high-speed connections
BS, Summa Cum Laude , Information Technology , 2003 — 2004
Graduated Summa Cum Laude
Computer Science 1992 — 1997