Data infrastructure: automated collection and analysis, Ruby on Rails
San Francisco Bay Area
Over ten years of experience in software development, working for publicly traded firms and privately held startups. Experience in Web development, data collection and analysis, anti-spam, and distributed computing, working directly with customers and users to make their jobs easier.
Ruby on Rails, data collection, data analysis, anti-spam, e-mail
(Privately Held; Internet industry)
September 2008 — October 2009 (1 year 2 months)
Ruby on Rails developer on a Scrum team that built and shipped a sales tool for event ad campaigns, on time and meeting evolving requirements.
Contributed to sprint planning, developed RSpec criteria, wrote model, view, and controller code, and deployed releases to production.
Wrote content management and administrative tools for managing the duplicate status of records.
Enabled publication of large blocks of data by moving processing from Web servers to a batch cluster.
(Privately Held; Internet industry)
November 2006 — September 2008 (1 year 11 months)
Designed and implemented system using the Ruby on Rails framework for automated content QA and correction.
Refactored content insertion system to handle the orderly incorporation of data from dozens of sources.
Wrote unit, functional, and integration tests for content insertion, content QA, and content management systems.
Developed tools to find and automatically merge duplicate data, based on similarity.
(Privately Held; Computer Software industry)
January 2005 — November 2006 (1 year 11 months)
Designed and implemented new classifier features and improvements to data collection and feature selection process.
Generated feature lists to support multiple languages and character sets as part of an internationalization project.
Model updates combined with rapid response to emerging spam campaigns increased classifier effectiveness and accuracy.
(Privately Held; Computer Software industry)
January 2004 — January 2005 (1 year 1 month)
Collected data for and updated Proofpoint's anti-spam classifier models.
Wrote text and regular expression model features based on emerging spam attacks.
Collected and analyzed spam messages for update process.
Worked with customers to stop spam campaigns.
(Public Company; INCY; Biotechnology industry)
July 2001 — November 2002 (1 year 5 months)
Managed Unix and Linux accounts for over 100 users in a production bioinformatics, dataflow, and data analysis environment.
Administered production Unix and Linux machines.
Configured Platform LSF distributed computing cluster of over 1,000 machines. Wrote Perl scripts to report cluster status.
(Public Company; INCY; Biotechnology industry)
June 1999 — July 2001 (2 years 2 months)
Programmer on Linux farm project, which was both used in-house and sold to outside customers. Co-designed, wrote, documented, and maintained batch job distribution software, as well as a parallel application launcher, for several clusters of rack-mounted, commodity PCs, controlling over 2,000 processors in total.
The clustering project saved Incyte several million dollars in hardware costs by moving most data processing from expensive servers to much cheaper commodity hardware.
(Public Company; INCY; Biotechnology industry)
November 1997 — June 1999 (1 year 8 months)
Ran dataflow for Incyte's plant database, defining schedules between sequencing, product science, and legal departments. Screened and annotated several hundred thousand sequences, analyzing and preparing them for release, meeting an aggressive schedule.
Wrote wrapper scripts to automate dataflow.
Benchmarked different hardware platforms to evaluate cost/performance. This work led to the Linux farm project.
(Non-Profit; Biotechnology industry)
June 1994 — August 1997 (3 years 3 months)
Annotated and curated Genome Sequence DataBase (GSDB), a public DNA sequence database. GSDB was a major project of the non-profit National Center for Genome Resources (NCGR).
Served as primary contact for four large genome centers, processing large and multiple sequence submissions into the relational database.
Tested and evaluated GSDB Annotator software.
(Government Agency; Research industry)
August 1993 — June 1994 (11 months)
Assisted offsite users with DNA sequence file format, answering detailed questions about data insertion.
Assured data quality by checking DNA sequence submissions for correct syntax and accurate biological information.
B.A., Liberal Arts, 1991 — 1993
General liberal arts program, focused on math, laboratory science, philosophy, literature, Ancient Greek, and French.
Liberal Arts, 1989 — 1991
programming, Ruby on Rails, Ruby, Perl, Linux, Agile, Scrum, distributed computing, test-driven development, behavior-driven development, events, search, startups, bioinformatics, genome, genomics, machine learning, anti-spam, Getting Things Done, GTD, environment, sustainability, ecological footprint, renewable energy, bicycling, ethics, free speech, free culture