Bulk Patent File Histories
The PTO now maintains all of its patent applications in electronic format. I'm doing some empirical work and would like to create a database of all recent US application transaction histories. You can do this one at a time through the PTO's PAIR system, but I'm looking for a large amount of data. Any suggestions?
Answers (2)
Hi Dennis:
Unfortunately, the USPTO does not appear to be offering file history related data as a data product that can be purchased in bulk at this time.
One option is to write software that scrapes the information from PAIR. In private PAIR, this can be done quite easily using the XML download feature, since it is trivial to import XML data into a database. Similar software could of course be written for public PAIR, but it requires a few more steps and is more likely to "break" if the USPTO makes changes to PAIR. A competent programmer could code a simple scraper in a few hours' time.
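To illustrate the "import XML into a database" step, here is a minimal sketch using only Python's standard library (xml.etree.ElementTree and sqlite3). It assumes a downloaded file history has already been saved as XML text. The element names (FileHistory, ApplicationNumber, Transaction, Date, Description) and the sample data are hypothetical placeholders, not the actual PAIR download schema, so they would need to be adjusted to match the real files.

```python
import sqlite3
import xml.etree.ElementTree as ET

# HYPOTHETICAL XML layout and made-up sample data; the real PAIR
# XML download schema will differ and must be inspected first.
SAMPLE_XML = """
<FileHistory>
  <ApplicationNumber>12/345,678</ApplicationNumber>
  <Transaction><Date>2007-01-15</Date><Description>Non-Final Rejection</Description></Transaction>
  <Transaction><Date>2007-04-20</Date><Description>Response after Non-Final Action</Description></Transaction>
</FileHistory>
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create one table holding a row per file-history transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions ("
        " application_number TEXT NOT NULL,"
        " event_date TEXT NOT NULL,"
        " description TEXT NOT NULL)"
    )


def load_history(conn: sqlite3.Connection, xml_text: str) -> int:
    """Parse one file-history XML document and insert its transactions.

    Returns the number of rows inserted.
    """
    root = ET.fromstring(xml_text.strip())
    app_no = root.findtext("ApplicationNumber", default="").strip()
    rows = [
        (
            app_no,
            tx.findtext("Date", default="").strip(),
            tx.findtext("Description", default="").strip(),
        )
        for tx in root.iter("Transaction")  # hypothetical element name
    ]
    # Parameterized inserts avoid quoting problems in description text.
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(load_history(conn, SAMPLE_XML), "rows loaded")
```

In practice one would loop over a directory of downloaded files and call `load_history` once per file; SQLite is used here only because it ships with Python, and any database with a bulk-insert API would do.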
The terms of service agreement for private PAIR (which I reviewed only briefly) do not appear to expressly prohibit use of scraper software. However, that agreement does note that a PKI certificate may be revoked "for any other reason the USPTO deems necessary." In other contexts, scraper software appears to be tolerated by the USPTO (for example, www.pat2pdf.org is essentially a scraper application).
Even if it is permissible to use scraper software with PAIR, I would recommend that it be coded to minimize the impact on the USPTO's servers. This can be done by sending only one request at a time and, optionally, by implementing a delay between requests.
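The "one request at a time, with a delay" approach can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the 5-second default delay is an arbitrary choice rather than any USPTO guideline, and no real PAIR URLs are given because the page addresses (and whatever login or certificate handling private PAIR requires) are not covered here. The `fetch` and `sleep` parameters are injectable so the throttling logic can be tested without touching the network.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one page; the caller blocks until the response arrives."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_url,
    delay_seconds: float = 5.0,  # ASSUMED value; tune conservatively
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Fetch URLs strictly sequentially, pausing between requests.

    Only one request is ever in flight, and the server gets a fixed
    quiet period before each request after the first.
    """
    results: Dict[str, bytes] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results[url] = fetch(url)
    return results
```

A plain loop is deliberate here: avoiding threads or async I/O guarantees that the scraper never issues concurrent requests, which is the point of minimizing load on the server.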
It is almost comical to say this, but you might want to consider a clearance search first. See, for example, Method and apparatus for scraping information from a website, U.S. Patent Application Publication No. 20060095377.
Regards,
Craig
John W. C
Partner in Major Patent and Trademark Law Firm
Dennis: Have you actually asked the PTO for the raw data?
It would probably be easier to program this ab initio than to pull the data through PAIR online. It would also be less intrusive.
I will be happy to support your request.