Running Patent Analysis

What is going on here?

What you are going to do is download a small program that runs in Java. You almost certainly have Java installed on your computer if you have a web browser. The program reads an instruction file, which tells it how to read through a list of patents that relate to chemistry. You will also need to download these two files (the instruction file and the patent list); instructions are given below.

Why would I want to do this?

This project is attempting to ask a question by getting computers to "read" as many patents as possible from the recent to the quite old. The question we are asking is "Is chemistry becoming more green in the processes and reagents that it uses?" To do this work we are asking volunteers to become involved by contributing their computing resources to help read the patents. No knowledge of chemistry is necessary!

More generally, we are trying to demonstrate the feasibility of collecting information from across a wide range of documents that relate to science in order to ask wider questions. The results of this work will be presented at Science Online London 2010 in a few weeks' time.

Sounds great! How do I do it?

Prerequisites: Java
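
If you are not sure whether Java is installed, a quick check from a terminal can save a failed run later. The following is a minimal sketch, assuming a POSIX-style shell (Mac Terminal or similar); the function name check_java is just an illustrative helper, not part of the patent-analysis tool.

```shell
# Report whether a Java runtime is available on the PATH.
check_java() {
  if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, so send it to stdout as well
    java -version 2>&1
  else
    echo "Java not found: install a Java runtime before continuing"
    return 1
  fi
}

if check_java; then
  echo "Java looks usable"
fi
```

On Windows, typing "java -version" at the cmd prompt gives the same information.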

Instructions for analysing patents:

Latest instructions for the experienced

  1. Please always use the code from Hudson.
  2. Download the latest jar from $patent-analysis/patent-analysis-0.0.1-jar-with-dependencies.jar, which has been lightly tested.
  3. Create a folder, named e.g. patentData, to hold the patent index; the results will also be written there
  4. Download parsePatent.xml to anywhere convenient - yourDir
  5. Download uploadWeek.xml to anywhere convenient - yourDir
  6. Download a random patent catalogue into the patentData folder (catalogues from before 1990 may be lacking chemistry patents)
  7. Run "java -Xmx512m -jar patent-analysis-0.0.1-jar-with-dependencies.jar -p <yourDir>/parsePatent.xml -d <patentData>"
  8. Then run "java -Xmx512m -jar patent-analysis-0.0.1-jar-with-dependencies.jar -p <yourDir>/uploadWeek.xml -d <patentData>" to upload the results.
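
The two commands in steps 7 and 8 can be wrapped in a small script so that the upload only happens if the parse step succeeded. This is a sketch under stated assumptions: it needs a POSIX-style shell, the folder defaults (yourDir, patentData) are placeholders for wherever you saved things, and the run and run_step helpers and the DRY_RUN switch are illustrative additions, not features of the jar. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Placeholder locations; override them to match where you saved the files.
JAR="${JAR:-patent-analysis-0.0.1-jar-with-dependencies.jar}"
YOUR_DIR="${YOUR_DIR:-yourDir}"
PATENT_DATA="${PATENT_DATA:-patentData}"

# Print the command when DRY_RUN=1 (the default here), otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Run the jar with one instruction file against the patent data folder.
run_step() {
  run java -Xmx512m -jar "$JAR" -p "$YOUR_DIR/$1" -d "$PATENT_DATA"
}

# Parse first; upload only if parsing succeeded.
run_step parsePatent.xml && run_step uploadWeek.xml
```

Printing the commands first makes it easy to check the paths before committing to a run that may take a while.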


More detailed instructions for the less confident volunteer (but check filenames against those above)

  1. Downloading the software tools and creating a working directory
    1. Open a browser and paste the following link into your address bar: A download should start automatically. It might take a little while (around 40 seconds for me).
    2. Once you've downloaded the zip file, find it (your browser should help you with this) and unzip it. In most cases, double clicking, or right-clicking and selecting "Unzip" or something similar should do the job.
    3. Check that you have three files in the unzipped folder. They should be called "parsePatent.xml", "uploadSolvent.xml", and "patent-analysis-0.0.1-jar-with-dependencies.jar"
    4. Drag the folder to somewhere convenient, like the desktop or your documents folder
  2. Getting a patent index
    1. Point your browser at the main patent index.
    2. You can select any year. Probably not much point going for ones much before 1990.
    3. Then select an index. It is probably easiest to right click (or click-hold on a Mac) and choose "Save target as…". Save it into the directory with the tools, the one you put somewhere you can remember. Now you are ready to…
  3. Do the analysis!
    1. Open a terminal window.
      1. Windows: In Start Menu select "Run" and type "cmd" and press return
      2. Mac: Open "Terminal" from Applications -> Utilities
    2. Navigate to your directory.
      1. On Windows or Mac, if the directory is on the desktop, try "cd Desktop/patentData"
    3. In the terminal type the command "java -Xmx512m -jar patent-analysis-0.0.1-jar-with-dependencies.jar parsePatent.xml"
    4. This should then run the extraction. Sit back and enjoy the nice warm feeling. The analysis will take between 10 and 60 minutes depending on how many patents are in the index.
    5. When the program has finished running you are ready to upload the results. At the command prompt type "java -jar patent-analysis-0.0.1-jar-with-dependencies.jar uploadSolvent.xml"
  4. All done! You can now go back to Step 2, pick a different patent index and start again… (you might want to delete all the folders and files that have been created first, just to keep things clear and tidy)
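
Before starting the analysis in Step 3, it can help to confirm that the expected files are really in your working directory. The sketch below assumes a POSIX-style shell (for example Mac Terminal) and uses the file names that appear in the commands above; check_files is an illustrative helper, not part of the tool.

```shell
# Check that each named file exists in the given directory.
# Prints any missing names and returns non-zero if something is absent.
check_files() {
  dir="$1"
  shift
  missing=0
  for name in "$@"; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Run from inside the working directory (hence the ".").
if check_files . parsePatent.xml uploadSolvent.xml patent-analysis-0.0.1-jar-with-dependencies.jar; then
  echo "All files present; ready to run the analysis"
fi
```

If a name is reported as missing, compare it against the files in the unzipped folder, since the file names may have changed between releases.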
