#solo2010: The Green Chain Reaction; please get involved!

More info on the session we (Simon Hodson from JISC and I) are planning for Science Online 2010 in London on September 4. We have had lots of discussions with the organizers and sponsors and we are intending to do something exciting, novel, important and certainly unpredictable.

The title, “Chain Reaction”, was suggested by Allan Sudlow, one of the organizers from the British Library, and I have added the word “Green” (see below).

What we want to do is to have a global interactive adventure, hopefully with work being done beforehand in the blogosphere, where we, as a world community, carry out data-driven science. The current working title is

“Are chemical reactions becoming greener?”


There is lots of information in the published literature and the unpublished literature on chemical synthesis. There are several million chemical syntheses published each year either in the primary literature, theses, or patents. These are normally reported with chemical diagrams and a paragraph recording what was done.

Henry Rzepa suggested that the theme of this event should be greenness. This doesn’t mean that the reaction actually looks green to the eye but that it is more friendly to the environment (wastes less material, causes fewer problems with toxic or environmentally unfriendly chemicals). Green chemistry is described here: ( )

And there is a strong push for both industrial processes and academic chemistry to be green.

The question is:

“Does the literature show that chemists are using greener reactions than previously?”

I’ll be laying out how we might tackle this and emphasize that we want everyone to take part. The main challenge will be organizing information and we welcome people who want to carry out data-driven research in the Open.

The only restriction is that the data we use must be Open according to the OKDefinition ( ). At present almost all databases of chemical reactions are not Open/Libre (and most are not even Gratis – you have to pay for the information). So it will have to be through text-mining from publications.

Here again we are restricted. Currently the Open material we have is:

* Jean-Claude Bradley’s (Drexel, Philadelphia) pioneering work on Open Notebook Science, where he and his group publish all syntheses to the web as they are collected.
* Mat Todd (Sydney), who has pioneered open Drug Discovery and who will be making his syntheses available
* Cambridge (where we have created semantic theses from the originals)
* Acta Crystallographica E with about 8000 preparations. These are all Open Access /Libre (CC-BY) and we have the active involvement of IUCr.
* BioMed Central, PLoS and Beilstein Journal of Organic Chemistry. These are all Libre/CC-BY publishers and we are already in contact with them.
* The Open Subset of PubMed Central (and especially UKPubMedCentral)
* European patents: ca. 60 per week, with maybe 1000-5000 syntheses per week.

Text-mining does not achieve 100% recall or precision, but the noise should be small when identifying key things like temperature, solvent, catalyst and time.
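To give a flavour of what identifying these things might look like, here is a minimal sketch in Python. It is not the project's tooling, just an illustration using regular expressions. The function name `extract_conditions`, the tiny `SOLVENTS` list and the patterns are all assumptions made up for this example, and a real text-miner would need a much larger vocabulary and would still make mistakes (for instance, a range like "0-5 °C" would be misread here).

```python
import re

# Hypothetical, deliberately tiny solvent vocabulary (an assumption for this
# sketch); a real list would be far longer and include synonyms.
SOLVENTS = ("dichloromethane", "methanol", "ethanol", "toluene", "water", "thf", "dmf")

# A number followed by "°C", e.g. "60 °C" or "-78°C".
TEMP_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°\s*C")
# A number followed by an hour or minute unit, e.g. "2 h", "30 min".
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(h|hours?|min|minutes?)\b", re.IGNORECASE)


def extract_conditions(paragraph):
    """Pull temperatures (°C), times (minutes) and known solvents from text."""
    temperatures = [float(t) for t in TEMP_RE.findall(paragraph)]

    times = []
    for value, unit in TIME_RE.findall(paragraph):
        # Normalise everything to minutes so values are comparable.
        factor = 60 if unit.lower().startswith("h") else 1
        times.append(float(value) * factor)

    lowered = paragraph.lower()
    # Whole-word match so that "ethanol" is not found inside "methanol".
    solvents = [s for s in SOLVENTS
                if re.search(r"\b" + re.escape(s) + r"\b", lowered)]

    return {"temperatures_c": temperatures, "times_min": times, "solvents": solvents}


example = ("The aldehyde (1.0 mmol) was dissolved in methanol "
           "and stirred at 60 °C for 2 h.")
print(extract_conditions(example))
```

Run over many preparations, records like this could then be grouped by publication year to look for trends in solvents or temperatures, though what counts as "greener" would still have to be agreed.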

We have tools that will allow you to download and mine patents and this would be an excellent adventure for early adopters. I’ll write more, but handling alpha-beta code is more important than knowing chemistry and we’d love volunteers.

We’d also like to invite mainstream chemistry publishers to take part. Traditionally they have not allowed text-mining of their material, but times are changing and this project is reaching out to them to see if they’d like to show the value of text-mining for chemistry. So we’ll be setting up a scheme (using the Open Knowledge Foundation’s IsItOpen service) to formally request permission to text-mine experimental data and release it as Open Data. We are sure that many will wish to contribute towards green activities and we’ll record those who respond positively at the meeting.

As you can see, we are developing this as we go. I know August is a bad time of year but this is a great activity while you are on vacation. It would be great to show a blogospheric and publisher response for the September meeting.

We hope that both Blue Obelisk and OKF adherents will take part. We want this to be completely open so it needs to use Open Data, Open Source, Open Standards and Open Services. Please jump in…

I hope to blog every day on this topic.

