What's Eating Scientific Data? 21st Century Approaches to Discovering (Chemical) Data

Google Tech Talk, June 17, 2009

ABSTRACT

What's Eating Scientific Data? 21st Century Approaches to Discovering (Chemical) Data

Presented by Jim Downing and Nico Adams.

The web of documents and unstructured information is slowly but inexorably evolving towards a web of data. The increasing data-centricity of the web is driven by the next generation of web applications and by the future evolution of search: the searching of structured data is the value proposition behind a recent spate of start-ups in the search space. Furthermore, the internet in general and the semantic web in particular are revolutionising the way in which science communicates, manages and exchanges data, impacting all areas of scientific endeavour from scholarly communication through to laboratory management, data analysis and data mining.

Chemistry is the central physical science and lies at the heart of modern research into new drugs, new materials and new personal care products. All of these products require the confluence of structured data from a number of different domains, and advances in science can often be viewed as a data integration problem. The availability and discoverability of high-quality scientific and chemical data on the internet are therefore of the utmost importance.

In this talk we will discuss recent developments in the semantic toolstack for chemistry, starting with markup languages for chemical data, RDF vocabularies and ontologies (ChemAxiom) for chemicals and materials (data). We will illustrate how ontologies can be used for indexing, faceted search and retrieval of chemical information and for the "axiomatisation" of chemical entities and materials beyond simple notions of chemical structure. We will discuss the use of linked data to generate new chemical insights and will briefly discuss the use of OSCAR, our entity extraction and natural language processing system, for the "semantification" of chemical information.
We will demonstrate the use of authoring tools (Chem4Word) for the generation of structured "datuments" (data + documents) on the web, as well as the Lensfield data processing and publication system. There will also be a brief discussion of how some of the principles developed for chemistry can be applied to other domains, such as biomedical research. Finally, we will review some of the challenges facing both chemical data and the adoption of semantic web technologies today.

Biosketch Nico Adams: Nico Adams read chemistry at the University of York and subsequently worked as a research chemist at DSM Research (The Netherlands) and Cambridge Combinatorial (now Millennium Pharmaceuticals, UK) on the combinatorial synthesis and screening of early transition metal olefin polymerisation catalysts. He subsequently became a member of the group of Prof P. Mountford at the Inorganic Chemistry Laboratory, University of Oxford, to read towards his doctoral degree in organometallic chemistry. In 2003 he joined the Technische Universiteit Eindhoven as a post-doctoral research associate (group of Prof U. S. Schubert) and the Dutch Polymer Institute (DPI) as a project leader in polymer informatics. In 2006 he joined the University of Cambridge as a research associate, where he manages a research group in polymer informatics. His main research interests lie in the areas of combinatorial and solid phase organometallic chemistry, materials and polymer informatics, the use of polymers for biomedical applications, ontological engineering and the semantic web.

Biosketch Jim Downing: After completing a Master's in computational fluids and mechanics, Jim spent four years with a small software start-up in Cambridge working on information and knowledge systems in science and engineering research, and later in public sector information. He moved to the University of Cambridge in 2004 to work on the open source DSpace institutional repository software.
Working with early adopters of the DSpace system at Cambridge (particularly Prof. Peter Murray-Rust) led to an interest in chemical information, and to Jim joining Prof. Murray-Rust's group to develop software architectures for chemical information, including a move towards semantic web technologies and RESTful web APIs. Jim is currently interested in the application of Linked Data in chemistry and the opportunities and challenges presented by functional programming languages in cheminformatics.
