INSERT: i Jim Hendler
"This is really the first time I don't cross my fingers behind my back when I say this is for real. The research side still needs to happen, but the fruits of what we're doing five years ago are out the door."
STING
INSERT: ii Tim Berners-Lee
"First rule is: don't change what you are doing!"
STING
INSERT: iii David De Roure
"The magic is to plan for it to be used in unanticipated ways. Because that's how new scientific inquiry in the future, will be to ask questions that just couldn't be asked before."
STING
INSERT: iv Reagan Moore
"The semantic Web and assumes that somebody is managing the data. So we're looking at the problem from just the opposite viewpoint."
So who is managing the data? How do you plan for things you didn't anticipate? And why you should do nothing more than - well at least uncross your fingers!
Semantics, Science and Grid Computing from the 15th International World Wide Web Conference in Edinburgh.
SIGNATURE TUNE
Hello.
STING
I'm Peter Croasdale. And welcome to this series of conference podcasts.
In the first one we whizzed through the key themes of the four days in Edinburgh. In this second one I thought we'd stop and look at three of those themes in a bit more detail. First on the list - well - it has to be the Semantic Web, and then - I thought Grid computing and Global Science to follow. What more could you want! - though don't answer that - it's a rhetorical question. Although, if you really feel you must - then you can always send me an email to podcasts @ brightindigo.com.
INSERT: 1 Tim Berners-Lee
"One the pattern is, that we are creating an ontology for something like, a deep ontology, for example for businesses, or for diseases. So when you say that this particular disease is like this disease and it shows these symptoms and it is curable by this drug. Then this ontology is a huge ontology, it's an art to produce it, and it's actually, if you like, contains the knowledge itself in the ontology.
Now there are other sorts of ontologies, if I download my bank statements, or I look inside the database that we have had for ages at the consortium, about how we run the business, then that's got members of the working groups. And those are all stored in databases, and those database columns don't change very often. But to actually make that data useful what we have to do is look at that and say this databases about the working group members and, then actually, those are people. So make a little ontology that includes working group members.
Now the ontology of stuff we store in our database, there is only a few hundred columns maybe. And those changed relatively slowly, they're the sort of thing is that if you're going to standardise them -- it's easy to standardise, there's not a lot of argument about them. There are a lot of systems that have been running, like the bank statements and stuff. So I think, really, when we're talking about technologies, and how they are useful, then these taxonomies on one side, and these infrastructure ontologies, if you like, are very different."
SEGUE
INSERT: 2a Nigel Shadbolt
"This is going to be a collection of agreements, some more or less large, more or less deep, more or less broad."
Nigel Shadbolt, chair of the Semantic Web plenary, and Professor of Artificial Intelligence at the University of Southampton.
INSERT: 2b Nigel Shadbolt
"So that what you are looking for here are organised communities. Now take the example of engineers, all scientists, there already exists a bunch of scientists around the world who are really interested in sharing their information: life scientists, biologists.
They sit there with information and data are new experimental findings and they want to get that information out there not just as published papers but they may be you like to actually publish their core data. Maybe they would like to find out who else has got data like their's. And so the then we would make sense to get an agreement of what the core ideas are in some particular area biology, and we use the community will take a hold of that. And oh by the way, there's this great standard that the World Wide Web Consortium has given as, just like HTML, it's a standard for laying out web pages. And this is a way of telling us how to express those ideas in a notation that our machines can understand. And that we can put into content so that when we come up to a piece of content and say what does this mean, it says well this is the key concept of a protein for example.
And this exercise can happen many times and many different levels of grain size for just as many communities who feel they need to share ideas."
INSERT: 3 Eric Neumann
"It is a mix of people from the industry as well as academia who are or are identifying genome resources, pathway, a molecular pathway information, definitions of diseases, that are based on other ontologies, and building a strategy of how to convert these into an RDF model, there will be a demonstration, Web-based. And open offer other people to use and an onto. So that's is a first step. There are other activities in areas of being able to tag scientific content in papers. So that we take what was in the world of literature and publication into the Web and semantic Web world where everything can be cross-linked and definitions inside these articles can be made explicit. That would greatly accelerate how researchers can find things relevant to a disease topic to cures, and could be used by the public as well as the industry."
More on scientific publishing later. But first Tim Berners-Lee's own recipe for dipping your toe in to the Semantic Web. You'll be pleased to hear it includes lunch; a bit of SPARQL, the next layer up the cake; and the back of an envelope!
INSERT: 4 Tim Berners-Lee
"The first rule is: don't change what you're doing! Keep all those people at their jobs. Keep those databases running because the systems you put them place to get all that data there are valuable they took a long time to set up. So people have decided to do so in XML over here and databases there don't argue about that. However, a rival with a gentle touch and ask if you can have access to their database with a nondestructive program which will just instrument their database by providing a sparkle interface to it. Take them to lunch and ask them what actually all those columns mean, and sketched down a shell ontology onto the back of an envelope. And take the ontology and feed it into some software which will map the database into SPARQL. Write a little application in JavaScript or Python or something which does SPARQL queries across all that data. And then, show somebody who's got some clout within the company. Try it out! Go their bit by bit. The value of any piece of semantic data is the amount of the other semantic Web data out there. So if in your enterprise, just in your enterprise, you're looking for the return on investment. The return on investment for doing it just within your enterprise may be quite sufficient to justify the work. But it won't be anything like the effect you will have when you can join that data in your enterprise with: all the other data of your partners upstream and downstream; with the regulatory bodies; with all the public databases out there. That is when it will be really mind-boggling in the way that the Web is mind-boggling."
INSERT: 5a Eric Neumann
"There are times where it gets a little bit tricky because there are almost too many standards."
INSERT: 5b Eric Neumann
"In the clinical space there are cases where there are three or four different naming strategies around diseases and treatments. And they are used by different groups in different ways. Some are used more by the regulatory agencies. But if we can define not just one, but all of these, in some sort of ontology definitions it allows the ability for people to start to stitch them together.
Croasdale: Jim Hendler talked about islands of semantic Web sort of emerging. This seems to be like you've identified a cluster of small islands that you can start to build the bridges between.
Neumann: Oh very true. It's a large archipelago really. And there are many different places to connect them. Some bridges are already in the process of being made. Others it's going to take more time. But I think when people see the advantage. It's not just building bridges for the fun of it. It's when they start getting used and traversed. And people can work more effectively. Similar to the way the original Web took off in the early days. We want to see this, with these different islands, people start to participate, find new uses and interactions between the data, and the concepts and the applications."
INSERT: 6 Jim Hendler
"I've put up a slide that said "think of 106 as small". But one I really should have said is "think of 109 as routine, and start thinking about large". Because we are seeing, for example in the bio world, people who can generate 100 million triples a day. That's if you don't ask them through a large one. For a large one they will turn on the other machine. So I think the challenge of scaling was way more than we expected. And that's really where my biggest surprise has been, that technology came together, but we really need to keep pushing it way beyond where we even thought at the beginning."
INSERT: 7 Nigel Shadbolt
"Now trust in content, and believing that if you allow some content out there you can also associate with it rules of use that computers can actually implement. That's a research area. And we don't have all the answers there. There's some very interesting research in this whole area of the so-called policy away computing where the content carries its own rules of use, its own rules of what it will reveal to who, around with it. And then the question is can you guarantee that that content is revealing itself to the right people in the right occasions."
SEQUE
INSERT: 8a Jim Hendler
"As I use sets of data to do things, wouldn't it be nice if the policies of use are being checked."
INSERT: 8b Jim Hendler
"So, if I take data from place A and data from place B and send it to someone else: a; it would be nice if the person who gets my information knows it came from A and B. More importantly, if I promise the person who gave me A I would never combine it with B, it would be nice if a little pop-up window came up and said "you're violating an agreement, you can do it, but I'm going to tell A -- is that okay?" And if I hit the okay button...so again, I, the user am in charge, but the system can start reminding me when I'm doing things. The believe is we can start building some mechanisms for, who said what, why did they said it, where did they get it from. And that's the very top of the layer cake is really the sort of the trust level. So there's a lot of exciting work yet to be done, it's a very exciting place to be."
INSERT: 9 Nigel Shadbolt
"One of the things that we are actually involved in is a commercial spin out. Where a consortium of highly respected business individuals approached a number of us for know-how, and help, in how there might be a serious consumer 'play', as they would call it, in the area of digital, personal digital information management. So the idea is that, can you provide consumers with services that allow them to monitor, to perhaps cleanup, removed, hide, the wealth of information that is out there. To help them actually exercise their legal rights, of which you have many, but are often difficult to exercise because there's a whole bunch of complex processes to go through. So, the idea here is to use some of the semantic technologies that we've been talking about, and developing, to try and put in place a new kind of consumer service. That company is called "Garlik.com" with a 'K', "powerful stuff" is the strapline. And that will be going live as a UK service in late September."
INSERT: 10 Reagan Moore
"The semantic Web assumes that somebody is managing the data -- and therefore, the semantic web is a mechanism to help you discover data that has already been managed. So we are looking at the problem from just the opposite view point. We want to understand how to actually manage the data, where the data itself is distributed. Normally, when I have a semantic web application I think I'm going to the location where the data resides, and query enough ing that location, and then bring back something that I'm interested in. We have the problem where the data isn't and just one site, the data is at multiple sites in different administrative domains, in different institutions. And how do you build any coherent naming, coherent access mechanism, across all the places where the data might reside.
Croasdale: And what are these sorts of issues and hurdles that you have to get over in your daily job, really, of managing these huge chunks of data?
Moore: The management of data really is the same problem that all the groups building digital libraries have. They have to go through the curation process of correctly describing the data. You need validation processes to assert that "yes" we have the correct metadata for each item that is registered. We also have the data management challenges of stating "are there actually two copies of the data?" "Is the second copy at the correct site?" "Have we validated the checksums such that we know that both copies are still good?"...etc. So you have to go into not only managing the metadata that the digital library traditionally worries about, but also trying to manage the distribution and integrity of the data, that is actually out there.
Croasdale: And I assume that to some extent the world of science is changing as a consequence of the ability to manipulate, manage, visualise these large integrated metadata sets. It's actually altering the way people do science?
Moore: What happens then is that instead of trying to ask questions about one object within the collection -- you ask questions about properties of the collection itself. So, in astronomy, a traditional approach would be to identify an unusual object, and then try and find images of that object from multiple sky surveys taken on multiple wavelengths of light. And then you build up a physical theory of what caused the unusual object, what its properties are etc. Instead of doing that on a single object, once you are able to manage a collection and manipulate all the images within a collection, then you want to ask the question: I want to identify all objects like that in the universe."
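The replica questions Moore lists in this conversation (are there two copies, is each at the right site, do the checksums still agree) can be sketched as a small audit. This is an illustration only: the catalogue layout, the site names and the audit function are assumptions made for the sketch, with in-memory dictionaries standing in for storage at different institutions.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest used to decide whether a copy is still good."""
    return hashlib.sha256(data).hexdigest()


def audit(catalogue, sites):
    """Check every registered item against the copies actually held.

    catalogue: {item: {"checksum": str, "replicas": [site, ...]}}
    sites:     {site: {item: bytes}}  (stand-in for remote storage)
    Returns {item: [problem, ...]}; an empty list means the item is healthy.
    """
    report = {}
    for item, record in catalogue.items():
        problems = []
        if len(record["replicas"]) < 2:
            problems.append("fewer than two copies registered")
        for site in record["replicas"]:
            data = sites.get(site, {}).get(item)
            if data is None:
                problems.append(f"missing at {site}")
            elif checksum(data) != record["checksum"]:
                problems.append(f"checksum mismatch at {site}")
        report[item] = problems
    return report


image = b"sky survey image bytes"
catalogue = {
    "survey/img001": {"checksum": checksum(image), "replicas": ["site-1", "site-2"]},
    "survey/img002": {"checksum": checksum(image), "replicas": ["site-1", "site-2"]},
}
sites = {
    "site-1": {"survey/img001": image, "survey/img002": image},
    "site-2": {"survey/img001": image, "survey/img002": b"corrupted"},
}

report = audit(catalogue, sites)
print(report)
```

The descriptive metadata a digital library already keeps sits alongside this: the audit covers only the distribution and integrity side of the problem.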
INSERT: 11 David De Roure
"Where the grid computing meets the physical world is in the instruments that are in the chemistry laboratories and the people that are in the chemistry laboratories. So there we use pervasive computing in order to pick up all sorts of information in the lab, and then when this is processed and used on the grid and the scientists produce their academic output from that, their scholarly output, the data that was used to produce that is available. The experimental conditions in the lab, the environmental conditions in the lab, it is all still available. You can chase back from you or scholarly output, which now looks more like a web page than a piece of paper, to all the original data. All the way back to the devices, to the scales, to the tablet PC, to the thermometer -- everything that was going on in the lab -- to the x-ray diffractometer, to the robot that was placing the samples in the x-ray diffractometer, to the mass mass spectrometer its outputs. So it all gets linked up. And that's a good example of the physical world in the digital world coming together.
The other area where they come together is really going back to the semantic Web. What we did in chemistry could be viewed from a grid viewpoint as a semantic data grid. So a data grid is perhaps this idea of having an awful lot of data out there federated in some way. By taking our semantic Web approach, we've made that data very much more accessible, and reusable, and more interoperable. And it turned out when we actually did the "research in the wild" that the chemist like this approach because it were is more flexible than their previous approach of using relational databases. It is difficult to change the schema of a relational database, especially if you don't own the database. Whereas with the semantic Web approach their able to extend far more flexibly the schema that they were using, both to join databases together and to represent the data itself. So that's semantic data grid angle's come through spectacularly with the chemists. We have such a chunk of semantic Web in our chemistry labs now, I'd like to know how big, how much semantic Web there is on the planet, but this is a significant chunk of semantic Web."
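De Roure's point about flexibility can be illustrated with plain triples. This is a sketch under assumptions, not the chemists' actual system: the ex: identifiers, the measurements and the describe helper are invented for the example.

```python
# Two labs describe the same sample as (subject, predicate, object) triples.
# All identifiers and values are made up for this sketch.
lab_a = {
    ("ex:sample1", "ex:meltingPoint", 121.5),
    ("ex:sample1", "ex:measuredBy", "ex:thermometer7"),
}
lab_b = {
    ("ex:sample1", "ex:crystalSystem", "monoclinic"),
}

# Extending the "schema" is just adding a triple with a new predicate.
# No table has to be altered, and nobody else's data is touched.
lab_a.add(("ex:sample1", "ex:labHumidity", 0.43))

# Joining the two datasets is a set union, because both use the same
# identifier for the sample.
merged = lab_a | lab_b


def describe(triples, subject):
    """Collect everything known about one subject as {predicate: object}."""
    return {p: o for s, p, o in triples if s == subject}


print(describe(merged, "ex:sample1"))
```

In a relational database the humidity reading would need a new column, which is hard to add to a database you do not own; here it is one more statement about the sample.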
INSERT: 12 Richard Smith
"The studies that I have seen suggest that for less money than is currently in the system it should be possible to make all of this research available for free. I mean the money comes partly from the very substantial profits of publishers, it also comes from the money that is currently disappearing in paper versions -- which I just don't think are necessary any more. So, I don't think it's utopian at all. I think that this should be win-win, in the sense that you can make the research available for free to everybody everywhere for less money than is currently spent on the system. But of course, there are going to be the losers -- the traditional publishers.
And one thing that's interesting, a lot of the resistance comes from scientific societies, because actually most of the world's major journals are owned, not by Reed Elsevier, but by scientific societies. So, the New England Journal of Medicine (NEJM) which is a leading journal is owned by the Massachusetts Medical Society; JAMA, another major journal, is owned by the American Medical Association; the Annals of Internal Medicine, another major journal, is owned by the American College of Physicians; the BMJ, which I used to work for, is owned by the BMA. So actually most medical journals are owned not by commercial publishers but by scientific, or medical, societies. And they're the people who are often resisting it, rather strongly, because it's a very substantial source of income at the moment."
INSERT: 12 Richard Smith
"It will have a huge impact. I mean, something that I'm involved with is called the Global Trial Bank. Which is about making all of the data that are used in clinical trials, and clinical trials are very important in medicine and health care in determining what works and what doesn't work, the Global Trial Bank would create a kind of searchable database. So that actually you could extract all of the data from trials on a particular subject, say, you know, "should we use antibiotics when doing a caesarean section?" You wouldn't just find the individual papers as you might at the moment, but you can get access to the underlying data, and the computer would no that they were the data. So I think were already at the beginning of that -- but it's still the beginning, I think."
INSERT: 13 David De Roure
"The magic is to plan for it to be used in unanticipated ways, because that's how new scientific inquiry in the future will be to ask questions that just couldn't be asked before. One way of explaining all of this technology to the end user, who is the the E-researcher, is to say "these are the queries that you can do now, that you couldn't do before". And that's what was very exciting to them. And that's what really where I see the semantic Web and the grid technologies coming together. When we introduced the semantic Grid activities to the global Grid Forum a few years ago, our way of articulating that to the grid audience was to say "these are the queries you can ask". And that's really -- that the magic from an E-researchers viewpoint."
David De Roure, closing with his person view of the magical mystery tour the Semantic Web will eventually give us.
Two more podcasts to go. And in the next one - we'll hear more from the conference on the technical track, including the winning researchers from the event. Also, some of the W3C work that when on there .
This programme was produced for International World Wide Web Conference Committee by Bright Indigo.
To take a look at the programme notes - then check out the technology section of the Bright Indigo.com website.
But, until the next podcast – from me Peter Croasdale -
STING
Goodbye.
SIGNATURE TUNE