Yes. Some of that data is interpretable without any external information, but its value is hugely enhanced if we can use it as a way of linking on the one hand things that we already know about physiological systems or developmental systems or diseases or particular tissue cell types, and on the other hand whatever we already know about the genes or their products: the links that are provided by those millions of data points in the gene expression arrays add a huge amount of value to existing information.
So every time we did an experiment, my colleagues and I found ourselves digging up hundreds of papers to educate ourselves about things that we were seeing in the data. By that time a substantial fraction of the journals that were most relevant to what we were doing were being published online, and since we were at Stanford we had access to at least most of them. But we knew we were not getting full value out of our data, first because we still didn't have ready access to all the available knowledge that was published somewhere and would enable us to make better sense of it. Equally importantly, the manual search for published information that could add value to our data was unscalable. So we wanted to put the entire corpus of relevant articles in a database that would let us automate the process, but we were thwarted by publishers who strictly forbade downloading and automated analysis.
That led me to ask why publishers should be able to control what I can do with information that was published by my scientific colleagues, whose motivation was exactly to have their discoveries contribute to future discoveries. And it became obvious that there were things about the way the scientific literature was organized that were anachronistic in 1997, when we already had tools that we could use to, so to speak, hyperlink things so that you could reorganize information in systematic ways, but those tools weren't really being exploited by the conventional scientific literature.
Then at about that time my lab had published a paper that involved a lot of supplementary information that we posted online on the main server at Stanford, which was used by maybe a hundred scientists at Stanford, and more than half of all the bandwidth of the server was taken up by people downloading our data. And so I thought, OK, well, actually we should just stop publishing in journals altogether, and when we have something to report we'll just post it on this server, spread the word, bypass the whole annoying experience of publishing in journals, and let the world decide if what we have is interesting. Well, that was kind of a primitive idea, but...