What would science look like if it were invented today – part II: knowledge structuring
09.30.09 by Daniel Mietchen
Editor’s Note: This is the second of two parts of a guest post for the Euroscientist, the blog of Euroscience.org. Part I can be found here. FundScience.org cross-posts this article, as well as forthcoming installments, because of our passion to promote open science and collaboration, not only between scientists, but between the scientific community and the public.
Part II: What would knowledge structuring look like if it were invented today?
Science is already a wiki if you look at it a certain way. It’s just a highly inefficient one – the incremental edits are made in papers instead of wikispace, and significant effort is expended to recapitulate existing knowledge in a paper in order to support the one to three new assertions made in any one paper. (John Wilbanks)
There are many ways to structure knowledge. One is via coordinated cellular activity in your brain. Others may involve spatial arrangements of sheets of paper or numeric arrangements of digital documents. Here, we will focus on the difference between the latter two, building on a previous outline.
Structuring scientific knowledge online
Let us first consider some practical aspects of organizing scientific knowledge in online environments:
- Newly incoming information can be inserted at any time later, independent of press runs (some call this micropublication). For example, part I of this post has already been “published” on the blog, but its wiki version can still be updated with references that were not available at the time. This may not be relevant for blog posts, but consider it as a proof of principle for writings in general, including scholarly reviews on a topic.
- In sharp contrast to current practice in paper-based scholarly journals, online platforms like public wikis make the whole article openly accessible right from the start. Detailed explanations of keywords and key concepts can be linked from within the article, and the article itself can be put in linked context via a variety of mechanisms, e.g. categories (Wikipedia), Related Articles (Citizendium), or links to other ontological frameworks (like MeSH terms).
- Documents can be edited simultaneously by multiple authors. Google Docs has been doing this for years, Etherpad improves on it, and Google Wave (scheduled to be released later this week) is going to have truly realtime simultaneous editing as well, while wikis are currently somewhat limited in this regard.
- Suitably designed schemes for identifying authors, their individual contributions, and versions of the whole document provide for bug tracking, permanent availability of text (or code) snippets, and attribution. For example, Wikibu tells you that at this point, the users RokerHRO, Proxima, Saperaud, 87.122.87.61, and BirgitLachner have been the main contributors to the Aggregatzustand (state of matter) entry in the German Wikipedia, and the individual contributions to the blog post you are reading can be viewed via its version history, embedded here:
These are certainly not all the aspects of online environments relevant to science (for instance, we have left out data management issues), but let us content ourselves with these four for the moment and consider what their combination implies for the structuring of knowledge:
It is technically possible that all researchers currently investigating a given topic could coordinate their efforts by collaboratively creating, editing, and maintaining a central set of interlinked knowledge elements (be these wiki articles, knols, or other structures) that explain what is known about their topic in detail and embed it in a wider context.
As implied by the introductory quote, it is probably fair to say that this could make research on that particular topic (as well as teaching and outreach) much more efficient. Just imagine you had a time slider and could watch the history of research on general relativity, plate tectonics, self-replication, or cell division unfold from the earliest ideas of their earliest proponents (and opponents) onwards up to you, your colleagues, and those with whom you compete for grants. So why don’t we do it?
Structuring scientific knowledge on paper
Traditionally, given the scope of a particular journal, knowledge about specialist terms (which may describe completely non-congruent concepts in different fields), methodologies, notations, mainstream opinions, trends, or major controversies could reasonably be expected to be widespread amongst the audience, which reduced the need to redundantly say and then repeat the same things all over again and again (in cross-disciplinary environments, there is a higher demand for proper disambiguation of the various meanings of a term). Nonetheless, redundancy is still quite visible in journal articles, especially in the introduction, methods, and discussion sections and the abstracts, often in a way characteristic of the authors (such that services like eTBLAST and JANE can make qualified guesses on authors of a particular piece of text, with good results if some of the authors have a lot of papers in the respective database, mainly PubMed, and if they have not changed their individual research scope too often in between).
Of course, there would be side effects: A manuscript well-adapted to the scope of one particular journal is often not very intelligible to someone outside its intended audience, which hampers cross-fertilization with other research fields (we will get back to this below). When using paper as the sole medium of communication there is not much to be done about this limitation. Indeed, we have become so used to it that some do not perceive it as a limitation at all. Similar thoughts apply to manuscript formatting. However, the times when paper alone reigned over scholarly communication have certainly passed, as discussed in part I. The relative merits of paper-based and wiki-based scholarly communication are covered in more detail at a dedicated Wikiversity page.
Cross-field fertilization is crucial with respect to interdisciplinary research projects, digital libraries and multi-journal (or indeed cross-disciplinary) bibliographic search engines (e.g. Google Scholar), since these dramatically increase the likelihood of, say, a biologist stumbling upon a not primarily biological source relevant to her research (think shape quantification or growth curves, for instance). What options do we have to systematically integrate such cross-disciplinary hidden treasures with the traditional intra-disciplinary background knowledge and with new insights resulting from research?
As a sidenote, lack of context is also a consistent feature of most “Facebooks for scientists”. In fact, the whole set of scholarly pages on the web is the appropriate network for researchers, but so far it is not optimally connected, particularly because formal scholarly communication has not yet fully hatched from the structures of the paper-based era (see also this nice overview of the current situation). If it had, this would shift the focus away from periodicals (and, in passing, render things like a journal’s scope and Journal Impact Factor superfluous; see part I), which is likely to meet resistance from the publishing establishment. Yet, authors might just act on their needs by moving their “content” to grow in better production and exchange surroundings like the ones discussed here. Without good authors, no established publisher will be able to keep their grip on anyone’s research habits and thinking.
Wikis as an example of public knowledge environments online
Groupware comes to mind in this regard, and wikis in particular (another example would be collaboratively edited mindmaps, like the one embedded above that represents the topics covered by this blog post series): They allow us to aggregate and inter-link diverse sets of knowledge in an online-accessible manner, basically for free. The by now classical example is Wikipedia, and one scientific journal, RNA Biology, has already announced that it requires an introductory Wikipedia article for papers it is to publish on RNA families, an idea that recently spurred a debate on the merits of such an initiative and of doing it with Wikipedia, where basically anyone can edit any page, regardless of subject matter expertise.
An investigation (video lecture by Bill Wedemeyer here, a brief annotation here) of the quality of a set of science articles in the English Wikipedia is currently being written up for classical paper-style publication but the preliminary results indicate that “[t]here is a subset of reliably helpful science articles on the English Wikipedia for outreach, teacher training, and general science education” (slide shown at 29:35min in the video). However, the distribution of the set of articles was skewed towards the Good Article and Featured Article classes, which constituted only 2% of the English Wikipedia at the time of investigation, and it did not include articles in the humanities (scheduled to come next). Further information on academic studies about Wikipedia is available via these two Wikipedia pages.
The larger Wikipedias have a serious problem with vandalism: take an article of your choice and look in its history page for reverts – most of them will be about changes like this or worse. This is less of an issue with more popular topics for which large numbers of volunteers may be available to correct “spammy” entries but it is probably fair to assume that most researchers value their time too much to spend it on repeatedly correcting such information if it had already been correctly entered once. Other problems with covering scientific topics at Wikipedia include the notability criteria which have to be fulfilled to avoid an article being deleted, and the rejection of “original research” in the sense of not having been peer reviewed before publication. Peer review is indeed an important aspect of scholarly communication, as it paves the way towards the reproducibility that forms one of the foundations of modern science. Yet we know of no compelling reason to believe that it works better before than after publication (doing it beforehand was just a practical decision in times when journal space was measured in paper pages).
Fortunately, the Wikipedias are not the only wikis around, and amongst the more scholarly inclined alternatives, there are even a number of wiki-based journals, though usually with a very narrow scope and/or a low number of articles. In contrast, Scholarpedia (which has classical peer review and an ISSN and may thus be counted as a wiki journal, too), OpenWetWare, Citizendium and the Wikiversities are cross-disciplinary and structured (and of a size, for the moment) such that vandalism and notability are not really a problem. With minor exceptions, real names are required at the first three, and anybody can contribute to entries about anything, particularly in their fields of expertise. None of these is even close to providing the vast amount of context existing in the English Wikipedia, but the difference would be much less dramatic if the latter were broken down to scholarly useful content, as discussed above. Out of these four wikis, only OpenWetWare and some Wikiversities (here counted as one) currently allow for original research to be published on their site; in the case of OpenWetWare, this is indeed the main purpose. Furthermore, a number of more specialized scholarly wikis exist (e.g. WikiGenes, the Encyclopedia of Earth, the Encyclopedia of the Cosmos, or the Dispersive PDE Wiki) which can teach us about the usefulness of wikis within specific academic fields.
We will not dwell on any details here, but since new suggestions about combining elements of wiki and scholarly environments keep coming in, e.g. in the form of a Wikipedia journal, we will list a number of features we deem desirable for future scholarly wikis, derived from experience with existing ones. These include, in no particular order:
- Some system of peer review (basically, any wiki allows comments, annotations or formal reviews on talk pages of users or articles, but these ratings should be featured more prominently; templates like those visualizing article status at Citizendium may help with that); this may be as simple as not allowing individuals to add information to Citizendium when the only available support is their own non-reviewed research published at OpenWetWare (the real-name policy will minimize misuse)
- Uploadability of all kinds of media that traditionally (if you can call a habit that is barely a decade old a tradition already) went along with paper-based publications as “supporting online information” (which would be easily integrated in an all-online non-printable article with no sharp space limitations).
- Stable versions for content that has undergone peer review (like the Approved Articles at Citizendium, or the results of the double phase review model at the OA journal ACPD/ACP), along with draft versions for anything else (including improvements to and updates of previous stable versions); like any non-protected page at the Wikipedias, these draft versions can serve as a playground, though a real-name policy would probably make it a more educational one
- Search engines that integrate or otherwise compare favourably with major scholarly search engines on the web (the already mentioned Google Scholar and PubMed as well as, say, the BioText Search Engine that searches Open Access text and images), also in terms of the updating frequency.
- pan-disciplinary scope, with consistent disambiguation of specialist terms (mainly but not fully achieved at Citizendium)
- Separate namespaces for references (already in use at the Dispersive PDE Wiki and the French Wikipedia, in test at Citizendium); as a side line, this would open up ways for new citation metrics, via the What links here function
- Separate namespaces for original research. Encyclopedic endeavours need expert input. This is most likely to be achievable if the encyclopedic activities can be integrated with the experts’ workflow, e.g. via platforms like OpenWetWare.
- Attributability of contributions (automatically realized, though not in the traditional scholarly way, in any wiki with a real name policy like that at Citizendium, via the User contributions function; special arrangements exist at Scholarpedia and WikiGenes; OpenWetWare does allow nicknames but real names prevail; the Wikiversities have basically the same user name policy as the Wikipedias)
- Easy download of selected sets of pages for local archiving or analysis.
- Licenses that allow unrestricted reuse and derivative work if the original source is properly acknowledged (typically CC-by-SA or the older GFDL, both of which have been made compatible now)
- Resource-effective design (see also discussions on the energy use of the internet and individual websites). This overview may also help in working out an ecological footprint scheme applicable to research, as described previously.
- integration with the non-scholarly world (certainly achieved in the Wikipedias and Citizendium), particularly with students (cf. the Eduzendium initiative at Citizendium) and non-English contents
- Automation of the formatting, as already common in non-wiki environments, e.g. with LaTeX templates, for which collaborative editing environments exist too. None of the wikis we know comes close to that, although templates are heavily used at the various Wikipedias and, to a lesser extent but in a more consistent manner, at Citizendium; they seem to be rather rarely used on smaller or more specialized wikis. The same applies to references, though automated wikification has already progressed considerably here, despite the lack of wiki export functions at publishers’ sites (or of suitable XML-to-wiki converters for those who provide XML)
- Integration with mind maps (which structure knowledge) and databases (which harbour bits of knowledge that are hard to interpret without a broader context).
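The citation-metric idea mentioned above for separate reference namespaces can be illustrated with a minimal sketch. It counts, for each page in a reference namespace, how many distinct pages link to it, which is roughly what a What links here listing reports. The `Ref:` prefix, the page titles and the link data are all hypothetical and do not reflect how any of the wikis named here store their links.

```python
from collections import Counter

# Hypothetical wiki: each article maps to the pages it links to.
# The "Ref:" prefix stands in for a separate reference namespace (an assumption).
outgoing_links = {
    "Cell division": ["Ref:Paper A", "Ref:Paper B", "Mitosis"],
    "Mitosis": ["Ref:Paper A", "Cell division"],
    "Plate tectonics": ["Ref:Paper C"],
}

def what_links_here(links):
    """Invert outgoing links: map each target page to the set of pages linking to it."""
    inbound = {}
    for source, targets in links.items():
        for target in targets:
            inbound.setdefault(target, set()).add(source)
    return inbound

def citation_counts(links, namespace="Ref:"):
    """Count distinct citing pages for every page in the reference namespace."""
    inbound = what_links_here(links)
    return Counter({page: len(sources)
                    for page, sources in inbound.items()
                    if page.startswith(namespace)})

counts = citation_counts(outgoing_links)
print(counts.most_common())
```

Counting distinct citing pages rather than raw links is a design choice made here so that a page citing the same reference twice is not counted twice.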
One of the most useful templates in use at Citizendium is that for subpages (open the Biology article in a separate window to see what this is about):
- The article’s main page is a stable version, approved by an author with expertise in that field
- Next comes the Talk tab that leads to the discussion page, as per default in any wiki
- The Draft tab leads to the editable version (this only applies for articles that have already been approved; in others, the main page is editable)
- The Related Articles tab roughly corresponds to “see also” in the Wikipedias but is more usefully structured for navigation and somewhat replaces the categories which are heavily used in Wikipedia but only to a limited extent at Citizendium
It is interesting to see that these and other individual subpages largely complement existing social networking tools and have thus the potential to replace them (or to be replaced by them), at least for scholarly purposes:
- The Bibliography subpage is a context-based alternative to CiteULike, Zotero, BibSonomy and other reference managers, possibly in conjunction with Open Library, scholarly search engines and tools like Scribd, Mendeley or Papers. One problem wikis cannot solve is that of access to paper-based research publications, but due to the current spread of Green and Gold Open Access initiatives, this is likely to change in the next few years anyway if authors decide to follow suit in a consistent manner and act accordingly for their own contributions.
- The External Links subpage is a context-based alternative to conventional social bookmarking as known from delicious and simpy
- Additional subpages could be tailored to meet the needs of individual categories of articles (e.g., properties of chemical elements, genes, stellar constellations etc.) or more general scholarly needs (e.g., peer review, slides, code, protocols, or bot-generated transcripts from video lectures)
Besides, User pages may provide context-based alternatives to individual pages at different networking sites, and possibly even to blogs like this one, while the Recent changes page could turn into an alternative for friendfeed, with items on your Watchlist (if you are logged in) equivalent to friendfeed rooms or personal feeds you are subscribed to.
For the record, this social networking component of Citizendium was already discussed three years ago, prior to its official launch and thus at a time when many of its current structures and their implications were not known yet.
Finally, and importantly, the easy availability of context (once the system is reasonably well adopted by scholarly communities, and the encyclopedic corpus thus reasonably complete) would make it easier to guide expert attention and thus to identify obvious gaps in current knowledge (e.g., by means of an expert evaluation of items listed on the Most Wanted page). Science funders (or indeed anyone) could then put forward research proposals on such topics (e.g., via a Calls subpage, FundScience, InnoCentive, Mechanical Turk or by more traditional means). And while we are at it, we think science funders, job committees and review panels would profit from familiarizing themselves with the workings of collaborative platforms like wikis, particularly the aspects relevant to reliability, attribution, and outreach. Your organization, company, university, research subject or methodology probably has a page on some of the wikis described here. Take a look at it, along with its history and talk pages, and you will almost certainly find something that needs improvement.
To sum up, the still fledgling Citizendium currently seems to be the closest match for a cross-disciplinary scholarly wiki anchored in the real world, and independent of whether it will allow original research to be posted in the future or not, this essential function in scholarly communication can be fulfilled by OpenWetWare (indeed, a similar separation of powers is one of the most healthy elements of most democracies). If widely adopted, this would entail a major shift in the way research is being done and communicated, towards what has come to be known as open science. As a side effect, commercial publishers would have to look for new things to publish, other than original research (non-commercial publishers like scholarly societies may, after the usual period of resistance, see more advantages than disadvantages in the groupware model). Reviews at different levels of expertise may be one option, also tutorials or other learning tools. All of this could be undertaken via some intelligently structured sets of groupware, too, depending on the incentives involved (in fact, such reviews are the scope of Scholarpedia). A side effect for researchers would be that they could use the author fees, page and figure charges and all other sums currently spent for publishing a paper for other purposes, including the maintenance of the shared public knowledge environments of the kind described here.
Of course, there are potential problems with such an enormous concentration of knowledge (e.g. for attacks and misuse, especially in relation to an international author identification that is currently being discussed). The obvious solutions are appropriate mirroring and otherwise transparency. Similar concerns would apply to a journal like PLoS ONE that does not have a scope in the traditional paper-limited sense mentioned above, yet one year after launch, it is doing pretty well. If it were to adopt a symbiosis with a suitable wiki, in a way similar to the RNA Biology initiative, which requires authors to submit “a short manuscript, a high quality Stockholm alignment and at least one Wikipedia article” (emphasis added), it might do even better. The first steps in this direction have already been taken.
This blog post was written and structured collaboratively by Daniel Mietchen, Claudia Koltzenburg and François Dongier, with further input received via the FriendFeed thread embedded below. As you can infer from the mindmap, the originally two-part series is now going to be continued, and as always, you are warmly invited to join the drafting of the next part, which will deal with the implications of the paper-to-digital transition for research funding.
This text and the associated mindmap are available under a CC-BY license.
Science publishing on the fast lane, plus optionally in journals
08.30.09 by Daniel Mietchen
About two weeks ago, PLoS, Google Knol and NCBI announced a potentially groundbreaking collaboration: PLoS Currents, a new platform within Knol and mirrored at NCBI, allows for rapid submission of research results to the eyes of the public prior to, or possibly instead of, formal publication.
The idea is not new: arxiv.org has been operating a preprint repository for almost two decades in the TeX-based sciences, and Nature Precedings for about two years in the remaining scientific fields. What is new here is the combination of preprints with an encyclopedia (Knol) and its embedding in the framework of a larger repository (Rapid Research Notes, operated by NCBI, the computational arm of the NIH), which is open for other publishers to join if they so wish. This way, a systematic record-keeping of information that has traditionally been transmitted only via meetings and conferences is now on the horizon.
The initiative is timely: Knol was launched last summer, but the laudations around its first anniversary could easily be mistaken for obituaries (mainly because it failed to rival Wikipedia the way many had expected), while experiments with the coupling of Wikipedia contributions and formal publication have now been going on for more than half a year at the journal RNA Biology. It is also timely because there are growing indications that the current scientific publishing landscape might change dramatically soon.
People at PLoS do not seem to embrace the role of merely observing these developments; they prefer to help them take shape. From this perspective, it is not surprising that PLoS would venture into wiki-like waters, though Knol is certainly not the only option for such activities.
Specifically, data do not currently play a prominent role at Knol (though integration with Google Wave might dramatically change this), but they do so at OpenWetWare, a wiki platform where researchers keep their electronic lab notebooks online and in public. Wouldn’t it be natural to write up “Rapid Research Notes” in the environment which hosts the data, and to link there from any scientific or popular article on the subject? Both of these platforms lack, however, integration between the individual research contributions and with existing knowledge: The introductory sections of the initial PLoS Currents collection on Influenza A H1N1 basically all link to different (and usually not updatable) references instead of one introductory article on the topic that could easily be drafted, hosted and updated in the same environment, as other wikis (especially the Wikipedias) have shown.
In that respect, it is perhaps of note that scholarly wiki environments already exist which grow their articles in context. Scholarpedia, for instance (a wiki with anonymously peer-reviewed articles and an ISSN), is expanding from a computational neuroscience core, while WikiGenes links genes with their functions, and Citizendium places special emphasis on context by using semantically Related Articles for site navigation (similar to the “scope” of traditional journals, or to the Frontiers Distillation System).
This way of structuring knowledge may also have implications for research funding: Suppose a topic is covered on a platform like this such that all the relevant current knowledge on it is contained therein but no subtopics (in Citizendium’s parlance) exist which would merit their own article. Wouldn’t this provide for a mechanism to identify areas for future research that is much simpler (and more transparent) than what we have today? Instead of researchers spending their time iteratively writing and reviewing grant applications, they would concentrate on keeping a shared body of knowledge (or knowledge exchange system) up to date in their fields of expertise, while funders would look for “missing subtopics” or “missing recent updates” they could tag with the amount of funds they are prepared to invest there. Distribution of these funds to the appropriate researchers could then be achieved by posting bids to and discussing them on public platforms like FundScience (which hosts this blog), using prior contributions to the whole knowledge exchange system (e.g. data, interpretation, theory, incorporation into scientific context) as a measure of merit instead of the infamous Journal Impact Factor that dominates such decisions today.
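To make the “missing subtopics” idea a little more concrete, here is a minimal sketch of how such gaps could be listed and tagged with funds. The `Topic` class, the topic titles and the pledged amount are all invented for illustration; no existing platform is claimed to work this way.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """A hypothetical article together with the subtopics listed for it."""
    title: str
    subtopics: list = field(default_factory=list)

def missing_subtopics(topics, existing_titles):
    """Return (parent, subtopic) pairs for subtopics that have no article yet."""
    return [(t.title, s)
            for t in topics
            for s in t.subtopics
            if s not in existing_titles]

topics = [
    Topic("Influenza", ["Influenza A", "Influenza vaccine"]),
    Topic("Influenza A", ["H1N1 transmission"]),
]
existing = {"Influenza", "Influenza A"}
gaps = missing_subtopics(topics, existing)

# A funder tags some gaps with the amount it is prepared to invest (invented figure).
pledges = {"H1N1 transmission": 50000}
funded_gaps = [(parent, sub, pledges[sub]) for parent, sub in gaps if sub in pledges]

for parent, sub, amount in funded_gaps:
    print(f"{sub} (under {parent}): {amount} pledged")
```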
Where are the journals in that system? At least not in the very prominent position they are occupying now. Given that PLoS ONE is on track to become the world’s largest scientific journal in about a year or so, it will be especially interesting to see what impact its combination with wiki-style interactive research environments like PLoS Currents might have on science communication as a whole.
What would science look like if it were invented today?
07.13.09 by Daniel Mietchen
Editor’s Note: The following is an article by guest blogger Daniel Mietchen, PhD, originally written up for the Euroscientist, the blog of Euroscience.org. This first part of a two-part series on “What would science look like if it were invented today?” deals specifically with the implications of the transition from paper-based to electronic communication for the process of knowledge creation. It delves into the importance of collaboration and openness in science. FundScience.org cross-posts this article, as well as the forthcoming second installment, because of our passion to promote open science and collaboration, not only between scientists, but between the scientific community and the public. Note that the drafting takes place in a wiki, so you can join in.
The Internet represents an opportunity to change this system, one which has created a 300-year-old, collective long-term memory, into something new and more efficient, perhaps adding in a current, collective short-term working memory at the same time. With new online tools, scientists could begin to share techniques, data and ideas online to the benefit of all parties, and the public at large. (Robert J. Simpson, paraphrasing Michael Nielsen)
Sure, it is hard to imagine you reading this blog post in a world which hadn’t yet engaged in science but the question “What would email look like if it were invented today” was recently addressed during the presentation of the Wave protocol, and entertaining some similar ideas on reinventing science may perhaps be worthwhile: how would a system have to be designed that creates and structures knowledge such that these two complex processes can effectively feed on and adapt to each other, making use of the most appropriate technologies at hand? Both processes are highly interrelated but to facilitate the discussion, we will first consider them separately (in this and the next issue of the Euroscientist), and then provide a synthesis (to which you can contribute).
Part I: What would knowledge creation look like if it were invented today?
The basic components of research
Let us start by considering scientific knowledge creation, or research for short. Within the framework of existing knowledge, this requires, as a first step, the identification (and perhaps further characterization) of a gap to be bridged or closed, although some methodologists prefer, or even have, to construct their bridges before choosing a suitable place to install them.
Once such a gap has been identified (we will leave a detailed consideration of this process for later), three basic components are necessary to close it, usually following each other as stages of a research project:
- Planning: an idea on how to bridge or close the gap
- Realization: the means to put the idea into practice
- Verification: independent assessment of the realization.
A fourth component is crucial to the process: appropriate communication during and across the three basic stages as well as beyond individual research projects. Traditionally, this was (and still is) accomplished separately for each of them:
- Grant proposals after an idea had been prepared for realization,
- Conference and journal papers once the realization had progressed, and
- Further papers (by independent investigators) once replication had been attempted (e.g. as a control experiment in a follow-up study).
The decoupling of this fourth component from the other three, however, is simply a trait our research landscape has inherited from the era of paper-based scientific communication, and by no means a technical necessity today, when basically any kind of information can be shared instantly (with few exceptions, e.g. patient data) within and beyond the scientific community. For our purposes, we will thus reframe the concept of putting ideas or results on paper as putting them on a wiki, a blog, a dedicated online repository or successors of these (e.g. as blips or wavelets within the proposed Wave protocol); in any case, a shared research environment from where they can be syndicated and aggregated in various forms and embedded in other digital environments.
Hello to public research environments online
In this kind of framework (best known as Open Science, henceforth public research environment to emphasize that the concept is applicable across disciplines and that communication in and with the public is different from science as we know it), individual contributions (or comments thereupon) can be automatically assigned a unique identifier (henceforth contribution ID; this may be a revision number with time stamp in wikis or databases, a DOI for journal articles or an ISBN for books), linked to its originator (henceforth contributor ID; usually the user name) as well as other relevant information (e.g. funding sources), and aggregated in various forms. In a paper-based system, contributor ID is mainly composed of an author’s surname plus some representation (variable across journals) of given names, such that a single contributor ID may be shared by different individuals whose names are identical or similar, while some individuals (especially those with multiple initials, with non-English characters, or who changed their name after marriage) may have more than one contributor ID. For online platforms, the contributor ID is generally unique within but not across individual online platforms, although a number of solutions towards unique identification of contributors have been implemented (e.g. OpenID), including some specifically targeted at scientists (e.g. Researcher ID).
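A minimal sketch may help to see how contribution IDs and contributor IDs relate, and why paper-style contributor IDs are ambiguous. The `Platform` class, the user names and the personal names below are hypothetical; a running counter stands in for a wiki revision number.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count

@dataclass(frozen=True)
class Contribution:
    contribution_id: int    # stands in for a revision number
    contributor_id: str     # usually the user name on the platform
    timestamp: datetime
    funding: tuple = ()     # other relevant information, e.g. funding sources

class Platform:
    """Hypothetical platform assigning IDs that are unique within itself only."""
    def __init__(self, name):
        self.name = name
        self._next_id = count(1)
        self.log = []

    def record(self, contributor_id, funding=()):
        entry = Contribution(next(self._next_id), contributor_id,
                             datetime.now(timezone.utc), tuple(funding))
        self.log.append(entry)
        return entry

    def by_contributor(self, contributor_id):
        """Aggregate all contributions made under one contributor ID."""
        return [c for c in self.log if c.contributor_id == contributor_id]

def paper_style_id(surname, given_names):
    """Surname plus initials, one common journal representation of an author."""
    initials = "".join(name[0].upper() for name in given_names.split())
    return f"{surname} {initials}"

wiki = Platform("example wiki")
wiki.record("alice", funding=["Grant X"])
wiki.record("bob")
wiki.record("alice")
alice_ids = [c.contribution_id for c in wiki.by_contributor("alice")]

# Two different (invented) people collapse into the same paper-style ID.
same_id = paper_style_id("Meyer", "Anna Maria") == paper_style_id("Meyer", "Anton Max")
print(alice_ids, same_id)
```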
Each contribution can not only be linked to its contributor but also tagged (similar to the keywords currently accompanying manuscripts or grant proposals) and have its quality assessed (or rated, for short) by individual contributors (perhaps as a function of the overlap between the tags for their personal expertise and those of the contribution under consideration) according to a pre-defined set of evaluation criteria (e.g. appropriateness to the current stage of a given project, reliability of the information supplied, or presentation with enough context to be understood by specialists and/or the public). Some journals already allow such ratings and further comments. However, none of them currently provides aggregations of ratings or comments by contributor, although technical standards for such purposes are operational (e.g. hreview). Despite possible herding effects and other sources of error, the feasibility in principle (not the effectiveness) of generating and aggregating such user-defined metrics has been demonstrated on multiple online platforms, especially in non-scholarly environments (tagging: Flickr; rating: eBay) but in some scholarly ones too (tagging at CiteULike).
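One possible reading of rating "as a function of the overlap" between a rater's expertise tags and a contribution's tags is sketched below in Python. The choice of the Jaccard index as the overlap measure, the function names and the sample tags are all assumptions made for illustration; the text does not prescribe any particular formula.

```python
from collections import defaultdict


def tag_overlap(expertise_tags, contribution_tags):
    """Jaccard index of two tag sets (an assumed overlap measure)."""
    a, b = set(expertise_tags), set(contribution_tags)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_rating(ratings, contribution_tags):
    """Average the scores, weighting each rater by tag overlap.

    ratings: list of (rater_expertise_tags, score) pairs.
    Returns None if no rater shares any tag with the contribution.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for expertise_tags, score in ratings:
        weight = tag_overlap(expertise_tags, contribution_tags)
        total_weight += weight
        weighted_sum += weight * score
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


def mean_by_contributor(scored_contributions):
    """Aggregate ratings by contributor.

    scored_contributions: iterable of (contributor_id, rating) pairs.
    """
    scores = defaultdict(list)
    for contributor_id, rating in scored_contributions:
        scores[contributor_id].append(rating)
    return {cid: sum(vals) / len(vals) for cid, vals in scores.items()}


# Hypothetical example: three raters, one of them from an unrelated field.
contribution_tags = {"mri", "neuroscience"}
ratings = [
    ({"mri", "neuroscience"}, 4),
    ({"neuroscience", "genetics"}, 2),
    ({"astronomy"}, 5),  # no shared tags, so this rating gets zero weight
]
overall = weighted_rating(ratings, contribution_tags)
```

A weighting of this kind is only one option among many; it does nothing about herding effects, which would need separate treatment.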
No working implementation currently exists that would address the lack of incentives for scientists to engage in collaborative research assessment of this sort. But since both publishers and funding agencies have managed to coerce scientists and their institutions into all sorts of behaviour during research assessment exercises in the past and present, they should have no problems providing incentives to participate in this one, which has the added benefits of being both transparent and beneficial to the scientific community as a whole (it is of note in this respect that there are very few incentives in the current system to deliver timely, fair and detailed peer reviews for grant proposals or manuscripts). One way to do this would be to require that every reference cited be rated by the citing researchers (some journals already single out a few references in this manner as being “of outstanding interest” or similar, but aggregating such ratings of single references in a global database like Open Library would be more helpful). Another would be to include both the quality and the quantity of a specific researcher’s ratings (both active and passive) in the determination of the variable portion of her research funding, perhaps with some sort of normalization by the usage frequency of the tags involved (to balance between large and small fields of inquiry, and to avoid exaggerated claims). The remaining obstacles to a wider adoption of such transparent reputation schemes based on a public research environment with unique contribution and contributor ID schemes are thus not of a technical nature, and we shall assume these features to be available for the system we are about to design.
So far, we have only covered technical aspects of redesigning a research system emancipated from the paper medium but, as Michael Nielsen put it, “[T]here is a second and more radical way of thinking about how the Internet can change science, and that is through a change to the process and scale of creative collaboration itself, enabled by social software such as wikis, online forums and their descendants.” In a similar vein, Timothy Gowers started the Polymath project with a blog post discussing the following idea:
It seems to me that, at least in theory, a different model could work: different, that is, from the usual model of people working in isolation or collaborating with one or two others. Suppose one had a forum (in the non-technical sense, but quite possibly in the technical sense as well) for the online discussion of a particular problem. The idea would be that anybody who had anything whatsoever to say about the problem could chip in. And the ethos of the forum in whatever form it took would be that comments would mostly be kept short. In other words, what you would not tend to do, at least if you wanted to keep within the spirit of things, is spend a month thinking hard about the problem and then come back and write ten pages about it. Rather, you would contribute ideas even if they were undeveloped and/or likely to be wrong.
This short form of communication is taken to an extreme in the exchange of text messages over mobile phones and web platforms, particularly Twitter or the social aggregator FriendFeed, and even though scientists clearly form a minority on such platforms, they have begun to incorporate them into their research.
Quick poll: did you check any references in this post so far? How did you do that? And how do you usually do it when you read a paper? Sadly, even though most scientific journals now publish their content on the internet, most of the formatting is still being performed with paper as a target; only rarely are hyperlinks incorporated, even in the online versions. Online environments, on the other hand, are built around hyperlinks and allow embedding basically any kind of media, for example the Science Commons video below that highlights the value of sharing scientific information.
Research seen in a new light
With the above remarks in mind, let us now reconsider the three stages listed above:
The conception of ideas is a process very specific to the problem at hand and to the individuals (or possibly even machines) dealing with it. Ideas may arise from intensive or superficial occupation with a topic, from experimental or theoretical work on it, from a literature search, from play with methods and concepts, and under multiple other conditions.
[Embedded SlideShare slideshow]
If scientists can access all the scientific information relevant to their research, new ways of processing it can be invented: BioText Search Engine, for instance, allows searching the literature via figures from Open Access articles, while Pubfeed uses a corpus of user-defined seed papers to provide an automated stream of literature recommendations that can be fed into a feed reader. Upon visiting this platform according to her own schedule, the researcher can then just click on an item in the feed to go to the abstract, and with one more click to the full text (if she has access to it); suitable reference managers automatically download the article along with its metadata. Some such platforms even allow users to host their digital library on the web and to share it (including metadata) with colleagues or collaborators, a service that tremendously facilitates collaboration but is necessarily of limited use in the realm of toll-access barriers, even if one was lucky enough to receive an eprint for personal use from the authors of a particular study.
In contrast to grants, it is usually hard to tell when a research project began. For simplicity, let us thus assume that it is started by being entered into the public research environment and tagged as an idea with suitable keywords. Similar to the above-described feeds for publication alerts, scientists (and possibly other interested parties, including dedicated robots with their own contributor ID that access the system via its API) subscribed to specific tags or contributors (or combinations thereof) will then automatically be informed of the existence of this new project and may add to it (e.g. comments, references, extensions, limitations, illustrations, links to suitable tools or relevant legal information or related ongoing projects or previous refutations of similar ideas, offers for collaboration or funding, suggestions for a timeline, or simply further tags, or ratings of any of these), to which the original contributor and anyone else interested may respond. All of this would require open standards and suitable licensing as well as provisions for security and against spam.
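The subscription mechanism described above can be sketched in a few lines of Python. The `ResearchFeed` class, its method names and the sample subscribers (including the robot) are hypothetical; a real system would additionally need the open standards, licensing, security and spam provisions just mentioned.

```python
from collections import defaultdict


class ResearchFeed:
    """Toy notification hub: subscribers follow tags and/or contributors."""

    def __init__(self):
        self._tag_subscribers = defaultdict(set)
        self._contributor_subscribers = defaultdict(set)
        self.inbox = defaultdict(list)  # subscriber -> list of titles

    def subscribe_tag(self, subscriber, tag):
        self._tag_subscribers[tag].add(subscriber)

    def subscribe_contributor(self, subscriber, contributor):
        self._contributor_subscribers[contributor].add(subscriber)

    def post(self, contributor, title, tags):
        """Enter a new item and notify everyone subscribed to it."""
        recipients = set(self._contributor_subscribers.get(contributor, ()))
        for tag in tags:
            recipients |= self._tag_subscribers.get(tag, set())
        recipients.discard(contributor)  # do not notify the author herself
        for subscriber in sorted(recipients):
            self.inbox[subscriber].append(title)
        return sorted(recipients)


# Hypothetical usage: a scientist follows a tag, a robot follows a person.
feed = ResearchFeed()
feed.subscribe_tag("alice", "idea")
feed.subscribe_contributor("review-bot", "bob")
notified = feed.post("bob", "Idea: compare protocols A and B", {"idea", "mri"})
```

Responses to the new project (comments, references, offers of funding) could be posted through the same `post` call, so that the original contributor is informed in exactly the same way as everyone else.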
As a result of these interactions, the planning of a subset of proposed projects will have taken shape after a while, i.e. the necessary material, financial and human resources integrated with a tentative timeline to acquire some preliminary data. Once these are available, they will be posted in the same way as everything mentioned before (with the public research environment effectively acting as an electronic lab notebook) and immediately visualized and integrated with the relevant information available in the system by then, such that the procedures can be adapted as needed to gather the amount and quality of data necessary to bridge the targeted knowledge gap in its most recent state.
Searchable lists sortable by tags, contributors, ratings, envisioned budget or other metadata can then be compiled automatically. On this basis, science funders (which may include dedicated funding bodies but also other organizations, companies, groups of scientists or others, possibly even including lay people) would be able to browse (potentially even with the aid of automated or semi-automated proposal crawlers) through the available proposals meeting their criteria and to either fund them directly or to signal to other funders that they would be willing to fund a proposal in part (such a practice would particularly benefit transdisciplinary projects, which often fall through the cracks in traditional research funding). No technical difficulties here, just cultural ones associated with the cherished habit of keeping ideas and results private until formal publication.
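As a rough illustration of such a list, the Python sketch below filters proposals by a funder's criteria and sorts the matches. The `Proposal` fields, the `matching_proposals` helper, the sort order and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    title: str
    tags: frozenset
    mean_rating: float  # aggregated community rating (assumed 0-5 scale)
    budget: float       # envisioned budget, in an arbitrary currency unit


def matching_proposals(proposals, required_tags, max_budget, min_rating=0.0):
    """Return proposals meeting a funder's criteria, best rated first.

    Ties in rating are broken by the smaller budget.
    """
    required = set(required_tags)
    hits = [
        p for p in proposals
        if required <= p.tags
        and p.budget <= max_budget
        and p.mean_rating >= min_rating
    ]
    return sorted(hits, key=lambda p: (-p.mean_rating, p.budget))


# Hypothetical proposals posted in the public research environment.
proposals = [
    Proposal("Imaging study", frozenset({"mri", "neuroscience"}), 4.2, 20000),
    Proposal("Field survey", frozenset({"ecology"}), 4.8, 15000),
    Proposal("Pilot scan", frozenset({"mri"}), 4.2, 8000),
    Proposal("Large cohort", frozenset({"mri", "genetics"}), 4.9, 90000),
]
shortlist = matching_proposals(proposals, {"mri"}, max_budget=25000,
                               min_rating=4.0)
```

A proposal crawler of the kind mentioned above could run such a query on a schedule and forward the shortlist to a funder, who might then commit to funding a proposal in full or in part.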
It is important to note that such a public research environment would allow for independent verification right from the start, in that independent samples could be investigated in parallel by independent scientists (or even robots) following the same public protocol and posting their data in public as they arise, a situation far from being common in contemporary science, although not entirely new after the successful completion of large-scale collaborative initiatives like the Human Genome Project.
The transition phase
One of the most frequently raised arguments against public research environments concerns the perceived danger of being scooped once the information is laid out under the eyes of everyone and their dog. But with a functional attribution system as described above, it will always be possible to point out, in public, who posted what and when, thereby severely limiting the effectiveness of any scooping attempt. Furthermore, it is probably fair to assume that far more scientists would prefer to engage in collaboration rather than scooping, and so it is much more likely that the posting of ideas, results or analytical tools will result in constructive feedback early on, which may actually enhance their research. Indeed, once the paper-based separation of the communicative component of knowledge generation has been overcome, the incentives are going to shift towards releasing new information immediately. Until this is achieved (and this will take a while), the paper-based system will remain important, and our new system will have to be set up as a complement to it.
Interestingly, a public research environment would work best if the initiators of a project had a certain amount of baseline funding at their disposal to bring their research through the idea stage until the first preliminary data (when it is easier to get putative funders interested in the matter). Such baseline funding is realistic: A recent study on the cost effectiveness of the Natural Sciences and Engineering Research Council of Canada found that the costs of the research grant peer review exceeded the costs of providing every eligible researcher with a yearly baseline grant of about CAN$ 30k. Furthermore, the possibility to invest in selected projects initiated by others (either in terms of reviewer effort or as an active participant) is perhaps an even better form of research assessment than classical behind-closed-doors peer review.
Given that a rating system implemented in our public research environment would almost certainly be less expensive than classical committee-based peer review of grant proposals (most online platforms can be used at no or low cost, no travel costs are incurred by the process, and all the effort spent on reviewing, currently often lost to society, particularly if a manuscript or proposal is rejected, could be used immediately by anyone), the new system would represent an improvement with respect to the current one, even if neither the quality of the research nor the speed of communicating the results were affected. But both are bound to improve in the new system, leaving more money in the research funding system that can actually be spent on research than is currently the case.
Conclusion
A small change in the design of the research system (switching from paper-based to web-based communication of ideas, results and verifications) may have profound consequences: within the scientific community, the permanent communication of progress during the course of a project will shorten the feedback loops, making it possible to improve or update the design of any research project on the fly and to link it to other gap-closing or even maintenance work on our shared corpus of knowledge. Beyond the scientific community, a scientific cycle that is completely open will allow new ways of interaction with society at large, particularly the media: Instead of maintaining a stream of “scientists found out” broadcasts as they do today, the media could add in some issues of the “scientists are currently investigating, let’s see how they do it” variety, and everybody and their dog could join. Such strong interaction with the public via the internet also sets the frame for the discussion of the second aspect of science (knowledge structuring) to which we will turn next, and you are warmly invited to participate.
Acknowledgements
This post was written up on the basis of multiple and ongoing discussions in several online environments, particularly the Science 2.0 group at FriendFeed. Specifically, Björn Brembs, Cameron Neylon and Michael Nielsen provided comments on an earlier draft.
Licensing information: Creative Commons video: CC-BY-NC-SA, Text & Slideshow: CC-BY.