Yesterday the MIT faculty approved a university-wide open access policy. The full text of the resolution, which passed unanimously, is available on Peter Suber’s Open Access News blog. Here’s an excerpt.
“Each Faculty member grants to the Massachusetts Institute of Technology nonexclusive permission to make available his or her scholarly articles and to exercise the copyright in those articles for the purpose of open dissemination. In legal terms, each Faculty member grants to MIT a nonexclusive, irrevocable, paid-up, worldwide license to exercise any and all rights under copyright relating to each of his or her scholarly articles, in any medium, provided that the articles are not sold for a profit, and to authorize others to do the same. The policy will apply to all scholarly articles written while the person is a member of the Faculty except for any articles completed before the adoption of this policy and any articles for which the Faculty member entered into an incompatible licensing or assignment agreement before the adoption of this policy. … The Provost’s Office will make the scholarly article available to the public in an open-access repository. The Office of the Provost, in consultation with the Faculty Committee on the Library System, will be responsible for interpreting this policy, resolving disputes concerning its interpretation and application, and recommending changes to the Faculty.”
I have to say I am conflicted about this and wish I were better informed. As a researcher, I am 100% for the right to make papers describing our results freely available. But I also recognize that publishers and professional societies are an essential part of our research infrastructure and that their business models are partially built on copyright and controlling access to content.
Just as we are seeing big changes in mainstream media, we will probably see related changes in publishers, including professional societies. We’ll have to wait and see if they represent a phase shift to a new and better model or simply the collapse of the old one.
The analogy between the two is far from perfect. Traditional MSM publishers pay a professional staff to research, write and edit stories. Journal publishers and professional societies don’t typically pay their authors, who increasingly deliver camera-ready copy or near camera-ready electronic copy.
March 19th, 2009, by Tim Finin, posted in GENERAL, Web
I guess it’s time for March browser madness, with a fast new Safari 4 beta, the release of IE 8, and a new Google Chrome beta. Let’s add Firefox so that the pairings work out. Of course, none of the browsers are doing well in the Pwn2Own contest.
Here’s an item of possible interest to UMBC alumni in the area. The UMBC Alumni Association is holding a special tour and evening of networking at the National Cryptologic Museum from 6-8pm on Wednesday, March 25. If you have never visited the museum, it’s an opportunity to see some very interesting exhibits on ciphers and codes, including a working Enigma machine. UMBC President Freeman Hrabowski will be there to meet with and talk to the participants. You can get more information and register for the event online or contact Monique Armstrong (phone: 410-455-1879).
We are early in the era of big data (including social and/or semantic) and more and more of us need the tools to handle it. Monday’s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup that is offering its own Hadoop distribution designed to be easier to install and configure.
“In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.”
Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance. The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.
Cloudera’s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a Web-based configuration aid. The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms and using Hive.
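For readers who have not run into MapReduce before, here is a minimal single-process sketch of the idea in Python, using the usual word-count example. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would spread these steps across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the job a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values seen for one key into a total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data needs big tools", "Hadoop handles big data"]))
```

The point of the pattern is that the map and reduce functions only ever see one document or one key at a time, so the framework is free to run many copies of each in parallel.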
CompuLab will offer the fit-PC2 as “the smallest, most power-efficient Intel Atom PC to date.” The nettop box has some nice features, including DVI video output, no fan, mini SD slot, IR receiver, and six USB ports. Plus it only draws 6-8 watts of power (depending on the load) so you can deploy these things without feeling guilty about enlarging your carbon footprint (compare to the EEE nettop’s 36 watts).
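To put those wattage figures in perspective, here is a back-of-the-envelope calculation of yearly energy use. It assumes each box runs around the clock at the quoted draw, which is my assumption and not a measurement; the helper name `annual_kwh` is just for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the machine is never switched off


def annual_kwh(watts):
    """Convert a constant power draw in watts to kilowatt-hours per year."""
    return watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    boxes = [("fit-PC2, light load", 6), ("fit-PC2, heavy load", 8), ("EEE nettop", 36)]
    for label, watts in boxes:
        print(f"{label}: {annual_kwh(watts):.1f} kWh/year")
```

Under that always-on assumption the fit-PC2 comes to roughly 53 to 70 kWh a year, against about 315 kWh for a 36-watt box.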
This looks like a great small box for many projects. Now, if I could just think of one we need to do…
“CompuLab is readying a full-featured Ubuntu Linux PC that draws six Watts and costs $245-to-$400. The Fit-PC2 packs a 1.1GHz or 1.6GHz Atom processor, 160GB hard drive (or SSD), and DVI/HDMI video up to 1920×1080 into a passively cooled case smaller than three CD cases.
Measuring 4 x 4.5 x 1.0 inches, the Fit-PC2 would be dwarfed by a stack of three CD jewel-cases, which would measure about 5.5 x 5 x 1.25. The Fit-PC2 is touted for its innovative, ruggedized die-cast aluminum case. There are no venting holes, but the fanless device is said to be designed so that the case itself dissipates heat.”
Microsoft has announced an add-in for Word 2007 that lets authors annotate a word or phrase with terms defined in external ontologies.
Addressing this critical challenge for researchers, Microsoft Corp. and Creative Commons announced today, before an industry panel at the O’Reilly Emerging Technology Conference (ETech 2009), the release of the Ontology Add-in for Microsoft Office Word 2007 that will enable authors to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers. Ontologies are shared vocabularies created and maintained by different academic domains to model their fields of study. This Add-in will make it easier for scientists to link their documents to the Web in a meaningful way. Deployed on a wide scale, ontology-enabled scientific publishing will provide a Web boost to scientific discovery.
The add-in is available for download from CodePlex, Microsoft’s open source project hosting website. It supports a number of features, including syntax coloring of informative words, automatic detection of identifiers, and built-in access to ontologies and controlled vocabularies maintained by NCBO as well as biological databases such as Protein Data Bank, UniProtKB, and NCBI GenBank/RefSeq.
The add-in was produced by the UCSD BioLit group, hence the initial connections to bioinformatics ontologies. It would be great if future versions had built-in awareness of the more popular linked data vocabularies.
The annotation is done using a custom XML schema which can be extracted and mapped to RDF. This example, from the CodePlex site, shows the word “disease” being tagged with a term from the Human Disease ontology.
It’s not pretty and is more verbose than RDFa, but it gets the job done. There are many interesting add-ins for Microsoft Office components, but most seem to be available only for Office 2007 and not for the Mac version, Office 2008. 🙁
The ScienceInsider feed from Science has a story, DARPA to Explore Geoengineering, about how DARPA is exploring the concept of geoengineering, i.e., the modification of Earth’s environment on a large scale to suit human needs and promote habitability.
“An official advisory group to the Defense Advanced Research Projects Agency is convening an unclassified meeting next week to discuss geoengineering, ScienceInsider has learned. DARPA is the latest in a number of official science funding agencies or top scientific societies that are exploring the controversial idea. But one leading advocate of the work opposes the military developing geoengineering techniques.
The 1-day meeting, to be held Wednesday at Stanford University, will be led by University of Illinois Urbana-Champaign engineering professor Bill King under the auspices of the Defense Sciences Research Council, which advises DARPA. An agenda for the unpublicized event viewed by ScienceInsider listed top researchers who have studied geoengineering as speakers, including geochemist Ken Caldeira of the Carnegie Institution for Science and astrophysicist Gregory Benford of University of California-Irvine.”
“This summit will address the intersection of two active communities, namely the technical standards world, and the community of ontology and semantic technologies. This intersection is long overdue because each has much to offer the other. Ontologies represent the best efforts of the technical community to unambiguously capture the definitions and interrelationships of concepts in a variety of domains. Standards — specifically information standards — are intended to provide unambiguous specifications of information, for the purpose of error-free access and exchange. If the standards community is indeed serious about specifying such information unambiguously to the best of its ability, then the use of ontologies as the vehicle for such specifications is the logical choice. Conversely, the standards world can provide a large market for the industrial use of ontologies, since ontologies are explicitly focused on the precise representation of information. This will be a boost to worldwide recognition of the utility and power of ontological models. The goal of this Ontology Summit 2009 is to articulate the power of synergizing these two communities in the form of a communique in which a number of concrete challenges can be laid out. These challenges could serve as a roadmap that will galvanize both communities and bring this promising technical area to the attention of others.”
Charles Cooper has an article on CNET on How IBM’s sprucing up its ‘social’ side. He attended an IBM event (“Smarter Web Open House”) in which researchers from IBM offered “a peek at a cross-section of collaborative Web technologies–mostly in early beta stages and likely to need a lot more fine-tuning in the months ahead.” He writes that
IBM is putting serious effort into finding ways to use aspects of social computing for more collaboration among enterprise users. The big idea here being to make it easier for businesses to share corporate data in more useful fashion.
“Our perspective comes from business,” said Rod Smith, a computer scientist who is in charge of emerging Internet technologies at IBM. “There are many ecosystems inside the enterprise and we’re seeing how they want to expand those connections. So, we’re looking at how to do that.”
The article describes several interesting projects, including a Web mashup that creates a virtual medical room where physicians can review and comment on test data.
There’s been a lot of interest in Wolfram Alpha in the past week, starting with a blog post from Stephen Wolfram, Wolfram|Alpha Is Coming!, in which he described his approach to building a system that integrates vast amounts of knowledge and then tries to answer free form questions posed to it by people. His post lays out his approach, which does not involve extracting data from online text.
“A lot of it is now on the web—in billions of pages of text. And with search engines, we can very efficiently search for specific terms and phrases in that text. But we can’t compute from that. And in effect, we can only answer questions that have been literally asked before. We can look things up, but we can’t figure anything new out.
So how can we deal with that? Well, some people have thought the way forward must be to somehow automatically understand the natural language that exists on the web. Perhaps getting the web semantically tagged to make that easier.
But armed with Mathematica and NKS I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.”
In a nutshell, Wolfram and his team have built what he calls a “computational knowledge engine” for the Web. OK, so what does that really mean? Basically it means that you can ask it factual questions and it computes answers for you.
It doesn’t simply return documents that (might) contain the answers, like Google does, and it isn’t just a giant database of knowledge, like the Wikipedia. It doesn’t simply parse natural language and then use that to retrieve documents, like Powerset, for example.
Instead, Wolfram Alpha actually computes the answers to a wide range of questions — like questions that have factual answers such as “What is the location of Timbuktu?” or “How many protons are in a hydrogen atom?,” “What was the average rainfall in Boston last year?,” “What is the 307th digit of Pi?,” “where is the ISS?” or “When was GOOG worth more than $300?”
“Stephen Wolfram generously gave me a two-hour demo of Wolfram Alpha last evening, and I was quite positively impressed. As he said, it’s not AI, and not aiming to be, so it shouldn’t be measured by contrasting it with HAL or Cyc but with Google or Yahoo.”
Doug’s review does a good job of sketching the differences he sees between Wolfram Alpha and systems like Google and Cyc.
Lenat’s description makes Wolfram Alpha sound like a variation on the Semantic Web vision, but one that is more like a giant closed database than a distributed Web of data. The system is set to launch in May 2009 and I’m anxious to give it a try.
In this week’s ebiquity meeting (10am Wed 3/11) Ernst Grundke of Dalhousie University will talk on ‘Dalhousie’s Bachelor of Informatics: Rethinking Curriculum and Delivery’. We’ll stream the talk live at http://ebiquity.umbc.edu/tv/.
Abstract: Dalhousie University has offered a new Bachelor of Informatics degree since 2006. This program is a response to changes in the IT workplace and predicted shortages in the IT workforce. The program features integration across disciplines, project teams composed of students from all years of study, attention to communication skills in all student work, project management, explicit professional development, a collegial homeroom environment, and mandatory cooperative education terms. This presentation describes the goals, the structure, the curriculum, and the delivery of the program. Although this is a program based in computer science, the concepts are also applicable to engineering education.