Archive for the 'Web 2.0' Category
July 2nd, 2008, by Tim Finin, posted in Social media, Web, Web 2.0
The Chronicle of Higher Education has a story on students using BitTorrent to share scanned copies of textbooks. The article, Textbook Piracy Grows Online, Prompting a Counterattack From Publishers, starts off
“College students are increasingly downloading illegal copies of textbooks online, employing the same file-trading technologies used to download music and movies. Feeling threatened, book publishers are stepping up efforts to stop the online piracy. One Web site, called Textbook Torrents, promises more than 5,000 textbooks for download in PDF format, complete with the original textbook layout and full-color illustrations. Users must simply set up a free account and download a free software program that uses a popular peer-to-peer system called BitTorrent. Other textbook-download sites are even easier to use, offering digital books at the click of a mouse.”
Textbooks are an interesting niche for file sharing. They are certainly expensive, and publishers manage to release new editions of popular titles almost every year, undermining the market for used texts. On the other hand, digitizing a textbook requires scanning it, which takes time, attention to detail, equipment, and labor. It’s not as simple as ripping a CD.
Update 7/7/08: The Chronicle of Higher Education has a follow-up story, Founder of Textbook-Download Site Says Offering Free Copyrighted Textbooks Is Act of ‘Civil Disobedience’:
“… But the founder of Textbook Torrents calls his actions “civil disobedience” against “the monopolistic business practices” of textbook publishers. The site’s founder, who asked to remain anonymous for fear of legal action against him, talked to The Chronicle over an Internet phone call last night and defended his creation, though he described it as operating in a “legal gray area.” He said he is an undergraduate at a college outside of the United States, though he would not name the institution or country, and that he operates the Web site from there. His biggest complaint: that textbooks are just too expensive, and that prices climb each year. “We’re showing both students and textbook publishers that this isn’t acceptable anymore,” he said. “A lot of users are absolutely fed up with the system.” He said he views the 64,000 registered users of his textbook-download site as votes against that system.”
June 26th, 2008, by Tim Finin, posted in AI, NLP, Semantic Web, Web 2.0
Venture Beat reports that Microsoft will acquire Powerset for a price “rumored to be slightly more than $100 million”. Powerset has been developing a Web search system that uses natural language processing technology acquired from PARC to more fully understand users’ queries and the text of the documents it indexes.
“By buying Powerset, Microsoft is hoping to close the perceived quality gap with Google’s search engine. The move comes as Microsoft CEO Steve Ballmer continues to argue that improving search is Microsoft’s most important task. Microsoft’s market share in search has steadily declined, dropping further and further behind first-place Google and second place Yahoo.
Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion. Google’s search results are still based primarily on the individual words you type into its search bar, and its approach does very little to understand the possible meaning created by joining two or more words together.”
If you put the query “Where is Mount Kilimanjaro” into the beta version of Powerset, it answers “Mount Kilimanjaro: Contained by Tanzania” in addition to showing web pages extracted from Wikipedia. That’s a pretty good answer.
Its response to “what is the Serengeti” is a little less precise. It reports seven things it knows about the Serengeti: that it replaced ‘desert, Platinum, twilight and Caribbean Blue’, that it hosted ‘migration’, that it provided ‘draw’, that it gained ‘fame’, that it recorded ‘explorations’, that it rutted ‘season’ and that it boasted ‘Blue Wildebeests’. I’m just glad I don’t have a school report on the Serengeti due tomorrow!
Asking “Who is the president of Zimbabwe” results only in the fallback answer — which appears to be just the set of Wikipedia pages that the query words produce in an IR query. Compare this with the results of the Google query who is the president of zimbabwe site:wikipedia.org.
By the way, the AskWiki system often does a better job on these kinds of questions. Asking “where is the Serengeti” produces the answer “The Serengeti ecosystem is located in north-western Tanzania and extends to south-western Kenya between latitudes 1 and 3 S and longitudes 34 and 36 E. It spans some 30,000 km.” It’s a bit of a hack, though. It seems to work by selecting the sentence or two in Wikipedia that best serves as an answer. See our post on AskWiki from last fall for more examples.
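To make the "pick the best sentence" idea concrete, here is a minimal sketch of sentence selection by word overlap. It is only a guess at the general technique, not AskWiki's actual method: the stopword list, the scoring rule, and the function names `content_words` and `best_sentence` are all assumptions made for illustration, and the sample passage is made up apart from the one sentence quoted above.

```python
import re

# A tiny, hand-picked stopword list (an assumption, not from any real system).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and",
             "what", "where", "who"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def best_sentence(question, passage):
    """Return the sentence sharing the most content words with the question.

    Ties go to the earliest sentence, because max() keeps the first maximum.
    """
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    query = content_words(question)
    return max(sentences, key=lambda s: len(query & content_words(s)))


# Sample passage for illustration only.
PASSAGE = (
    "Tanzania is a country in East Africa. "
    "The Serengeti ecosystem is located in north-western Tanzania "
    "and extends to south-western Kenya. "
    "It hosts a large annual migration."
)

if __name__ == "__main__":
    print(best_sentence("where is the Serengeti", PASSAGE))
```

A scheme this simple has no notion of what a "where" question is asking for, which is one reason the approach feels like a hack even when it returns a good answer.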
Still, Powerset is an ambitious system that shows promise. What they are trying to do is important and will eventually be done. They have shown real progress in the past two years, more than I had expected. I hope Microsoft can accelerate the development and find practical ways to improve Web search even if the ultimate goal of full language understanding is many years away.
June 26th, 2008, by Tim Finin, posted in Semantic Web, Social media, Web, Web 2.0
Wired has an interesting article, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, that discusses the data-driven revolution that computers and the Web have unleashed. Science used to rely on developing models to explain and organize the world and make predictions. Now much of that can be done by correlating large amounts of data. The same applies to other disciplines (e.g., linguistics) and to businesses (think Google).
“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.
Update: And then there is this counterpoint: Why the cloud cannot obscure the scientific method.
June 24th, 2008, by Tim Finin, posted in GENERAL, Security, Social media, Web 2.0
The July/August issue of Technology Review is focused on Web 2.0. The lead article, “The Business of Social Networks”, asks “Web 2.0–the dream of the user-built, user-centered, user-run Internet–has delivered on just about every promise except profit. Will its most prominent example, social networking, ever make any money?”
“Social networking is the fastest-growing activity on Web 2.0–the shorthand term for the new user-centered Internet, where everyone publicly modifies everyone else’s work, whether it’s an encyclopedia entry or a photo album. The growth of social networking is astonishing, and it has spread to sites of all sizes, which are increasingly intertwined as platforms open (see “Who Owns Your Friends?”). Even small players are soaring.”
There are quite a few interesting stories on various Web 2.0 topics. Visit the table of contents to see what’s available.
June 23rd, 2008, by Tim Finin, posted in Semantic Web, Social media, Web, Web 2.0
The cover story of the July 2008 CACM (v51, n7) is Web Science by Jim Hendler, Nigel Shadbolt, Wendy Hall, Tim Berners-Lee, and Danny Weitzner. The article argues for an interdisciplinary approach to understanding the Web as an entity in its own right. It’s great that this article is freely available on the web. Ironically, figuring out what URL to use to link to it was a bit tricky, and the pages are rendered as PNG images to protect the IP. Still, it’s a good article that lays out an important new area of study in information systems.
“Despite the Web’s great success as a technology and the significant amount of computing infrastructure on which it is built, it remains, as an entity, surprisingly unstudied. Here, we look at some of the technical and social challenges that must be overcome to model the Web as a whole, keep it growing, and understand its continuing social impact. A systems approach, in the sense of “systems biology,” is needed if we are to be able to understand and engineer the future Web.”
What I find exciting is that one of the attributes that makes the Web so successful is that it is a system to which all can contribute. We need to make sure it remains that way and doesn’t devolve into a hegemonic structure.
June 20th, 2008, by Tim Finin, posted in KR, Ontologies, RDF, Semantic Web, Web 2.0
The W3C has officially announced that RDFa is a candidate recommendation:
“2008-06-20: The Semantic Web Deployment Working Group has published a Candidate Recommendation of RDFa in XHTML: Syntax and Processing. Web documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience. RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. See the group’s RDFa implementation report. The Working Group also updated the companion document RDFa Primer. Learn more about the Semantic Web and the HTML Activity.”
Achieving candidate recommendation status is a significant step toward becoming a W3C recommendation. Congratulations to the working group for all of their efforts in developing RDFa.
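For readers who have not seen RDFa, the idea of "attributes that express structured data" can be illustrated with a rough sketch. The snippet below embeds a small, made-up XHTML fragment that uses the RDFa attributes `about`, `property` and `content`, and pulls out (subject, property, value) triples with Python's standard `html.parser`. The class name `PropertyCollector`, the function `extract_triples` and the sample markup are all invented for illustration. This is not a conforming RDFa processor: it ignores prefix resolution, nesting scope, `rel`, `typeof`, datatypes and much else covered by the specification.

```python
from html.parser import HTMLParser

# Made-up sample markup; the URL and the values are placeholders.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/post/1">
  <span property="dc:title">Example post</span>
  <span property="dc:creator" content="Jane Doe">by Jane</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (subject, property, value) triples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent value of an about attribute
        self.pending = None   # property still waiting for its text content
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content attribute overrides the element text.
            self.triples.append((self.subject, prop, attrs["content"]))
        else:
            self.pending = prop

    def handle_data(self, data):
        # Use the first non-blank text inside the element as the value.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None


def extract_triples(markup):
    """Parse the markup and return the collected triples in document order."""
    parser = PropertyCollector()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

The point is only that the same markup a browser renders for people also carries machine-readable statements, which is what lets tools move the data between applications.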
June 20th, 2008, by Tim Finin, posted in Social media, Web 2.0
The Personal Democracy Forum is sponsoring a Twitter debate tonight on “technology and government” between representatives of Barack Obama and John McCain, to be moderated by Time magazine blogger Ana Marie Cox. A note on PDF has the details:
“The McCain campaign will be represented by Liz Mair, the online communications director of the Republican National Committee. The Obama campaign will be represented by Mike Nelson, a professor at Georgetown University who served in the Clinton White House under Vice President Gore on tech policy issues. He is an outside advisor to Obama’s campaign on issues of technology, media and telecommunications.”
Of course, it remains to be seen what kind of debate can happen if short talking points are further compressed into 140-character tweeting points. It will be an interesting experiment.
“Mike, Liz and Ana will be using their personal Twitter accounts, @mikenelson, @lizmair and @anamariecox, and we’ve also asked them to tag their responses with the hashtag #pdfdebate. We suggest that readers who want to follow along use a Twitter application like Summize.com to track the conversation.”
The debate will start sometime tonight (Friday 20 June) and is expected to run through the end of the conference on Tuesday 24 June and maybe beyond.
June 6th, 2008, by Tim Finin, posted in GENERAL, Social media, Web 2.0
Zeynep Tufekci gave a very interesting talk on “A Different Kind of Social Physics: Online Communities and the Revolution in the Architecture of Our Social Spaces” at the JHU Applied Physics Lab last week.
Dr. Tufekci is an assistant professor in UMBC’s Department of Sociology and Anthropology and has interests in the social impacts of technology and social computing. What is somewhat unusual for a sociologist, I assume, is that her undergraduate degree is in computer science and she worked as a programmer before getting her PhD in sociology.
Her talk made some very interesting points about how the new environments created by social computing systems differ from the ones we have evolved to adapt to.
“Everyday, tens of millions of people chat, text, email, poke, twitter, IM and facebook (and, yes, that is a verb). They do what people have always done: they make friends and mark enemies, they assert and seek status, they look for affirmation and for connection, they check out the competition and, above all, they seek the comfort of community. Contrary to earlier predictions, people do not undertake revolutionary, unheard of acts just because the medium is new. In fact, the rise of social computing is hardly surprising to social scientists: we know this is what people do. The significance of this development lies from not the acts themselves but in the characteristics of the environment.
The social physics of online communities are starkly different than those of the offline world — and that has far-reaching consequences. A different type of optics, audience, persistence, traversability and other structural attributes combine to create a different kind of social architecture. However, all evidence so far shows that most people bring to this new medium the cultural vocabulary of the regular, offline world (and, indeed, what else could they do?). This talk will explore the potential consequences of millions of mundane acts performed in a new kind of medium, as well as research opportunities presented by this revolution in the shape of our sociality.”
She was able to illustrate her points with examples gathered from the students in her classes about how their social lives are lived out through systems like Facebook.
Zeynep’s presentation slides are available.
May 30th, 2008, by Tim Finin, posted in Blogging, Web, Web 2.0
A post on the FeedBurner blog, Into the wild: AdSense for feeds, announced that Google will start integrating AdSense ads into feeds next week.
“… publishers already in the FeedBurner Ad Network will continue to see premium CPM ads directly sold onto their content, but with the added bonus of contextually targeted ads that will fill up the remainder of their inventory. … And with AdSense, you’ll know that your back-filled ads are using the strongest contextual ad engine, ensuring the most relevant and profitable ads are delivered to your subscribers. … For publishers who are not yet placing ads in their feeds, any publisher who meets the requirements to join the AdSense program will also be able to use AdSense for feeds. You will be able to manage your feed ad units directly from AdSense Setup tab, and track performance right on the AdSense Report tab. …”
May 26th, 2008, by Tim Finin, posted in Social media, Web 2.0
A post on mediabistro.com, New York Times Joining the Social Networking Fray, says that the New York Times will release an API that “will allow users to ‘mash-up’ the NYT’s data — think layers on Google Maps.” The post quotes Aron Pilhofer, editor of interactive news, as saying that their goal is to “make the NYT programmable. Everything we produce should be organized data.”
This is good news. Newspapers continue to lay off staff and offer buyouts as they predict declining revenues. These are talented and trained reporters, photojournalists and editors who do the hard work of discovering and writing the news. What they do cannot be, and should not be, crowdsourced. The Times has the resources and experience to do this right. If it can make up for some of the losses with innovations in its online presence, it will help the entire newspaper business by showing a way forward.
(Spotted on ReadWriteWeb)
April 30th, 2008, by Tim Finin, posted in GENERAL, Mobile Computing, sEARCH, Social media, Web 2.0
Remember when finding information on the Web was done by navigation using Gopher or Yahoo’s directory? It worked and we thought it was pretty good, at least until the search engines came along. Then we realized that search was much better than navigation for most tasks, especially as the size of the Web grew.
Recall how we get information from a big organization by phone today: we call customer service, navigate a confusing phone menu and, after 10 minutes, end up being told to dial a different department. Dealing with such IVR (interactive voice response) systems is part of the cost of living in our modern society. But maybe we can do better…
Fonolo offers a service that uses a search engine on their site to find the right spot on a company’s phone menu and connect you to it by a callback to your phone. You can even bookmark the point on the phone menu.
How do they do this? Here’s an explanation from IVR search: a ‘Google’ for phone menus?, a post on Telco2.0:
“And Fonolo wrote a web spider that visits large companies’ public phone numbers, and iterates through all the options on all the IVR menus from all the numbers, logging everything it finds. Then it’s just a matter of plotting it all on a directed graph, and making the whole thing searchable and available on the Web. And then the bit we like. You click on the bit you want to get through to, and their system uses the map to dial and navigate the IVRs for you, thus “deep dialing” the user directly to the point in the IVR they need. Every time someone dials through Fonolo, they use the interaction to re-validate that path through the IVR. The search terms that users submit tell them which companies they need to go spider.”
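The directed-graph idea in that description is easy to sketch. The toy example below is only an illustration under stated assumptions, not Fonolo's implementation: the menu map `IVR_GRAPH`, the `LABELS` table and the functions `search_menu` and `key_presses` are all hypothetical. It stores each menu as a node and each key press as an edge, supports a simple keyword search over menu labels, and uses breadth-first search to recover the key sequence that "deep dials" to a chosen node.

```python
from collections import deque

# Hypothetical IVR map: each node is a menu, each edge is a key press.
# Every name and label here is invented for illustration.
IVR_GRAPH = {
    "main": {"1": "billing", "2": "support"},
    "billing": {"1": "pay_bill", "2": "dispute_charge"},
    "support": {"1": "internet_outage", "2": "speak_to_agent"},
}

LABELS = {
    "main": "main menu",
    "billing": "billing and payments",
    "pay_bill": "pay your bill",
    "dispute_charge": "dispute a charge",
    "support": "technical support",
    "internet_outage": "report an internet outage",
    "speak_to_agent": "speak to an agent",
}


def search_menu(query):
    """Return the menu nodes whose label contains every word of the query."""
    words = query.lower().split()
    return [node for node, label in LABELS.items()
            if all(word in label.lower() for word in words)]


def key_presses(target, start="main"):
    """Breadth-first search for the shortest key sequence from start to target.

    Returns a list of keys, or None if the target cannot be reached.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for key, child in IVR_GRAPH.get(node, {}).items():
            if child not in seen:
                seen.add(child)
                queue.append((child, path + [key]))
    return None


if __name__ == "__main__":
    for node in search_menu("dispute"):
        print(node, key_presses(node))
```

In a real service the hard parts are elsewhere: crawling the menus by phone in the first place and keeping the map current as companies change their IVRs, which is why re-validating a path on every call matters.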
Fonolo is in private beta, but you can sign up on their web site to be added to it. You can see a video presentation of the idea and some ppt slides.
April 7th, 2008, by joel, posted in Blogging, Ecoinformatics, GENERAL, Semantic Web, Social media, Web 2.0
EPA is on a Web 2.0 kick. They sponsored a 2-day monster mashup exercise last fall, the Puget Sound Information Challenge, and are making plans for further efforts. EPA’s CIO Molly O’Neill talks a little about it here.
They’ve also been tracking and flirting with the semantic web, and are wondering how much effort to expend on a more full-on semantic engagement. I presented our semantic eco-blogging work at EPA headquarters in February, and was surprised at the turnout and enthusiasm. In response to a screen shot of a Fieldmarking post describing beach closings, a person from the Water Office related that he learned of the closing of his favorite Lake Erie swim-spot from a blog post. This made an impression on him, since, by rights, the closing should have been reported at the county level, up to the state level, and, ultimately, to his office in DC. It struck him that EPA should be systematically tapping the blogosphere for citizen sentiment and concern.
If they do this, they will, implicitly, be saying to the citizenry “If you can’t be bothered to fill out the right form in the right office, at least blog about it, and maybe the machinery of the blogosphere will direct your thoughts our way.” I kind of like that. (This particular example – finding information on beach closings in a given area – can probably be done fairly efficiently with Yahoo Pipes.)
EPA will be hosting this week’s meeting of the multilateral ecoinformatics cooperation, and there will be participation from a wide swathe of EPA – I’m curious to learn of their plans.