These are reasonable choices, though I would not have done the double counting and would have added “machine learning applied to the massive amounts of Web data now available” and “social computing”.
But it’s gratifying to see the Semantic Web in the list. Here’s some of what he has to say about search and the Semantic Web.
The relationship between search technology and the Semantic Web is a perfect illustration of how a small sustaining technology, such as a basic search feature on an operating system, will eventually be eaten up by a larger disruptive technology, such as the Semantic Web. The Semantic Web has the potential of acting like a red giant star by expanding at exponential rates, swallowing whole planets of existing technology in the process.
The technology started as a simple group of secure, trusted, linked data stores. Now Semantic Web technologies enable people to create data stores on the Web and then build vocabularies or write rules for handling the data. Because all the data by definition is trusted, security is often less of a problem.
The task of turning the World Wide Web into a giant dynamic database is causing a shift among traditional search engines because products such as Apture, by Apture Inc. of San Francisco, Calif., let content publishers include pop-up definitions, images or data whenever a user scrolls over a word on a Web site. The ability to categorize content in this manner could have significant implications not only for Web searches but also for corporate intranets and your desktop PC.
These types of products will continue to expand, initially in the publishing industry and then to most industries on the Web in the next two to three years.
For example, human resources sites could use them to pop up a picture and a résumé blip when a recruiter drags a mouse over an applicant’s name. Medical and financial sites such as the National Institutes of Health could use it to break down jargon and help with site exploration.
…
Government sites around the world, such as Zaragoza, Spain, and medical facilities, such as the Cleveland Medical Clinic, are using the vocabulary features of the Semantic Web to create search engines that reach across complex jargon and tech silos to offer a high degree of automation, full integration with external systems and various terminologies, in addition to the ability to accurately answer users’ queries.
…”
Google Chrome has been showing me a malware warning page today as I try to visit normally trusted and benign sites. I got this one just now as I tried to go to Planet RDF.
Warning: Visiting this site may harm your computer!
The website at planetrdf.com contains elements from the site bin.clearspring.com, which appears to host malware – software that can hurt your computer or otherwise operate without your consent. Just visiting a site that contains malware can infect your computer.
[ ] I understand that visiting this site may harm my computer. PROCEED
Clearspring claims it’s a technical problem, although they admit they were using a service that was compromised with files redirecting users to a certain malware domain. I’m a bit fuzzy on what Clearspring does and where it is being used on the Planet RDF site. I don’t see it in the page source, for example.
Update: Maybe the problem stems from Flash cookies in blog content syndicated by Planet RDF that includes Flash objects mediated by Clearspring.
The W3C has published a second working draft of EmotionML, the Emotion Markup Language. Here’s how it’s described.
As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a “plug-in” language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.
Unfortunately EmotionML is not built on RDF. If it were, I would have marked up this post in RDFa using it!
The working draft identifies concrete examples where EmotionML might be useful, including as a markup or representation for systems that do opinion mining, sentiment analysis, affect monitoring, and emotion recognition. A list of 39 individual use cases for EmotionML is given in an appendix.
EmotionML markup explicitly refers to one or more separate vocabularies used for representing emotion-related states. However, the group has defined some default vocabularies that can be used. An example is the Ekman “big six” basic emotions (anger, disgust, fear, happiness, sadness, and surprise). Another is a set of appraisal terms defined by Ortony et al. (desirability, praiseworthiness, appealingness, desirability-for-other, deservingness, liking, likelihood, effort, realization, strength-of-identification, expectation-of-deviation and familiarity).
Here’s an example from the working draft where a static image is annotated with several emotion categories with different intensities.
Google announced today that it has acquired Metaweb, the company behind Freebase — a free, semantic database of “over 12 million people, places, and things in the world.” This is from their announcement on the Official Google blog:
“Over time we’ve improved search by deepening our understanding of queries and web pages. The web isn’t merely words — it’s information about things in the real world, and understanding the relationships between real-world entities can help us deliver relevant information more quickly. … With efforts like rich snippets and the search answers feature, we’re just beginning to apply our understanding of the web to make search better. Type [barack obama birthday] in the search box and see the answer right at the top of the page. Or search for [events in San Jose] and see a list of specific events and dates. We can offer this kind of experience because we understand facts about real people and real events out in the world. But what about [colleges on the west coast with tuition under $30,000] or [actors over 40 who have won at least one oscar]? These are hard questions, and we’ve acquired Metaweb because we believe working together we’ll be able to provide better answers.”
In their announcement, Google promises to continue to maintain Freebase “as a free and open database for the world” and invites other web companies to use and contribute to it.
Freebase is a system very much in the linked open data spirit, even though RDF is not its native representation. Its content is available as RDF and there are many links that bind it to the LOD cloud. Moreover, Freebase has a very good wiki-like interface allowing people to upload, extend and edit both its schema and data.
Here’s a video on the concepts behind Metaweb which are, of course, also those underlying the Semantic Web. What’s the difference? I’d say a combination of representational details and a centralized (Metaweb) vs. distributed (Semantic Web) design.
“The New York Times is the number one newspaper web site. Analysts reckon it ranks first in reach among US opinion leaders. When the New York Times editorial staff tweaks its supersecret algorithm behind what to cover and exactly how to cover a story — as it does hundreds of times a day — it can break a business that is pushed down in coverage or not covered at all.”
Google published its own response to the Times piece as a Financial Times op-ed and also posted it to the Google public policy blog: regulating what is “best” in search?
“Search engines use algorithms and equations to produce order and organisation online where manual effort cannot. These algorithms embody rules that decide which information is “best”, and how to measure it. Clearly defining which of any product or service is best is subjective. Yet in our view, the notion of “search neutrality” threatens innovation, competition and, fundamentally, your ability as a user to improve how you find information.”
The penultimate paragraph gives what they say is their strongest argument against mandating “search neutrality”.
“But the strongest arguments against rules for “neutral search” is that they would make the ranking of results on each search engine similar, creating a strong disincentive for each company to find new, innovative ways to seek out the best answers on an increasingly complex web. What if a better answer for your search, say, on the World Cup or “jaguar” were to appear on the web tomorrow? Also, what if a new technology were to be developed as powerful as PageRank that transforms the way search engines work? Neutrality forcing standardised results removes the potential for innovation and turns search into a commodity.”
This assumes, of course, that there is real competition among Internet search engines. Microsoft has been putting a lot of research and development into Bing with good results and it’s been gaining market share. Yahoo is doing very interesting things as well. Consumer choice among a handful of competitors would be the best way to ensure that none abuse their customers.
Here’s a great resource if you want to come up to speed on ontologies and their importance today.
Professor Barry Smith of the University at Buffalo held a two-day course, An Introduction to Ontology: From Aristotle to the Universal Core, in 2009, to introduce ontologies and their applications to both philosophers and computer scientists. It consisted of eight lectures for which slides and downloadable videos are available. Paul Alexander has also made the videos available in streaming form here if you want to view them without downloading.
The lectures are all either 60 or 90 minutes. Here are links to the streaming videos, thanks to Paul Alexander:
In what may be a first, today’s New York Times has an editorial about an algorithm. No, they haven’t waded into the P=NP issue, but they have commented on Google’s algorithm for ranking search results and on accusations that Google unfairly biases it in its own self-interest.
“In the past few months, Google has come under investigation by antitrust regulators in Europe. Rivals have accused Google of placing the Web sites of affiliates like Google Maps or YouTube at the top of Internet searches and relegating competitors to obscurity down the list. In the United States, Google said it expects antitrust regulators to scrutinize its $700 million purchase of the flight information software firm ITA, with which it plans to enter the online travel search market occupied by Expedia, Orbitz, Bing and others.”
This issue will become more important as the companies dominating Web search (Google, Microsoft and Yahoo) continue to increase their importance and also broaden their acquisition of companies offering web services.
The NYT’s position is moderate, recommending:
Google provides an incredibly valuable service, and the government must be careful not to stifle its ability to innovate. Forcing it to publish the algorithm or the method it uses to evaluate it would allow every Web site to game the rules in order to climb up the rankings — destroying its value as a search engine. Requiring each algorithm tweak to be approved by regulators could drastically slow down its improvements. Forbidding Google to favor its own services — such as when it offers a Google Map to queries about addresses — might reduce the value of its searches. With these caveats in mind, if Google is to continue to be the main map to the information highway, it concerns us all that it leads us fairly to where we want to go.
Google’s Open Spot Android app lets people leaving parking spots share the information with others searching for parking nearby. Running the app shows you parking spots within 1.5 km. New parking spots are assumed to be gone after 20 minutes and are removed from the system.
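The two rules mentioned here (a 1.5 km radius and a 20-minute lifetime) are easy to sketch. What follows is only an illustration under my own assumptions, not how Open Spot is actually implemented: the `Spot` record, the `visible_spots` function and the use of the haversine great-circle distance are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
SEARCH_RADIUS_KM = 1.5       # radius quoted in the post
SPOT_LIFETIME_S = 20 * 60    # spots are assumed gone after 20 minutes


@dataclass
class Spot:
    """Hypothetical record for a reported open parking spot."""
    lat: float
    lon: float
    reported_at: float  # seconds since the epoch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def visible_spots(spots, lat, lon, now):
    """Return the spots that are both fresh and within the search radius."""
    return [
        s for s in spots
        if now - s.reported_at <= SPOT_LIFETIME_S
        and haversine_km(lat, lon, s.lat, s.lon) <= SEARCH_RADIUS_KM
    ]
```

Calling `visible_spots(spots, my_lat, my_lon, time.time())` would give the list a client might display; a real service would presumably use a spatial index rather than scanning every spot.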
People who announce open spots gain karma points, while those who report false spots, known as griefers, are on notice:
“We’re watching for behavior that looks like a griefer spoofing parking spots. We have a couple of mechanisms available to make sure someone can’t leave a bunch of fake parking spots. If we see this happening we will take steps to fix it.”
This is a simple example of a context-aware mobile app that could further benefit from also knowing that you are driving, as opposed to riding, in your car and are likely to want a parking spot, as opposed to doing 70 mph on I-95 as it goes through Baltimore. Moreover, context could also tell the app that you are probably leaving a public parking spot, so that it can mark the spot automatically. Such a feature would have to be smart enough, though, to keep you from being tagged by Google as a griefer and finding out what punishment Google has in store for you.
9ec4c12949a4f31474f299058ce2b22a is what the md5sum function returns when applied to the string that is USCYBERCOM’s official mission statement. Here’s a demonstration of this fact done on a Mac. On Linux, use the md5sum command instead of md5.
~> echo -n "USCYBERCOM plans, coordinates, integrates, \
synchronizes and conducts activities to: direct the \
operations and defense of specified Department of \
Defense information networks and; prepare to, and when \
directed, conduct full spectrum military cyberspace \
operations in order to enable actions in all domains, \
ensure US/Allied freedom of action in cyberspace and \
deny the same to our adversaries." | md5
9ec4c12949a4f31474f299058ce2b22a
~>
md5sum is a standard Unix command that computes a 128-bit “fingerprint” of a string of any length. It is a well-designed hashing function with the property that it’s very unlikely that any two non-identical strings in the real world will have the same md5sum value. Such functions have many uses in cryptography.
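If you have neither md5 nor md5sum handy, the same kind of fingerprint can be computed with Python’s standard hashlib module. This is just a minimal sketch: the helper name `md5_hex` is mine, and I am assuming the text is encoded as UTF-8 before hashing. To reproduce the value above you would need to pass in the mission statement exactly as written, character for character and with no trailing newline.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (encoded as UTF-8) as 32 hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Any string, of any length, yields a 32-character hexadecimal fingerprint.
digest = md5_hex("abc")
print(digest, len(digest))
```

Changing even one character of the input, a stray space for instance, produces a completely different fingerprint.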
Thanks to Ian Soboroff for spotting the answer on Slashdot and forwarding it.
Someone familiar with md5 would recognize that the secret string has the same length and character mix as an md5 value — 32 hexadecimal characters. Each of the possible hex characters (0123456789abcdef) represents four bits, so 32 of them is a way to represent 128 bits.
We’ll leave it as an exercise for the reader to compute the 128-bit sequence that our secret code corresponds to.
Semantic Overflow is great largely because it benefits from the good design and implementation of the StackExchange framework. Could our site be improved with Semantic Web technology, i.e., by eating our own dog food?
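For readers who would rather check their answer than work it out by hand, here is one way to do the conversion in Python. It is only a sketch, and the function name `hex_to_bits` is my own: it expands each hexadecimal character into its four-bit binary form and joins the pieces together.

```python
def hex_to_bits(hex_digest: str) -> str:
    """Expand each hex character into its 4-bit binary form."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_digest)


code = "9ec4c12949a4f31474f299058ce2b22a"
bits = hex_to_bits(code)
print(len(bits))  # 32 hex characters * 4 bits each = 128
print(bits)
```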
It’s not just an academic question. Recently the community QA site Training Examples got quite a bit of visibility as a site
“Where data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”
If you visit the site you will see that it closely follows the Stack Overflow design, complete with tags, reputation, badges, etc. It uses OSQA, which is free software licensed under GPL and implemented in Python using Django. Site creator Joseph Turian has mentioned a desire to improve the site by applying machine learning and language processing techniques to its content.
So, how could Semantic Web technology be used to improve our own Q&A site? Add your suggestions here.
Training Examples QA is a site created by Joseph Turian where “data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”
It’s a close knock-off of the popular Stack Overflow site and appears to be very well done.
If it catches on in the relevant research communities, it could be a very useful resource. (via LingPipe blog)
Semantic Overflow is a great way for the Semantic Web community to help one another with questions, problems and education. It was started in November 2009 using the Stack Overflow framework hosted by Stack Exchange.
Like the parent Stack Overflow system, Semantic Overflow is a blend of a forum, wiki and recommendation site. It lets users ask, tag and answer questions, but also allows those with a sufficient reputation score to vote on and even edit both the questions and community-submitted answers.
The traditional way of asking technical questions of a community is the mailing list or a Web-based forum. The Stack Overflow model offers many advantages, so I hope this site continues to gain traction.
If you want to monitor the site for new questions, you’ll find the feed of the 30 most recently submitted questions useful.