Archive for the 'Machine Learning' Category
October 6th, 2009, by Tim Finin, posted in Machine Learning, Privacy, Semantic Web, Social media
In the Fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students had not yet published their results.
Well, now they have, in the October issue of First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.
The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.
“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”
As we had previously noted, this data-mining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that, by default, a Facebook account's friendship relations are public, and you cannot control the privacy settings of your friends. So if you leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.
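To illustrate the kind of inference the abstract describes, here is a minimal Python sketch. It is not the authors' classifier or data: it fits a one-feature logistic regression by gradient descent on a handful of made-up points, where the feature is the fraction of a user's friends who publicly list some attribute and the label is whether the user lists it too. The function names, learning rate, and toy numbers are all assumptions for illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # prediction minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict(w, b, x):
    """Estimated probability that a user with friend fraction x has the attribute."""
    return sigmoid(w * x + b)

# Synthetic toy data (invented for this sketch, not from the paper):
# x = fraction of friends publicly listing the attribute, y = user lists it.
fractions = [0.00, 0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(fractions, labels)
for x in (0.0, 0.1, 0.3):
    print(f"friend fraction {x:.2f} -> estimated probability {predict(w, b, x):.2f}")
```

The point of the sketch is only that the estimate rises with the friend fraction, which is why a public friend list can leak information the user never published.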
September 21st, 2009, by Tim Finin, posted in AI, Machine Learning, Semantic Web, Social media
Netflix announced today that BellKor’s Pragmatic Chaos team was awarded the $1M Netflix Grand Prize.
“It is our great honor to announce the $1M Grand Prize winner of the Netflix Prize contest as team BellKor’s Pragmatic Chaos for their verified submission on July 26, 2009 at 18:18:28 UTC, achieving the winning RMSE of 0.8567 on the test subset. This represents a 10.06% improvement over Cinematch’s score on the test subset at the start of the contest. We congratulate the team of Bob Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher and Chris Volinsky for their superb work advancing and integrating many significant techniques to achieve this result.”
Netflix announced that it will hold a new Netflix Prize 2 contest with details to be released.
What about the Ensemble’s last-minute entry, the one that seemed to top BellKor’s?
“Team BellKor’s Pragmatic Chaos edged out team The Ensemble with the winning submission coming just 24 minutes before the conclusion of the nearly three-year-long contest. Historically the Leaderboard has only reported team scores on the quiz subset. The Prize is awarded based on teams’ test subset score. Now that the contest is closed we will be updating the Leaderboard to report team scores on both the test and quiz subsets.”
As part of the final submission, teams were required to submit papers describing the approach. Here are the three that the winning team delivered.
The New York Times Bits blog also has an article, Netflix Awards $1 Million Prize and Starts a New Contest.
July 27th, 2009, by Tim Finin, posted in AI, Machine Learning, Social media, Web
Who won the Netflix Prize? The Ensemble or BellKor’s Pragmatic Chaos?
Who won the Netflix Prize? According to a post in the NYT Bits blog, Netflix Challenge Ends, But Winner Is In Doubt, it’s still very much up in the air.
“So The Ensemble won, right? Not necessarily. In an e-mail message Sunday night, Chris Volinsky, a scientist at AT&T Research and a leader of the BellKor’s team, said: “Our team is in first place as we were contacted by Netflix to validate our entry.” And in an online forum, another member of the BellKor team, Yehuda Koren, a researcher for Yahoo in Israel, said his team had “a better Test score than The Ensemble,” despite what the rival team submitted for the leaderboard.
So is BellKor the winner? Certainly not yet, according to a Netflix spokesman, Steve Swasey. “There is no winner,” he said.
A winner, Mr. Swasey said, will probably not be announced until sometime in September at an event hosted by Reed Hastings, Netflix’s chief executive. The movie rental company is not holding off for maximum P.R. effect, Mr. Swasey said, but because the winner has not yet been determined.
The Web leaderboard, he explained, is based on what the teams submit. Next, Netflix’s in-house researchers and outside experts have to validate the teams’ submissions, poring over the submitted code, design documents and other materials. “This is really complex stuff,” Mr. Swasey said.
A leading member of The Ensemble, Domonkos Tikk, a Hungarian computer scientist, did not sound too hopeful. “We didn’t get any notification from Netflix,” Mr. Tikk said in a phone interview from Hungary. “So I think the chances that we won are very slight. It was a nice try.”
It seems strange that Netflix called the BellKor team first, since according to the Leaderboard the Ensemble team submitted the top entry.
UPDATE 7/28: Today’s NYT has a good article on the Netflix Prize and the role of teamwork in developing machine learning systems, Netflix Competitors Learn the Power of Teamwork.
July 26th, 2009, by Tim Finin, posted in AI, Machine Learning, Social media, Web
Netflix has announced that the Netflix Prize contest is now closed. Presumably, The Ensemble is the winner, subject to final qualification.
“We are delighted to report that, after almost three years and more than 43,000 entries from over 5,100 teams in over 185 countries, the Netflix Prize Contest stopped accepting entries on 2009-07-26 18:42:37 UTC. The closing of the contest is in accordance with the Rules — thirty (30) days after a submitted prediction set achieved the Grand Prize qualifying RMSE on the quiz subset.
…
Qualified entries will be evaluated as described in the Rules. We look forward to awarding the Grand Prize, which we expect to announce in a few weeks. However if a Grand Prize cannot be awarded because no submission can be verified by the judges, the Contest will reopen. We will make an announcement on the Forum after the Contest judges reach a decision.”
So what’s left for the judges to do? The rules say that “a panel of senior Netflix engineers and qualified independent judges” need to “ensure that the provided algorithm description and source code could reasonably have generated the prediction sets submitted”. To do this, the candidate winner must produce the algorithm along with a description of how it works. And, of course, before receiving the prize the winner has to grant Netflix
“an irrevocable, royalty free, fully paid up, worldwide non-exclusive license under the Participants’ copyrights, patents or other intellectual property rights in the winning algorithm (“Winning Algorithm”) to reproduce, distribute, display, and create derivative works from the Winning Algorithm and also to make, have made, use, sell, offer for sale, and import products that would otherwise infringe the Winning Algorithm.”
The Netflix Prize was a great idea and generated a lot of interest around the world. It’s been good for the field of AI and its machine learning sub-field, especially. Congratulations to the Ensemble team and condolences to BellKor’s Pragmatic Chaos. I wish there could have been two winners.
UPDATE 7/27: Wait! The winner is still in doubt.
July 26th, 2009, by Tim Finin, posted in AI, Machine Learning, Social media
The race for the Netflix Prize is still on.
With just one day left in the 30-day last call period before BellKor’s Pragmatic Chaos (BKPC) would have been awarded the $1M Netflix Prize for a better movie recommender system, another team, The Ensemble, has broken the 10% improvement threshold and taken the lead by one hundredth of one percent.
The Ensemble was formed by the merger of two existing Netflix Prize teams that had been ranked second and third behind BKPC: ‘Grand Prize Team’ and ‘Opera Solutions and Vandelay United’. Here’s how The Ensemble describes its genesis.
The crowd is indeed wiser than the individual.
The 10% barrier once seemed distant and insurmountable. But when the contest’s “last call” heralded the heroic achievements of BellKor’s Pragmatic Chaos, the rest of the crowd pondered, and asked why the barrier couldn’t be broken twice.
And lo, as if powered by gravity, Grand Prize Team and Vandelay Industries! began to draw in more and more members. And Vandelay went on to join forces with Opera Solutions, and then Vandelay and Opera united with Grand Prize Team, and then … and then … well, things got so complex we decided just to call ourselves The Ensemble.
We can be sure that there will be a lot of Netflix Prize activity in the coming weeks and maybe months as these two teams compete and perhaps more mergers create super-teams. BKPC and Ensemble could even decide to merge and share the prize. Watch the Netflix Leaderboard for the latest ranking.
UPDATE: I had assumed the 30-day last call would reset with each new leader, like auctions on eBay. Not so. The prize will be won (and lost) today! Here’s the relevant section in the rules:
“To qualify for the Grand Prize the RMSE of a Participant’s submitted predictions on the test subset must be less than or equal to 90% of 0.9525, or 0.8572 (the “qualifying RMSE”). After three (3) months have elapsed from the start of the Contest, when the RMSE of a submitted prediction set on the quiz subset improves beyond the qualifying RMSE an electronic announcement will inform all registered Participants that they have thirty (30) days to submit additional candidate prediction sets to be considered for judging. At the end of this period, qualifying submissions will be judged (see Judging below) in order of the largest improvement over the qualifying RMSE on the test subset. In the case of tied RMSE values on the test subsets, the submission received earliest by the Site will be judged first.”
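The arithmetic in the rules is easy to check. Here is a small Python sketch of how RMSE, the qualifying threshold, and the percentage improvement relate. The baseline 0.9525 comes from the rules quoted above and 0.8567 is the winning test score Netflix later announced; the function names and the toy ratings are my own assumptions, not anything from the contest code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))

def improvement_percent(baseline_rmse, new_rmse):
    """Relative reduction in RMSE over the baseline, as a percentage."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

CINEMATCH_TEST_RMSE = 0.9525                  # baseline quoted in the rules
qualifying_rmse = 0.90 * CINEMATCH_TEST_RMSE  # 90% of the baseline; the rules quote it as 0.8572

# Made-up ratings, only to exercise rmse().
toy_error = rmse([3.5, 4.0, 2.0], [4, 4, 1])
print(f"toy RMSE: {toy_error:.4f}")

# The winning test-subset score reported by Netflix.
print(f"qualifying RMSE: {qualifying_rmse:.5f}")
print(f"improvement at 0.8567: {improvement_percent(CINEMATCH_TEST_RMSE, 0.8567):.2f}%")
```

Since the threshold is a fixed fraction of the baseline, an RMSE at or below it is the same thing as an improvement of at least 10%.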
The August 2009 CACM has a short note, Just for You (pdf), on recommender systems and the Netflix prize by BKPC member Don Monroe that includes a visualization by Ensemble member Chris Hefele.
Spotted on Hacker News. See TechCrunch also.
UPDATE II: The Netflix Prize contest has closed.
June 26th, 2009, by Tim Finin, posted in AI, Machine Learning, Social media
BellKor’s Pragmatic Chaos has broken the 10% barrier, a feat that may have won them the $1M Netflix prize. We’ll know for sure in 30 days.
“June 26, 2009: Today our team submitted our solution to the Netflix Prize, resulting in a score of .8558, which corresponds to an improvement over Netflix Cinematch algorithm of 10.05%. This is the first submission in the competition to break the 10% barrier and sets off a 30 day period where all competitors are invited to submit their best and final solutions.”
The prize is offered by Netflix in an open competition, started in October 2006, for the best collaborative filtering algorithm that predicts user ratings for films from a database of previous ratings. Today the BellKor’s Pragmatic Chaos team submitted an entry that improved on the existing algorithm by 10.05%, exceeding the 10% improvement threshold required of a winner. The team is a collaboration between people from Pragmatic Theory, Commendo, Yahoo and AT&T.
“The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.”
September 15th, 2008, by Tim Finin, posted in AI, Machine Learning, Semantic Web, Social media, Web
The NYT has an interesting article, Stuck in Google’s Doghouse, on the importance of search engines to many businesses. Or maybe it’s about ad arbitrage and the ways that some Web business models are based on gaming search engines and Web advertising. In any case, it’s especially relevant in the light of the recently announced Google-Yahoo advertising deal.
One of the most interesting aspects to the story, at least to me, is who gets the credit or blame for significant decisions and events on the Web — people or machines.
“When Mr. Savage asked Google executives what the problem was, he was told that Sourcetool’s “landing page quality” was low. Google had recently changed the algorithm for choosing advertisements for prominent positions on Google search pages, and Mr. Savage’s site had been identified as one that didn’t meet the algorithm’s new standards. (As Google defines it, landing page quality includes a series of attributes — loading speed, user friendliness, relevancy, originality and dozens of other characteristics — that it deems appropriately “googly.”)” source
A more dramatic example of our brave new world was the $1B problem United Airlines stock had last week, as outlined in A Stock-Killer Fueled by Algorithm After Algorithm.
“What made a six-year-old article about a bankruptcy filing by United Airlines reappear on Wall Street traders’ screens on Monday as if it were fresh news, prompting a sell-off that erased $1 billion in the company’s market value in a matter of minutes? The path the article followed from forgotten archive entry to present-day stock-killer has begun to emerge, and it raises some interesting questions about how news rockets around the Web. Both human error and far-from-foolproof technology seem to have played a role in the episode, which involved a 2002 Chicago Tribune report; the web site of the Sun Sentinel, a Florida newspaper owned by the same company; the Bloomberg News financial wire service; and Google, all apparently unwittingly.” source
The automation is inevitable, IMHO, and probably a good thing. Of course, I reserve the right to revise and extend my remarks if my own ox is gored.
June 5th, 2008, by Tim Finin, posted in AI, Machine Learning, NLP
Colin de la Higuera of Jean Monnet University will talk on “Grammatical Inference: Some of the Questions Out There” at 1:00pm next Tuesday in the large CSEE conference room.
“Grammatical Inference is a field concerned with learning grammars given data about a language. In this talk we survey some of the questions being addressed by researchers in the field. Some of these are now classical and have been looked into for some time, others are more recent:
- understanding the models and the paradigms: what does polynomial language learning mean?
- learning more complex families of languages
- scaling up and using grammatical inference in applications”
April 21st, 2008, by Tim Finin, posted in Datamining, Machine Learning, UMBC
Jiawei Han will give a talk tomorrow, Research Challenges In Data Mining, at 10am in UMBC’s LH8 (1st floor, ITE building). Here’s the abstract.
“Research in data mining has led to advanced knowledge discovery technologies and applications. In this talk, we will discuss some emerging research issues for advanced technologies and applications in data mining and discuss some recent progress in this direction, including (1) exploration of the power of pattern mining, (2) analysis of multidimensional, heterogeneous and evolving information network, (3) mining of fast changing data streams, (4) mining of moving object data, RFID data, and data from sensor networks, (5) spatiotemporal and multimedia data mining, (6) biological data mining, (7) text and Web mining, (8) data mining for software engineering and computer system analysis, and (9) data cube-oriented multidimensional online analytical analysis.”
The talk is part of a distinguished lecture series sponsored by the UMBC Information Systems Department. Here’s a flier.
February 18th, 2008, by Akshay Java, posted in Machine Learning, Semantic Web, Social media, Web, Web 2.0
Social networks and Web graphs exhibit certain typical properties. The classic work by Barabási and Albert showed how nodes in such networks link preferentially: popular nodes often gain a disproportionately large share of the links. This is also known in other fields as the 80/20 rule or simply the “rich get richer” phenomenon. Another early work by Steve Borgatti studied social networks and found that they exhibit a core-periphery property. A small set of (popular) nodes forms the core and the rest make up the periphery. To the best of my knowledge, community detection algorithms have often worked independently of such underlying network properties.
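As a quick illustration of preferential linking, here is a minimal Python sketch of a simplified Barabási–Albert style growth process. It is only a toy: the function name, the seed graph, and the parameter values are assumptions chosen for the example, and it tracks node degrees rather than building a full graph.

```python
import random

def preferential_attachment(num_nodes, links_per_node=2, seed=0):
    """Grow a graph in which each new node links to existing nodes with
    probability proportional to their current degree.
    Returns a dict mapping node -> degree."""
    rng = random.Random(seed)
    # Start from a small fully connected seed graph.
    seed_size = links_per_node + 1
    degree = {n: seed_size - 1 for n in range(seed_size)}
    # Each node appears in this list once per link end, so a uniform draw
    # from it selects nodes in proportion to their degree.
    link_ends = [n for n in range(seed_size) for _ in range(seed_size - 1)]
    for new_node in range(seed_size, num_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(link_ends))
        degree[new_node] = 0
        for t in targets:
            degree[t] += 1
            degree[new_node] += 1
            link_ends.extend([t, new_node])
    return degree

degree = preferential_attachment(1000)
ranked = sorted(degree.values(), reverse=True)
top_fifth = sum(ranked[: len(ranked) // 5])
print(f"largest degree: {ranked[0]}, smallest degree: {ranked[-1]}")
print(f"top 20% of nodes hold {100 * top_fifth / sum(ranked):.0f}% of link ends")
```

Even in this toy version the early, well-connected nodes end up with far more links than the latecomers, which is the skew that a core-periphery view of the network relies on.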
I have been exploring an idea that can utilize the core-periphery structure of social networks to approximately compute the communities in the graph. The intuition behind this method is really quite simple. The basic idea boils down to the following:
“The core of the social network typically defines the communities present in it. By looking at the link structure of the core and identifying how the rest of the network connects to the core we can efficiently compute communities in large graphs.”
This idea can be easily explained by considering the following network of email communication (obtained from Dr. Mark Newman’s site). The original adjacency matrix was permuted to order the nodes by degree. Thus the core is represented by submatrix A, which is quite dense. Submatrix B corresponds to how the rest of the network links to the core. Submatrix C is a very sparse matrix that consists of links between nodes in the long tail. Since C is so sparse, it can be ignored without much degradation of the clustering/community detection results, which saves a significant amount of computation and storage. By using just the core of the social network (matrix A) and how other nodes link to the core (matrix B), we can approximate the community structure of the entire graph much more efficiently.
The rest boils down to the mathematical formulation of the above idea using spectral clustering techniques. You can read more about it in my poster paper that was recently accepted to ICWSM. (A Tech Report version with a more detailed analysis will be available shortly.)
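The degree ordering and the A/B/C split can be sketched in a few lines of Python. This covers only the matrix-partitioning step, not the spectral clustering that follows it, and the function names, the fixed `core_size` cutoff, and the six-node toy graph are assumptions made for the example rather than details from the poster paper.

```python
def core_periphery_blocks(adjacency, core_size):
    """Reorder nodes by descending degree and split the adjacency matrix into
    A (core-core), B (periphery-core) and C (periphery-periphery)."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    order = sorted(range(n), key=lambda i: -degree[i])
    core, periphery = order[:core_size], order[core_size:]
    A = [[adjacency[i][j] for j in core] for i in core]
    B = [[adjacency[i][j] for j in core] for i in periphery]
    C = [[adjacency[i][j] for j in periphery] for i in periphery]
    return A, B, C

def density(block):
    """Fraction of nonzero entries in a block (0.0 for an empty block)."""
    cells = sum(len(row) for row in block)
    if cells == 0:
        return 0.0
    return sum(1 for row in block for value in row if value) / cells

# Toy undirected graph: nodes 0-2 form a dense core, nodes 3-5 hang off it.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4)]
n = 6
adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1

A, B, C = core_periphery_blocks(adjacency, core_size=3)
print(f"density A={density(A):.2f}  B={density(B):.2f}  C={density(C):.2f}")
```

On a real network the gap between the density of A and that of C would be far larger than in this toy, which is what makes dropping C a reasonable approximation.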
November 8th, 2007, by Tim Finin, posted in Machine Learning, Social media
Technology Review has a short article, A Better Recommendation Engine, on the Seattle company Cleverset, which offers recommendation services for ecommerce.
“Now a Seattle-based startup called Cleverset thinks it has the secret to the next-generation recommendation system: a type of computer modeling found mainly in artificial-intelligence research labs. Cleverset’s system weighs the importance of the relationship among individual shoppers, their behavior on the site, the behavior of similar shoppers, and external factors such as seasons, holidays, and events like the Super Bowl. Using these ever-changing relationships, Cleverset’s system serves up products that are statistically likely to match what the customer will find interesting.” (link)
Cleverset was founded in 2000 by Bruce D’Ambrosio of Oregon State University. Their approach is based on statistical relational learning.
“Cleverset uses an approach called statistical relational modeling, developed in the past decade, in which each piece of information in a data set is linked together based on its relationship to every other piece of information. This contrasts with the previous view of looking at data as if in an Excel spreadsheet, where everything carries an equal weight.” (link)
January 28th, 2006, by Tim Finin, posted in AI, Gadgets, Machine Learning, Mobile Computing, Wearable Computing
A group of UMBC students working with Professor Zary Segall have built a prototype music player that senses its user’s emotional state and level of activity and picks appropriate music. The prototype system uses BodyMedia’s SenseWear, which detects continuous data from the wearer’s skin and wirelessly transmits the data stream to the xpod prototype. The physiological data includes energy expenditure (calories burned), duration of physical activity, number of steps taken, and sleep/wake states. A neural network system is used to learn associations between these biometric parameters and the user’s preferences for music and the resulting model is then used to dynamically construct the xpod’s playlist. Read more about the xpod prototype in this recent paper:
XPod: a human activity and emotion aware mobile music player, Sandor Dornbush, Kevin Fisher, Kyle McKay, Alex Prikhodko and Zary Segall.