Archive for the 'Machine Learning' Category
April 21st, 2008, by Tim Finin, posted in Datamining, UMBC, Machine Learning
Jiawei Han will give a talk tomorrow, Research Challenges In Data Mining, at 10am in UMBC’s LH8 (1st floor ITE building). Here’s the abstract.
“Research in data mining has led to advanced knowledge discovery technologies and applications. In this talk, we will discuss some emerging research issues for advanced technologies and applications in data mining and discuss some recent progress in this direction, including (1) exploration of the power of pattern mining, (2) analysis of multidimensional, heterogeneous and evolving information network, (3) mining of fast changing data streams, (4) mining of moving object data, RFID data, and data from sensor networks, (5) spatiotemporal and multimedia data mining, (6) biological data mining, (7) text and Web mining, (8) data mining for software engineering and computer system analysis, and (9) data cube-oriented multidimensional online analytical analysis.”
The talk is part of a distinguished lecture series sponsored by the UMBC Information Systems Department. Here’s a flier.
February 18th, 2008, by Akshay Java, posted in Social media, Web 2.0, Web, Machine Learning, Semantic Web
Social networks and Web graphs exhibit certain typical properties. The classic work by Barabási and Albert showed how nodes in such networks link preferentially: popular nodes often gain a disproportionately large share of the links. This is also known in other fields as the 80/20 rule or simply the “rich get richer” phenomenon. Another early work by Steve Borgatti studied social networks and found that they exhibit a core-periphery property. A small set of (popular) nodes forms the core and the rest make up the periphery. To the best of my knowledge, community detection algorithms have often worked independently of such underlying network properties.
I have been exploring an idea that can utilize the core-periphery structure of social networks to approximately compute the communities in the graph. The intuition behind this method is really quite simple. The basic idea boils down to the following:
“The core of the social network typically defines the communities present in it. By looking at the link structure of the core and identifying how the rest of the network connects to the core we can efficiently compute communities in large graphs.”
This idea can be easily explained by considering the following network of email communication (obtained from Dr. Mark Newman’s site). The original adjacency matrix was permuted to order the nodes by degree. Thus the core is represented by submatrix A, which is quite dense. The submatrix B corresponds to how the rest of the network links to its core. The submatrix C is a very sparse matrix that consists of links between nodes in the long tail. Since C is so sparse, it can be ignored without much degradation of the clustering/community detection results, which saves a significant amount of computation and storage. By using just the core of the social network (matrix A) and how other nodes link to the core (matrix B), we can approximate the community structure of the entire graph much more efficiently.
The rest boils down to the mathematical formulation of the above idea using spectral clustering techniques. You can read more about it in my poster paper that was recently accepted to ICWSM. (A tech report version with a more detailed analysis will be available shortly.)
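To make the A/B/C decomposition concrete, here is a minimal Python sketch. It is not the formulation from the poster paper: the 8-node graph, the function names, and the core labels are all hypothetical, and a simple majority vote over links into the core stands in for the spectral clustering step. It only illustrates ordering nodes by degree, cutting the permuted matrix into blocks, and labeling tail nodes from A and B while ignoring C.

```python
def split_core_periphery(adj, core_size):
    """Order nodes by degree (highest first) and cut the matrix into blocks."""
    degree = [sum(row) for row in adj]
    order = sorted(range(len(adj)), key=lambda i: -degree[i])  # stable sort
    core, tail = order[:core_size], order[core_size:]
    A = [[adj[i][j] for j in core] for i in core]  # core-to-core links
    B = [[adj[i][j] for j in core] for i in tail]  # tail-to-core links
    C = [[adj[i][j] for j in tail] for i in tail]  # tail-to-tail links
    return core, tail, A, B, C


def density(block):
    """Fraction of non-zero cells in a block."""
    cells = sum(len(row) for row in block)
    return sum(map(sum, block)) / cells if cells else 0.0


def assign_tail(tail, core, B, core_labels):
    """Give each tail node the core community it links to most often."""
    labels = {}
    for node, row in zip(tail, B):
        votes = {}
        for core_node, linked in zip(core, row):
            if linked:
                label = core_labels[core_node]
                votes[label] = votes.get(label, 0) + 1
        labels[node] = max(votes, key=votes.get) if votes else None
    return labels


# Hypothetical toy graph: two hubs per community plus a few tail nodes.
edges = [(0, 1), (2, 3), (1, 2), (0, 4), (0, 5), (1, 4), (1, 5),
         (2, 6), (2, 7), (3, 6), (3, 7), (4, 5)]
n = 8
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

core, tail, A, B, C = split_core_periphery(adj, core_size=4)

# Assumed result of clustering the core (matrix A); given by hand here.
core_labels = {0: "X", 1: "X", 2: "Y", 3: "Y"}

# C is never consulted: tail nodes are labeled from their links into the core.
tail_labels = assign_tail(tail, core, B, core_labels)
print("core:", core, "tail labels:", tail_labels)
print("density A/B/C:", density(A), density(B), density(C))
```

In this toy graph the one tail-to-tail link sits in C and is dropped, yet every tail node still lands in the community of the hubs it links to.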
November 8th, 2007, by Tim Finin, posted in Social media, Machine Learning
Technology Review has a short article, A Better Recommendation Engine, on the Seattle company Cleverset, which offers recommendation services for e-commerce.
“Now a Seattle-based startup called Cleverset thinks it has the secret to the next-generation recommendation system: a type of computer modeling found mainly in artificial-intelligence research labs. Cleverset’s system weighs the importance of the relationship among individual shoppers, their behavior on the site, the behavior of similar shoppers, and external factors such as seasons, holidays, and events like the Super Bowl. Using these ever-changing relationships, Cleverset’s system serves up products that are statistically likely to match what the customer will find interesting.” (link)
Cleverset was founded in 2000 by Bruce D’Ambrosio of Oregon State University. Their approach is based on statistical relational learning.
“Cleverset uses an approach called statistical relational modeling, developed in the past decade, in which each piece of information in a data set is linked together based on its relationship to every other piece of information. This contrasts with the previous view of looking at data as if in an Excel spreadsheet, where everything carries an equal weight.” (link)
January 28th, 2006, by Tim Finin, posted in Gadgets, Wearable Computing, AI, Machine Learning, Mobile Computing
A group of UMBC students working with Professor Zary Segall has built a prototype music player that senses its user’s emotional state and level of activity and picks appropriate music. The prototype system uses BodyMedia’s SenseWear, which continuously collects data from the wearer’s skin and wirelessly transmits the data stream to the XPod prototype. The physiological data includes energy expenditure (calories burned), duration of physical activity, number of steps taken, and sleep/wake states. A neural network learns associations between these biometric parameters and the user’s music preferences, and the resulting model is then used to dynamically construct the XPod’s playlist. Read more about the XPod prototype in this recent paper:
XPod a human activity and emotion aware mobile music player, Sandor Dornbush, Kevin Fisher, Kyle McKay, Alex Prikhodko and Zary Segall.
December 15th, 2005, by Pranam Kolari, posted in Blogging, memeta, splog, Technology, Web, Semantic Web, Machine Learning, GENERAL
In the blogosphere, pings are notifications sent by updated blogs to PingServers. A major issue recently has been unjustified pings, also known as Spings, sent by Splogs. Splogs have been discussed a lot recently, including an interesting thread on post piracy that Steve Rubel initiated on Micropersuasion.
The problem of splogs prompted us to analyze pings from weblogs.com, which publishes hourly pings as changes.xml. We have been collecting these pings over the last 4 weeks, for a total of 40 million pings from around 14 million (so claimed) blogs. To begin with, we fetched these blogs and applied a language identification technique implemented by James Mayfield. As expected, most of the pings were from blogs authored in English, but we were able to identify blogs in many other languages as well. For instance, the charts below show the distribution of pings from blogs authored in Italian, over a day and over a week. Each bar denotes the number of pings per hour.


All times are in GMT; clearly Italian authored blogs display a specific blogging pattern.
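The per-hour bars described above amount to bucketing ping timestamps by hour of day in GMT. Here is a minimal sketch of that binning, assuming each ping carries a Unix timestamp; the function name and the sample timestamps are hypothetical and do not come from the changes.xml data.

```python
from collections import Counter
from datetime import datetime, timezone


def pings_per_hour(timestamps):
    """Return a 24-element list: number of pings in each GMT hour of day."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical ping times, in seconds since the Unix epoch (GMT).
sample_pings = [0, 60, 3600, 3661, 3700, 86399]
histogram = pings_per_hour(sample_pings)
print(histogram)
```

Running the same function over a week of pings, keyed by day and hour instead of hour alone, would give the weekly view.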
In the next step we used our work on splog detection to detect splogs (and hence spings) among the English blogs. Our detection mechanism is close to 90% accurate. As shown in the charts below, pings from blogs average around 8K per hour and those from splogs average around 25K.


Clearly almost 3 out of 4 pings are spings! Going back further to the source of these spings, we observed that more than 50% of claimed blogs pinging weblogs.com are splogs.
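The "3 out of 4" figure follows directly from the two hourly averages quoted above. A small sketch of the arithmetic, using only those rounded averages (the variable names are ours):

```python
# Rounded hourly averages quoted in the post.
blog_pings_per_hour = 8_000
splog_pings_per_hour = 25_000

total_pings_per_hour = blog_pings_per_hour + splog_pings_per_hour
sping_share = splog_pings_per_hour / total_pings_per_hour

print(f"Share of pings that are spings: {sping_share:.1%}")
```

Since the detector is only about 90% accurate, the share is best read as an estimate, not an exact count.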
Given how interesting these preliminary statistics are, the scope for further analysis, and the interest in the resulting dataset, we decided to monitor the pingosphere continuously. So we now do it “live” on updated blogs published by weblogs.com (delayed by an hour), and have made the results publicly available at http://memeta.umbc.edu. The site lists blogging patterns for many other languages and compares splogs with blogs. All of our work is part of a larger project, memeta, aimed at analyzing the content and structure of the blogosphere.
We hope our effort is a good complement to existing services (e.g., FightSplog, SplogReporter and SplogSpot) towards combating splogs. We currently publish only simple ping statistics on this site, but do stay tuned for fresh splog and classified blog dumps and much more!
UPDATE: Matthew Hurst from BlogPulse points us to an interesting analysis he has done on a day of weblogs.com pings.
October 21st, 2005, by Tim Finin, posted in Funding, KR, AI, Machine Learning
Peter Harsha reports that the Senate Appropriations Committee included language in the Senate version of the FY 06 Defense Appropriations bill that strips $55M from DARPA’s Cognitive Computing program, specifically “Learning, Reasoning, and Integrated Cognitive Systems”. That’s a 50% cut in the program. Peter points out that this runs counter to recent congressional sentiment that the role of computer science, especially university-led fundamental computer science, should be strengthened at DARPA.
October 19th, 2005, by Tim Finin, posted in Blogging, Web, Machine Learning, Semantic Web
There’s been a lot of talk about splogs lately (e.g., here, there and everywhere). There was even a note in the Washington Post’s Computer Security blog today. We recently finished a paper on using SVMs to recognize splogs:
Pranam Kolari, Tim Finin, and Anupam Joshi, SVMs for the Blogosphere: Blog Identification and Splog Detection, TR-CS-05-13, Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 8 October 2005.
The paper compares results using different feature sets for the task of splog recognition as well as some other simple tasks. We’ve submitted this to the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.
May 28th, 2005, by Tim Finin, posted in AI, Machine Learning, Agents
Cfengine is a configuration management tool that is widely used to manage networks of Unix systems. It was originally developed at the University of Oslo in 1993. I’ve only been dimly aware of it and assumed it was yet another common system administration tool for Unix. I was surprised to see how it’s described on the Cfengine site:
“About Cfengine: Cfengine, or the configuration engine is an autonomous agent and a middle to high level policy language and agent for building expert systems to administrate and configure large computer networks. Cfengine is designed to be a part of a computer immune system. It is ideal for cluster management and has been adopted for use all over the world in small and huge organizations alike.”
The developers have evolved their approach to use a biologically inspired immunity model and have a recent paper in the Machine Learning Journal.
February 18th, 2005, by Harry Chen, posted in Technology Impact, Machine Learning
There is an interesting paper that describes how TiVo computes its recording recommendations.
The abstract:
We describe the TiVo television show collaborative recommendation system which has been fielded in over one million TiVo clients for four years. Over this install base, TiVo currently has approximately 100 million ratings by users over approximately 30,000 distinct TV shows and movies. TiVo uses an item-item (show to show) form of collaborative filtering which obviates the need to keep any persistent memory of each user’s viewing preferences at the TiVo server. Taking advantage of TiVo’s client-server architecture has produced a novel collaborative filtering system in which the server does a minimum of work and most work is delegated to the numerous clients. Nevertheless, the server-side processing is also highly scalable and parallelizable. Although we have not performed formal empirical evaluations of its accuracy, internal studies have shown its recommendations to be useful even for multiple user households. TiVo’s architecture also allows for throttling of the server so if more server-side resources become available, more correlations can be computed on the server allowing TiVo to make recommendations for niche audiences.
See PVRBLog
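As a rough illustration of the item-item idea in the abstract, here is a minimal Python sketch. It is not TiVo's implementation: the show names, the ratings, and the choice of cosine similarity are all assumptions made for the example. The point is the split of work, where the "server" computes only show-to-show similarities and the "client" scores unseen shows from its own local ratings.

```python
from math import sqrt

# Hypothetical ratings: user -> {show: rating}. Only used to build similarities.
all_ratings = {
    "u1": {"show_a": 3, "show_b": 3, "show_c": -2},
    "u2": {"show_a": 2, "show_b": 3, "show_c": -3},
    "u3": {"show_a": -3, "show_b": -2, "show_c": 3},
}


def item_similarities(ratings):
    """Server side: cosine similarity between shows over co-rating users."""
    items = sorted({item for user in ratings.values() for item in user})
    sims = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            pairs = [(r[a], r[b]) for r in ratings.values() if a in r and b in r]
            norm_a = sqrt(sum(x * x for x, _ in pairs))
            norm_b = sqrt(sum(y * y for _, y in pairs))
            if norm_a and norm_b:
                sim = sum(x * y for x, y in pairs) / (norm_a * norm_b)
                sims[(a, b)] = sims[(b, a)] = sim
    return sims


def recommend(local_ratings, sims):
    """Client side: rank unrated shows using only this client's own ratings."""
    scores = {}
    for (rated, other), sim in sims.items():
        if rated in local_ratings and other not in local_ratings:
            scores[other] = scores.get(other, 0.0) + sim * local_ratings[rated]
    return sorted(scores, key=scores.get, reverse=True)


sims = item_similarities(all_ratings)
ranking = recommend({"show_a": 3}, sims)
print(ranking)
```

Because the similarity table is keyed by show pairs only, nothing about an individual viewer needs to be stored with it, which matches the property the abstract highlights.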

February 15th, 2005, by Harry Chen, posted in Ontologies, Web, Machine Learning
The Web is the largest database on Earth, and Google has the largest index of this database. Two researchers at the University of Amsterdam have proposed a new system that uses Google search to learn and distinguish the meanings of words.
Their work is based on the theory that the meaning of a word can usually be gleaned from the words used around it. Take the word “rider”. Its meaning can be deduced from the fact that it is often found close to words like “horse” and “saddle”.
Instead of relying on a common-sense knowledge base such as Cyc, the researchers use Google search to measure how closely two words relate to each other.
To do this, the system needs to build a word tree, a database of how words relate to each other. It might start off with any two words to see how they relate to each other. For example, if it googles “hat” and “head” together it gets nearly 9 million hits, compared to, say, fewer than half a million hits for “hat” and “banana”. Clearly “hat” and “head” are more closely related than “hat” and “banana”.
To gauge just how closely, Vitanyi and Cilibrasi have developed a statistical indicator based on these hit counts that gives a measure of a logical distance separating a pair of words. They call this the normalised Google distance, or NGD. The lower the NGD, the more closely the words are related.
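For reference, the normalised Google distance defined by Cilibrasi and Vitanyi is NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the hit count for x, f(x, y) the hit count for both terms together, and N the number of pages indexed. Here is a small sketch of that formula. The joint counts echo the figures quoted above (about 9 million, and under half a million); the single-word counts and N are made-up placeholders, not real Google numbers.

```python
from math import log


def ngd(hits_x, hits_y, hits_xy, total_pages):
    """Normalised Google distance from raw hit counts (all must be > 0)."""
    log_x, log_y, log_xy = log(hits_x), log(hits_y), log(hits_xy)
    return (max(log_x, log_y) - log_xy) / (log(total_pages) - min(log_x, log_y))


# Placeholder counts: only the two joint counts come from the text above.
TOTAL_PAGES = 8_000_000_000
HITS = {"hat": 50_000_000, "head": 200_000_000, "banana": 20_000_000}

ngd_hat_head = ngd(HITS["hat"], HITS["head"], 9_000_000, TOTAL_PAGES)
ngd_hat_banana = ngd(HITS["hat"], HITS["banana"], 400_000, TOTAL_PAGES)

print(f"NGD(hat, head)   = {ngd_hat_head:.3f}")
print(f"NGD(hat, banana) = {ngd_hat_banana:.3f}")
```

With these placeholder counts, “hat” and “head” come out closer than “hat” and “banana”, consistent with the intuition in the text; real values depend on the actual hit counts.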
See also: “Google’s search for meaning“, New Scientist.
November 10th, 2004, by Tim Finin, posted in Machine Learning, Agents
Xiaocheng Luan’s Ph.D. dissertation on a quantitative approach to matching service requests against capability descriptions is now available online.
Xiaocheng Luan, Adaptive Middle Agent for Service Matching in the Semantic Web: A Quantitative Approach, Ph.D. dissertation, Computer Science and Electrical Engineering, University of Maryland, Baltimore County, November 01, 2004.
In Dr. Luan’s approach, middle agents establish and refine an agent’s capability model based on the domain ontology and through the interactions with the agents. An agent’s performance history is considered as an integral part of the agent’s capability model and the agent’s strong and weak areas can also be revealed. The dynamically captured and updated service distribution in the service domain is considered as an important factor in service matching. Service matching here is carried out in two steps. In the first step, candidates are selected through the semantic service description matching. In the second step, the performance rating of each candidate with respect to the specific request is estimated based on the agent’s capability model, and the candidates with the highest estimated performance ratings will be selected. Statistics collected from evaluation experiments show a significant improvement over typical service matching methods in terms of the accuracy in selecting the best service provider(s) for each request.
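The two-step matching described above can be sketched roughly as follows. This is only an illustration under strong simplifying assumptions, not the dissertation's model: semantic description matching is reduced to a set-membership check, the performance estimate is a plain average of past ratings for the requested service, and the agent names, service names, and ratings are all hypothetical.

```python
from statistics import mean

# Hypothetical capability models kept by the middle agent.
agents = {
    "agent_a": {
        "capabilities": {"translate", "summarize"},
        "history": {"translate": [0.9, 0.8], "summarize": [0.4]},
    },
    "agent_b": {
        "capabilities": {"translate"},
        "history": {"translate": [0.6, 0.7]},
    },
    "agent_c": {
        "capabilities": {"summarize"},
        "history": {"summarize": [0.95]},
    },
}


def estimated_rating(agent, service):
    """Estimate performance on a service from past ratings (0.0 if none)."""
    past = agent["history"].get(service, [])
    return mean(past) if past else 0.0


def match(service, agents, top_k=1):
    """Step 1: keep agents whose description covers the request.
    Step 2: rank those candidates by estimated performance."""
    candidates = [n for n, a in agents.items() if service in a["capabilities"]]
    ranked = sorted(
        candidates,
        key=lambda name: estimated_rating(agents[name], service),
        reverse=True,
    )
    return ranked[:top_k]


def record_outcome(agents, name, service, rating):
    """Refine the capability model after an interaction."""
    agents[name]["history"].setdefault(service, []).append(rating)


best_translator = match("translate", agents)
best_summarizer = match("summarize", agents)
print(best_translator, best_summarizer)
```

Note that agent_a advertises both services but is picked only for the one where its history is strong, which is the "strong and weak areas" effect the abstract mentions.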