UMBC ebiquity
2011 September

Archive for September, 2011

talk: Genetic information for chronic disease prediction, 1pm 9/23, ITE227, UMBC

September 22nd, 2011, by Tim Finin, posted in AI, Machine Learning

Genetic information for chronic disease prediction

Michael A. Grasso, MD, PhD
University of Maryland School of Medicine

1:00pm Friday 23 September 2011, 227 ITE

Type 2 diabetes and coronary artery disease are common polygenic-multifactorial diseases responsible for significant morbidity and mortality. The identification of people at risk for these conditions has historically been based on clinical factors alone. However, this approach has produced prediction algorithms that are tied to symptomatic states and have limited accuracy in asymptomatic individuals. Advances in genetics have raised the hope that genetic testing may aid in disease prediction, treatment, and prevention. Although the idea is intuitive, the claim that adding genetic information increases the accuracy of disease prediction remains an unproven hypothesis. We present an overview of the genetic issues involved in polygenic-multifactorial diseases and summarize ongoing efforts to use this information for disease prediction.

Michael Grasso is an Assistant Professor of Internal Medicine and Emergency Medicine at the University of Maryland School of Medicine, and an Assistant Research Professor of Computer Science at the University of Maryland Baltimore County. He earned a medical degree from the George Washington University and a PhD in Computer Science from the University of Maryland. He is a member of the Upsilon Pi Epsilon Honor Society in the Computing Sciences, the Kane-King-Dodec Medical Honor Society, and the William Beaumont Medical Research Honor Society. He completed a residency at the University of Maryland School of Medicine, and currently works in the Department of Emergency Medicine at the University of Maryland Medical Center. He has been awarded more than $1,200,000 in grant funding from the National Institutes of Health, the National Institute of Standards and Technology, and the Department of Defense, and has authored more than 35 scholarly papers and abstracts. His research interests include clinical decision support systems, clinical data mining, clinical image processing, personalized medicine, software engineering, database engineering, and human factors. He is also a semi-professional trumpet player and is interested in the specific medical needs of performing artists, especially instrumental musicians.

Host: Yelena Yesha

Programming with Hadoop: a hands-on introduction

September 20th, 2011, by Tim Finin, posted in High performance computing

In this week’s ebiquity meeting (10:30am Tue 9/20 in ITE 325b) we will dive right into writing MapReduce programs and skip all the gory details about Hadoop setup and MapReduce theory. In one hour, we will use Eclipse to write a MapReduce Java program that builds an inverted index, test it on a local box, and run it on an already configured Hadoop cluster. If we have time, we will also see how to do the same in Python instead of Java.
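For readers who want a feel for the job before the meeting, here is a minimal, single-process sketch of an inverted-index MapReduce in Python. It is not the program from the demo and uses no Hadoop API; it only mimics the three phases (map, shuffle/sort, reduce) in memory, and it assumes lowercase whitespace tokenization. The function names `map_phase`, `reduce_phase` and `run_job` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    """Mapper: emit (word, doc_id) once for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_phase(word, doc_ids):
    """Reducer: turn all doc_ids seen for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def run_job(documents):
    """Simulate a MapReduce job locally.

    documents: dict mapping doc_id -> text.
    Returns a dict mapping word -> sorted list of doc_ids containing it.
    """
    # Map: apply the mapper to every input record.
    pairs = [kv for doc_id, text in documents.items()
             for kv in map_phase(doc_id, text)]
    # Shuffle/sort: the framework sorts mapper output by key so that each
    # reducer call sees one key together with all of its values.
    pairs.sort(key=itemgetter(0))
    index = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        key, postings = reduce_phase(word, (doc_id for _, doc_id in group))
        index[key] = postings
    return index


if __name__ == "__main__":
    docs = {
        "d1": "the quick brown fox",
        "d2": "the lazy dog",
        "d3": "quick dog",
    }
    for word, postings in sorted(run_job(docs).items()):
        print(word, postings)
```

In a Hadoop Streaming version, the mapper and reducer would instead be separate scripts that read lines from stdin and write tab-separated key/value lines to stdout, with Hadoop performing the sort between them.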

You are encouraged to do the following before the meeting if you want to code along.

  • Review the Yahoo Introduction to MapReduce tutorial
  • Download a free virtual machine image with Hadoop pre-installed, so you can get started quickly. Options are available for Linux, Windows and Mac OS X.
  • Make sure you have JDK 1.6x and Eclipse (or your favourite IDE) installed on your laptop.

Addenda (9/19):

  • If you are planning to code along during the demo, download the latest stable release of Hadoop (0.20.2)
  • Some people have been having problems with Cloudera’s 64-bit VM image. If you do, try this virtual machine from the Yahoo Developer Network, which contains a pre-installed Hadoop 0.20.
  • Even if you are not able to get the VM running for now, you can still run the program(s) locally on your laptop using Eclipse.

Ten years of words from ebiquity papers

September 16th, 2011, by Tim Finin, posted in Ebiquity, NLP, Semantic Web

Here’s a word cloud that visualizes the 200 most significant words extracted from over 400 papers from our research group over the past ten years. Significance was estimated by tf-idf where the idf data is from a collection of newswire articles (thanks Paul!). The word cloud was created with Wordle.
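The scoring idea, with idf taken from a background collection rather than from the papers themselves, can be sketched in a few lines of Python. The post does not say which tf variant, tokenizer or smoothing was used, so raw counts, lowercase whitespace splitting and the smoothed idf below are all assumptions, and the strings in the usage note are toy placeholders, not the actual papers or newswire data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def top_terms(target_docs, background_docs, k=200):
    """Return the k words of target_docs with the highest tf-idf score.

    tf is the raw count of a word across all target documents; idf is
    computed from a separate background collection (e.g. newswire text).
    """
    n = len(background_docs)
    # Document frequency of each word in the background collection.
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    # Term frequency over the whole target collection.
    tf = Counter()
    for doc in target_docs:
        tf.update(tokenize(doc))
    # Smoothed idf: a word absent from the background (df = 0) gets log(1 + n).
    scores = {w: c * math.log((1 + n) / (1 + df[w])) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

With `background = ["the market rose", "the team won", "the storm hit"]` and `papers = ["the ontology reasoner", "the ontology agent"]`, `top_terms(papers, background, k=2)` returns `["ontology", "agent"]`, while "the", which occurs in every background document, scores zero.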

Got a problem? There’s a code for that

September 15th, 2011, by Tim Finin, posted in Google, KR, Ontologies, OWL, Semantic Web, Social media

The Wall Street Journal article Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way describes the International Classification of Diseases, 10th Revision (ICD-10), which is used to code medical problems.

“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”

We’d like to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An OWL DL ontology for ICD-10 has already been built, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XA (Struck by turtle, initial encounter) or Y07.53 (Teacher or instructor, perpetrator of maltreatment and neglect).

Humor aside, a description logic representation (e.g., in OWL) would make the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would be revealed as a much smaller number of classes that can be combined in productive ways to produce those codes or to create general descriptions (e.g., bitten by an animal).
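To make the combinatorial point concrete, here is a toy Python sketch, not OWL and not the real ICD-10 structure: every class name (`Turtle`, `Bitten`, `Initial`, and so on) is hypothetical. Six leaf classes along three independent axes yield 2 x 2 x 2 = 8 specific descriptions, and a subclass check answers general queries such as "bitten by an animal" without enumerating codes by hand. A description logic reasoner would do the same kind of subsumption reasoning far more generally.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical mini-taxonomy: each class maps to its parent (None = root).
PARENT = {
    "Animal": None, "Reptile": "Animal", "Turtle": "Reptile",
    "Mammal": "Animal", "Dog": "Mammal",
    "Mechanism": None, "Bitten": "Mechanism", "Struck": "Mechanism",
    "Encounter": None, "Initial": "Encounter", "Subsequent": "Encounter",
}


def is_a(cls, ancestor):
    """True if cls is ancestor or a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


@dataclass(frozen=True)
class Injury:
    mechanism: str
    agent: str
    encounter: str


# Leaf classes along each independent axis of the description.
MECHANISMS = ["Bitten", "Struck"]
AGENTS = ["Turtle", "Dog"]
ENCOUNTERS = ["Initial", "Subsequent"]


def all_descriptions():
    """Every combination of leaf classes: the 'ground codes'."""
    return [Injury(m, a, e) for m, a, e in product(MECHANISMS, AGENTS, ENCOUNTERS)]


def matches(injury, mechanism="Mechanism", agent="Animal", encounter="Encounter"):
    """Does the injury fall under the given (possibly general) classes?"""
    return (is_a(injury.mechanism, mechanism)
            and is_a(injury.agent, agent)
            and is_a(injury.encounter, encounter))
```

For example, `[i for i in all_descriptions() if matches(i, mechanism="Bitten")]` picks out the four "bitten by an animal" descriptions. Adding one more animal or mechanism grows the code list multiplicatively while the class list grows by one.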

Detecting fake Google+ profiles with image search

September 11th, 2011, by Tim Finin, posted in Machine Learning, Semantic Web, Social media

Many Google+ users have been reporting frequent notices about new followers whom they don’t know and who appear to be attractive young women. The suspicious followers have minimal profiles and no posts. These are obviously false accounts being created for some as yet unknown purpose, but how can one prove it?

I just got a notice, for example, that Janet Smith of Philadelphia is following me. Now Janet Smith is a common name and Philadelphia is a big place — there are probably hundreds of people who live in the Philadelphia area with that name. The 990 other people she’s following seem like a pretty random bunch, though I do know many and have more than a few in my own circles. Most seem to have a fair number of followers.

So there is not much to go on other than her profile image. This is a great use for Google’s new image search. I dragged the picture into the image search query field, and Google’s best guess for the image was Indian actress Koyel Mullick. Sure enough, if you search for images with her name, the exact image from the Janet Smith profile is result number 15.

Of course, there are still some subtle issues. This is just one kind of false profile — one created for one identity but using an image from a different one. It’s common on most social media systems, including G+, for some people to use a picture of someone or something other than themselves. But it’s obvious to a human viewer that using a picture of a rabbit, Marilyn Monroe or the mighty Thor on your profile is not meant to deceive. It will be challenging to automate the process of discriminating the intent to deceive from modesty, homage or an ironic gesture.

Mid-Atlantic student colloquium on speech, language and learning

September 2nd, 2011, by Tim Finin, posted in AI, Conferences, KR, Machine Learning, NLP

The First Mid-Atlantic Student Colloquium on Speech, Language and Learning is a one-day event to be held at the Johns Hopkins University in Baltimore on Friday, 23 September 2011. Its goal is to bring together students taking computational approaches to speech, language, and learning, so that they can introduce their research to the local student community, give and receive feedback, and engage each other in collaborative discussion. Attendance is free and open to all, but space is limited, so online registration is requested by September 16. The program runs from 10:00am to 5:00pm and will include oral presentations, poster sessions, and breakout sessions.

Journal of Web Semantics issue on evaluation

September 2nd, 2011, by Tim Finin, posted in Semantic Web

Call For Papers

Special Issue on Evaluation of Semantic Technologies
Journal of Web Semantics

Semantic technologies have become a well-established field of computer science. However, the field is continuously evolving: the number of semantic technologies keeps increasing, and existing standards evolve while new ones are defined. In this setting, the problem of how to compare and evaluate the various approaches becomes crucial. The consistent evaluation of semantic technologies is critical not only for future scientific progress, by identifying research goals and allowing a rigorous examination of research results, but also for their industrial adoption, by allowing objective measurement and comparison of these technologies and enabling their certification.

Semantic technology evaluation must, on the one hand, be supported by strong methodological approaches and relevant test data and, on the other hand, satisfy the differing needs of developers, researchers and adopters by addressing those quality characteristics that are relevant to each target group. Nevertheless, numerous issues must be faced when evaluating semantic technologies.

On the one hand, because of the fast evolution of the semantic field, previous evaluation methods and techniques need to be adapted and extended, and new ones have to be developed. On the other hand, the cost of defining new evaluation methods or reusing existing ones can be prohibitive, so facilitating the understanding of such methods or their automated processing becomes highly significant.

The goal of this special issue is to present current advances and trends in semantic technology evaluation (theories and models, methods and techniques, evaluation campaigns, technology comparison, etc.). Therefore, we solicit papers that improve evaluation paradigms of semantic technologies. At the same time, papers that evaluate a particular method, technology or system without investigating the evaluation regime itself will be considered out of scope and returned to the authors without review.

Topics of interest

Relevant topics for the special issue include, but are not limited to, the following.

  • Semantic technology evaluation methods
  • Test data for semantic technology evaluation
  • Automation of semantic technology evaluation
  • Evaluation of semantic technologies in real-world scenarios
  • Evaluation of linked data technologies
  • Quality requirements for semantic technologies
  • Semantic technology certification
  • Maturity models for semantic technologies
  • Semantic technology selection
  • Semantic technology quality estimation
  • Interoperability and conformance of semantic technologies
  • Semantic technology efficiency and scalability
  • Usability of semantic technologies

Important dates

We aim for an efficient publication cycle to ensure that published results are available promptly. To this end, we encourage submissions well before the submission deadline.

  • Submission deadline: 29 February 2012
  • Author notification: 31 May 2012
  • Final version: 31 July 2012
  • Publication: Fall 2012

Instructions for submission

Please see the author guidelines for detailed instructions before you submit. Submissions should be made through Elsevier’s Electronic Submission System. More details on the Journal of Web Semantics can be found on its homepage.


You are currently browsing the UMBC ebiquity weblog archives for September, 2011.
