UMBC eBiquity Blog
Tim Finin, 11:01am 9 February 2009
I thought this cartoon from a recent issue of the New Yorker offers an accurate comment on modern life.
Categories: GENERAL, Humor,
Related posts: • Blog comment spam magnet; • Invisible phone or invisible friend?; • I DID NOT feel good about this comment; Comments: none
Tim Finin, 10:49am 9 February 2009
The US Senate’s stimulus plan released at the end of last week has less money for US science agencies than the House plan from January, but the cuts were not as drastic as were feared. CRA reports in a post Senate Deal Protects Much of NSF Increase in Stimulus that
“The agreement does reduce the increase in the Department of Energy’s Office of Science by $100 million (so, +$330 million instead of +$430 million), and NIST’s increase would be reduced by $100 million (so +$495 million instead of +$595 million). But given the reports we were receiving as recently as yesterday evening about the possibility of no increase for the science agencies in the bill, this is a remarkable turn of events. The increase for NSF in the Senate bill will still be far less than the $3 billion called for in the House version of the bill, but NSF will be in far better shape in the conference between the two chambers coming in with $1.2 billion from the Senate instead of zero.”
Scientists and Engineers for America (a 501(c)(3) organization) has a detailed breakdown of the stimulus package that passed the Senate Friday in Senate-passed stimulus package by the numbers. They also have a downloadable Excel spreadsheet in case you want to crunch the data yourself. Here are some science highlights from their post:
NSF Research: $1.2 billion total for NSF including: $1 billion to help America compete globally; $150 million for scientific infrastructure; and $50 million for competitive grants to improve the quality of science, technology, engineering, and mathematics (STEM) education.
NASA: $1.3 billion total for NASA including: $450 million for Earth science missions to provide critical data about the Earth’s resources and climate; $200 million to enable research and testing of environmentally responsible aircraft and for verification and validation methods for complex aerospace systems and software; $450 million to reduce the gap in time that the U.S. does not have a vehicle to access the International Space Station; and $200 million for repair, upgrade and construction at NASA facilities.
NOAA: $1 billion total for NOAA, including $645 million to construct and repair NOAA facilities, equipment and vessels to reduce the Nation’s coastal charting backlog, upgrade supercomputer infrastructure for climate research, and restore critical habitat around the Nation.
NIST: $475 million total for NIST including: $307 million for renovation of NIST facilities and new laboratories using green technologies; $168 million for scientific and technical research at NIST to strengthen the agency’s IT infrastructure; provide additional NIST research fellowships; provide substantial funding for advanced research and measurement equipment and supplies; increase external grants for NIST-related research.
DOE: The Department of Energy’s Science program sees $330 million for laboratory infrastructure and construction.
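If you would rather crunch the numbers in code than in a spreadsheet, here is a minimal Python sketch that checks whether the itemized amounts quoted above add up to the stated totals. The figures (in millions of dollars) are copied from the highlights; the dictionary layout and the `itemized_total` helper are my own and purely illustrative. NOAA and DOE are left out because the highlights do not itemize their full totals.

```python
# Amounts in millions of US dollars, copied from the highlights above.
# The structure and names are illustrative, not taken from the SEA spreadsheet.
highlights = {
    "NSF": {"total": 1200, "items": [1000, 150, 50]},
    "NASA": {"total": 1300, "items": [450, 200, 450, 200]},
    "NIST": {"total": 475, "items": [307, 168]},
}


def itemized_total(agency):
    """Sum the itemized amounts listed for one agency."""
    return sum(highlights[agency]["items"])


for agency, data in highlights.items():
    itemized = itemized_total(agency)
    status = "matches" if itemized == data["total"] else "differs from"
    print(f"{agency}: itemized ${itemized}M {status} stated ${data['total']}M")
```

For these three agencies the itemized amounts account for the whole stated total.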
Categories: GENERAL, Policy,
Tags: Funding; NSF,
Related posts: • US House stimulus plan: NSF += $3B; • Senate Cuts DARPA Cognitive Computing program; • Stimulus Watch: propose and vote on shovel ready projects; Comments: none
Tim Finin, 12:50pm 8 February 2009
The Data Evolution blog has an interesting post that asks Is Big Data at a tipping point?. It suggests that we may be approaching a tipping point in which large amounts of online data will be interlinked and connected to suddenly produce a whole much larger than the parts.
“For the past several decades, an increasing number of business processes– from sales, customer service, shipping – have come online, along with the data they throw off. As these individual databases are linked, via common formats or labels, a tipping point is reached: suddenly, every part of the company organism is connected to the data center. And every action — sales lead, mouse click, and shipping update — is stored. The result: organizations are overwhelmed by what feels like a tsunami of data. The same trend is occurring in the larger universe of data that these organizations inhabit. Big Data unleashed by the “Industrial Revolution of Data”, whether from public agencies, non-profit institutes, or forward-thinking private firms.”
I expected that the post would soon segue into a discussion of the Semantic Web and maybe even the increasingly popular linked data movement, but it did not. Even so, it sets up plenty of nails for which we have an excellent hammer in hand. I really like this iceberg analogy, by the way.
“At present, much of the world’s Big Data is iceberg-like: frozen and mostly underwater. It’s frozen because format and meta-data standards make it hard to flow from one place to another: comparing the SEC’s financial data with that of Europe’s requires common formats and labels (ahem, XBRL) that don’t yet exist. Data is “underwater” when, whether reasons of competitiveness, privacy, or sheer incompetence it’s not shared: US medical records may contain a wealth of data, but much of it is on paper and offline (not so in Europe, enabling studies with huge cohorts).”
The post also points out some sources of online data and analysis tools, some familiar and some new to me (or maybe just forgotten.)
“Yet there’s a slow thaw underway as evidenced by a number of initiatives: Aaron Swartz’s theinfo.org, Flip Kromer’s infochimps, Carl Malamud’s bulk.resource.org, as well as Numbrary, Swivel, Freebase, and Amazon’s public data sets. These are all ambitious projects, but the challenge of weaving these data sets together is still greater.”
Categories: Datamining, Semantic Web,
Related posts: • Joel Sachs on Linked Data, 10:30am Oct 1, ITE 325b; • New York Times publishes Linked Open Data; • Tim Berners-Lee talks at TED 2009 on linked data; Comments: 2
Tim Finin, 10:10am 8 February 2009
A Hadoop User Group (HUG) has formed for the Washington DC area via meetup.com.
“We’re a group of Hadoop & Cloud Computing technologists / enthusiasts / curious people who discuss emerging technologies, Hadoop & related software development (HBase, Hypertable, PIG, etc). Come learn from each other, meet nice people, have some food/drink.”
The group defines its geographic location as Columbia MD and their first HUG meetup was held last Wednesday at the BWI Hampton Inn. In addition to informal social interactions, it featured two presentations:
- Amir Youssefi from Yahoo! presented an overview of Hadoop. Amir is a member of the Cloud Computing and Data Infrastructure group at Yahoo!, and will be discussing Multi-Dataset Processing (Joins) using Hadoop and Hadoop Table.
- Introduction to complex, fault tolerant data processing workflows using Cascading and Hadoop by Scott Godwin & Bill Oley
If you’re in Maryland and interested you can join the group at meetup.com and get announcements for future meetings. It might provide a good way to learn more about new software to exploit computing clusters and cloud computing.
(Thanks to Chris Diehl for alerting me to this)
Categories: Database, High performance computing, MC2, cloud computing,
Related posts: • Tutorial: Hadoop on Windows with Eclipse; • Cloudera offers a simpler Hadoop distribution; • New Geospatial Semantic Web Group; Comments: none
Tim Finin, 9:47pm 6 February 2009
Tim Berners-Lee gave a talk at the TED2009 conference on linked data — one of the newest and most interesting ideas to emerge from efforts to realize the Semantic Web vision.
Here’s a summary of Sir Berners-Lee’s talk from a post by Gigaom, Highlights from TED: Tim Berners-Lee, Pattie Maes, Jacek Utko. I’m looking forward to being able to see his talk online soon.
“Founder of the web Tim Berners-Lee spoke of the next grassroots communication movement he wants to start: linked data. Much in the way his development of the web stemmed out of the frustrations of brilliant people working in silos, he is frustrated that the data of the world is shut apart in offline databases.
Berners-Lee wants raw data to come online so that it can be related to each other and applied together for multidisciplinary purposes, like combining genomics data and protein data to try to cure Alzheimer’s. He urged “raw data now,” and an end to “hugging your data” — i.e. keeping it private — until you can make a beautiful web site for it.
Berners-Lee said his dream is already on its way to becoming a reality, but that it will require a format for tagging data and understanding relationships between different pieces of it in order for a search to turn up something meaningful. Some current efforts are dbpedia, a project aimed at extracting structured information from Wikipedia, and OpenStreetMap, an editable map of the world. He really wants President Obama, who has promised to conduct government transparently online, to post linked data online.”
You can see the slides that TBL used on the W3C site.
Categories: Semantic Web,
Tags: Tim-Berners Lee,
Related posts: • Video from Tim Berners-Lee 2009 TED talk on linked data; • RPI exports data.gov information as linked data; • Semantic Web for Industry; Comments: 3
Tim Finin, 12:24pm 6 February 2009
Next week we will see another epochal event, like Y2K and 2012. Actually, we hope it will be more like Y2K and less like the predicted events on 21 December 2012, which some think will bring about the end of the world as we know it.
At 23:31:30 GMT on Friday 13 February 2009 the Unix time will be 1234567890.
This is, of course, the number of seconds since midnight GMT on 1 January 1970 (not counting leap seconds.) To keep track of the event, you can use the epoch clock or just ask your local Unix system with date '+%s'.
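If you'd like to check the arithmetic yourself, here is a small Python sketch that converts the magic number back into a calendar date. It uses only the standard library; the `unix_to_utc` helper name is mine, chosen for illustration.

```python
from datetime import datetime, timezone

EPOCH_SECONDS = 1234567890


def unix_to_utc(seconds):
    """Convert seconds since 1970-01-01 00:00:00 UTC to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


moment = unix_to_utc(EPOCH_SECONDS)
print(moment.isoformat())  # 2009-02-13T23:31:30+00:00
```

Going the other way, `int(datetime.now(timezone.utc).timestamp())` gives roughly what date '+%s' would print on a Unix system.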
Categories: GENERAL,
Related posts: • New CS grad enrollements in US up slightly in 2005; • testing; • Top 100 gadgets of all time; Comments: none
Tim Finin, 1:03pm 5 February 2009
People whose native language is Perl might find the Perl/Python phrasebook handy. When talking to the Python interpreter, some try hand gestures, typing slowly or using ALL CAPS, but these seldom work and can often annoy or even alarm the interpreter. This phrasebook covers the most common things you need to say to a simple Python system. For example, if you wanted to tell it to read your file as a list of lines, there’s a phrasebook entry that shows just how to say it.
my $filename = "cooktest1.1-1";
open my $f, $filename or die "can't open $filename: $!\n";
@lines = <$f>;
–
filename = "cooktest1.1-1"
f = open(filename)  # Python has exceptions with somewhat-easy to
                    # understand error messages. If the file could
                    # not be opened, it would say "No such file or
                    # directory: %filename" which is as
                    # understandable as "can't open $filename:"
lines = f.readlines()
Many of the entries also contain helpful facts and advice about the customs and social norms of native Python speakers. Not only can this keep you out of trouble, it will deepen your understanding of the colorful and sometimes quaint Python speakers. I hope that the pocket travel version of the phrasebook, suitable for downloading onto an ipod, will be out soon.
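For the curious, here is a hedged sketch of how the Python half of that phrasebook entry might be written so that the file is closed automatically and a missing file is reported much like the Perl version does. It is not taken from the phrasebook: the `read_lines` helper and the throwaway demo file are my own, used only so the example runs anywhere.

```python
import os
import tempfile


def read_lines(filename):
    """Return the file's contents as a list of lines (newlines kept)."""
    with open(filename) as f:  # the file is closed when the block exits
        return f.readlines()


# Demo on a throwaway file (hypothetical content) so the sketch is runnable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    demo_path = tmp.name

lines = read_lines(demo_path)
os.remove(demo_path)
print(lines)

# Reading the now-deleted file raises an exception we can report ourselves.
try:
    read_lines(demo_path)
except FileNotFoundError as err:
    print(f"can't open {demo_path}: {err.strerror}")
```

The `with` statement plays the role of an explicit close, and catching `FileNotFoundError` is the nearest equivalent to Perl's `or die`.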
Categories: Programming,
Tags: Perl; Python,
Related posts: • No related posts; Comments: none
Tim Finin, 12:41pm 5 February 2009
I thought the informal statistics mentioned in this short USA Today story, Facebook friends share ‘25 Things’ with the world, were interesting. (Emphasis added)
“If you are a member of the 150-million-strong Facebook nation, you have probably learned some fascinating — or, let’s face it, some not-so-fascinating — facts about your friends as part of the latest fad, the pass-it-forward viral game “25 Random Things About Me.”
…
The phenomenon continues to snowball. Facebook can’t quantify activity specific to 25 Things as it does applications such as Flixster. But spokeswoman Brandee Barker says that over the past week the number of daily “notes” has more than doubled and the number of daily tags of a Facebook member in a note has grown by five times. “I would say that anecdotally I’ve never seen a note spread as quickly as this has on Facebook,” Barker says. “What is really unique about this is it’s a really meaningful piece of content. Some of the these notes are touching and frankly very insightful.”
Yesterday’s NYT also had a story on the fad, Ah, Yes, More About Me? Here Are ‘25 Random Things’.
As internet fads go, it probably has not yet peaked. Possible evidence is that there isn’t a Wikipedia article on the phenomenon yet, or even a mention of it on its List of Internet Phenomena. So there is still time to get in and be cool.
Categories: Social media,
Tags: Facebook,
Related posts: • Is Web 2.0 another bubble?; • Aether Systems, once Baltimore’s dot.com favorite, leaving city; • Wall Street’s collapse may be IT program’s gain; Comments: one
Tim Finin, 4:54pm 4 February 2009
IDG news service has a story sketching how Google Researcher Targets Web’s Structured Data. This is not directed at data published in machine understandable form (e.g., in RDF), but at other kinds of structured data accessible on the web.
“Internet search engines have focused largely on crawling text on Web pages, but Google is knee-deep in research about how to analyze and organize structured data, a company scientist said Friday. “There’s a lot of structured data out on the Web and we’re not doing a good job of presenting it to our users,” said Alon Halevy during a talk at the New England Database Day conference at the Massachusetts Institute of Technology.
Halevy was referring in part to so-called “deep Web” sources, such as the databases that sit behind form-driven Web sites like Cars.com or Realtor.com. Google has been submitting queries to various forms for some time, retrieving the resulting Web pages and including them in its search index if the information looks useful.
But the company also wants to analyze the data found in structured tables on many Web sites, Halevy said, offering as an example a table on a Web page that lists the U.S. presidents. And there are reams of those tables — Google’s index turned up 14 billion of them, according to Halevy. He “realized very quickly that over 98 percent of these are not that interesting,” but even after significant filtering there remain about 154 million tables worth indexing, he said.”
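A quick back-of-the-envelope check shows that the two figures in that excerpt are consistent with each other. This is just arithmetic on the quoted numbers; the variable names are mine.

```python
# Figures quoted from Halevy's talk, as reported in the excerpt above.
total_tables = 14_000_000_000  # tables found in Google's index
kept_tables = 154_000_000      # tables judged worth indexing after filtering

kept_share = kept_tables / total_tables
print(f"kept: {kept_share:.1%}, filtered out: {1 - kept_share:.1%}")
```

So the tables worth indexing are about 1.1 percent of the total, which squares with "over 98 percent" being uninteresting.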
ReadWriteWeb also has a story (Google: “We’re Not Doing a Good Job with Structured Data”) on what Google is or isn’t doing with structured data, including an interesting admission by Google researcher Halevy.
“During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google’s Alon Halevy admitted that the search giant has “not been doing a good job” presenting the structured data found on the web to its users. By “structured data,” Halevy was referring to the databases of the “deep web” – those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.”
For some technical details on the issues and current work, see the paper Google’s Deep-Web Crawl by researchers from Google (including Halevy), UCSD and Cornell published in the Proceedings of VLDB 2008.
Categories: Google, Semantic Web,
Related posts: • Google supports RDFa and Microformats; • Yahoo! adds RDF support to SearchMonkey and BOSS; • Solvent helps extract data on web pages and materialize it as RDF; Comments: none
Tim Finin, 10:30am 4 February 2009
Dow Jones is hosting a free one-hour webinar about the Semantic Web, on Thursday 12 February 2009 at 10:00am and again at 2:00pm EST. The webinar, The Semantic Web: Discover, Determine and Deploy, is the first in a three-part series on the Semantic Web. I’ll be interested to see how this is presented for what I assume is a very pragmatic, business-oriented audience.
“Dow Jones notes that “these days it’s critical for organizations to consume, digest, and share news and information. The Semantic Web is no longer ahead of its time and is rapidly changing how organizations keep up with information overload.” This webinar is Part I of a series and in it you will learn how Semantic Web Technologies enable you to re-use valuable information to save costs, facilitate easier collaboration and sharing of critical information across your business, and increase search relevancy and surface the most valuable information needed to remain competitive.”
The presenters are Christine Connors and Daniela Barbosa, both members of the Dow Jones Enterprise Media Group. The webinar is free but requires registration.
Spotted on ReadWriteWeb.
Categories: Semantic Web,
Related posts: • Google earth crowdsourcing map data; • ISWC 2008 tutorial program set; • Talis starts Nodalities magazine devoted to the Semantic Web; Comments: one
Tim Finin, 11:16pm 1 February 2009
Feather Tether is one of the games produced by participants at the UMBC Global Game Jam site.
“Two birds race to the moon and while a tether holds them together. These conflicting buddies must stay together and protect the tether by all means, yet compete to win. To achieve the ultimate goal both players should cooperate to survive their adventure to the moon.”
This was the top-ranked game that came out of the UMBC site based on the votes of the participants.
It’s a five-minute Flash game that you can download and play on almost any platform. As an added bonus, if you do download it, you’ll get the source! After playing the game, you can rate it on the Global Game Jam site.
Categories: GENERAL,
Related posts: • UMBC hosts Baltimore site for two day Global Game Jam; • UMBC to host 2009 Global Game Jam site; • Global Game Jam at UMBC, January 29-31; Comments: none