<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>UMBC ebiquity &#187; cloud computing</title>
	<atom:link href="http://ebiquity.umbc.edu/blogger/category/cloud-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://ebiquity.umbc.edu/blogger</link>
	<description>EBB is the ebiquity research group's blog at the University of Maryland, Baltimore County (UMBC).  We focus on technologies that facilitate the design, implementation and control of distributed, intelligent information systems -- mobile and pervasive computing, ad hoc networking, multiagent systems, knowledge representation and reasoning, and the semantic web.  As the tides of technology ebb and flow, we hope the good ideas wash up on our beach and the bad ones drift back out to sea.</description>
	<lastBuildDate>Mon, 30 Jan 2012 02:42:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Make mincemeat out of MapReduce with Python</title>
		<link>http://ebiquity.umbc.edu/blogger/2011/10/01/make-mincemeat-out-of-mapreduce-with-python/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2011/10/01/make-mincemeat-out-of-mapreduce-with-python/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 17:03:31 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=4206</guid>
		<description><![CDATA[mincemeat.py is a super-lightweight, open source Python implementation of the popular MapReduce distributed computing framework that depends only on the Python Standard Library. Just install the single source file on a set of machines and invoke the script on them with a password (for authentication) and the IP address of the host and your workers [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://remembersaurus.com/mincemeatpy/">mincemeat.py</a> is a super-lightweight, open source Python implementation of the popular <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> distributed computing framework that depends only on the Python Standard Library.</p>
<p>Just install the single source file on a set of machines and invoke the script on each of them with a password (for authentication) and the IP address of the host, and your workers are good to go. Then, using the same package, run a simple server program that defines map, reduce and your data source.</p>
<p>While it&#8217;s only 350 lines of Python, the package looks great for teaching or experimenting with the MapReduce concept as well as being potentially useful if you work in Python.</p>
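<p>To make the map, reduce and data source pieces concrete, here is a minimal word-count sketch. It is only an illustration under stated assumptions: it runs in a single process using the Python Standard Library and does not call mincemeat.py itself, and the names <code>mapfn</code>, <code>reducefn</code>, <code>datasource</code> and <code>run_local</code> are hypothetical, chosen just for this example.</p>

```python
from collections import defaultdict


def mapfn(key, value):
    # Emit a (word, 1) pair for every word in one line of text.
    for word in value.lower().split():
        yield word, 1


def reducefn(key, values):
    # Combine all counts that were emitted for one word.
    return sum(values)


def run_local(datasource, mapfn, reducefn):
    # Hypothetical single-process stand-in for a MapReduce server.
    # Map phase: collect every emitted value under its output key.
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: collapse each group of values to a single result.
    return {key: reducefn(key, values) for key, values in grouped.items()}


lines = ["the quick brown fox", "the lazy dog", "the fox"]
datasource = dict(enumerate(lines))  # maps line number to line text
counts = run_local(datasource, mapfn, reducefn)
print(counts["the"], counts["fox"])  # prints: 3 2
```

<p>In a distributed setting the map and reduce calls would be handed to workers rather than run in one loop; the grouping step between them is the part a MapReduce framework takes care of for you.</p>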
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2011/10/01/make-mincemeat-out-of-mapreduce-with-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CloudCamp Baltimore, 6-10pm Wed Mar 9, 2011</title>
		<link>http://ebiquity.umbc.edu/blogger/2011/02/24/cloudcamp-baltimore-6-10pm-wed-mar-9-2011/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2011/02/24/cloudcamp-baltimore-6-10pm-wed-mar-9-2011/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 14:58:14 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=3969</guid>
		<description><![CDATA[There will be a free CloudCamp meeting in Baltimore from 6:00pm to 10:00pm Wednesday March 9th at the Baltimore Marriott Waterfront. CloudCamps are participant-driven unconferences where users of Cloud Computing technologies meet to network and share ideas, experiences, challenges and solutions. The event is free but participants are asked to register to ensure there is [...]]]></description>
			<content:encoded><![CDATA[<p>There will be a free <a href="http://www.cloudcamp.org/baltimore">CloudCamp</a> meeting in Baltimore from 6:00pm to 10:00pm Wednesday March 9th at the Baltimore Marriott Waterfront.  CloudCamps are participant-driven <a href="http://en.wikipedia.org/wiki/Unconference"><em>unconferences</em></a> where users of Cloud Computing technologies meet to network and share ideas, experiences, challenges and solutions.  The event is free but participants are asked to <a href="http://cloudcamp-baltimore.eventbrite.com/">register</a> to ensure there is enough food and refreshments.</p>
<p><img align="right" src="http://ebiquity.umbc.edu/blogger/wp-content/uploads/2011/02/logo_cloudcamp.gif" alt="CloudCamp" title="CloudCamp" width="154" height="35" />Here is the current, tentative schedule:</p>
<blockquote><p>
6:00pm &#8211; Registration &#038; Networking (food/drink) <br />
6:30pm &#8211; Opening Introductions <br />
6:45pm &#8211; Lightning Talks (5 minutes each) <br />
7:30pm &#8211; Unpanel <br />
8:00pm &#8211; Organize Unconference <br />
8:15pm &#8211; Unconference Breakout Session Round 1 <br />
9:00pm &#8211; Unconference Breakout Session Round 2 <br />
9:45pm &#8211; Wrap-up <br />
10:00pm &#8211; Find somewhere for post-event networking 
</p></blockquote>
<p>Contact the organizers if you are interested in giving a five-minute lightning talk or leading a breakout session.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2011/02/24/cloudcamp-baltimore-6-10pm-wed-mar-9-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UMBC hosts Frontiers of Multi-Core Computing Workshop</title>
		<link>http://ebiquity.umbc.edu/blogger/2010/09/11/umbc-hosts-frontiers-of-multi-core-computing-workshop/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2010/09/11/umbc-hosts-frontiers-of-multi-core-computing-workshop/#comments</comments>
		<pubDate>Sat, 11 Sep 2010 14:01:14 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=3410</guid>
		<description><![CDATA[UMBC&#8217;s Multicore Computational Center will host the Second Workshop on Frontiers of Multi-Core Computing on 22-23 September 2010. The workshop will involve a wide range of people from universities, industry and government who will exchange ideas, discuss issues, and develop the strategies for coping with the challenges of parallel and multicore computing. &#8220;Multi- (e.g., Intel [...]]]></description>
			<content:encoded><![CDATA[<p>UMBC&#8217;s Multicore Computational Center will host the <a href="http://www.mc2.umbc.edu/workshops/fmc2.php">Second Workshop on Frontiers of Multi-Core Computing</a> on 22-23 September 2010. The workshop will involve a wide range of people from universities, industry and government who will exchange ideas, discuss issues, and develop the strategies for coping with the challenges of parallel and multicore computing.</p>
<blockquote><p> &#8220;Multi- (e.g., Intel Westmere and IBM Power7) and many-core (e.g., NVIDIA Tesla and AMD FireStream GPUs) microprocessors are enabling more compute- and data-intensive computation in desktop computers, clusters, and leadership supercomputers.  However efficient utilization of these microprocessors is still a very challenging issue.  Their differing architectures require significantly different programming paradigms when adapting real-world applications. The actual porting costs are actively debated, as well as the relative performance between GPUs and CPUs.&#8221;  </p></blockquote>
<p>The workshop is free but those interested should <a href="http://www.mc2.umbc.edu/RegistrationForm.html">register online</a>.  See the workshop <a href="http://www.mc2.umbc.edu/workshops/fmc2v2.pdf">schedule</a> for details on presentations and timing.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2010/09/11/umbc-hosts-frontiers-of-multi-core-computing-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UMBC Multicore Computational Center</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/06/15/umbc-multicore-computational-center/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/06/15/umbc-multicore-computational-center/#comments</comments>
		<pubDate>Mon, 15 Jun 2009 08:50:10 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1970</guid>
		<description><![CDATA[Joab Jackson (UMBC &#8217;90) wrote a nice article on UMBC&#8217;s Multicore Computational Center for the current issue of UMBC Magazine. From The Power of Parallels: &#8220;In July 2007, IBM gave UMBC computer science professors Milton Halem and Yelena Yesha a grant to launch the center with cash and equipment that have totaled more than $1 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.joabj.com/">Joab Jackson</a> (UMBC &#8217;90) wrote a nice article on UMBC&#8217;s <a href="http://www.mc2.umbc.edu/">Multicore Computational Center</a> for the current issue of UMBC Magazine. From <a href="http://www.umbc.edu/magazine/summer09/feature_power.html">The Power of Parallels</a>:</p>
<blockquote><p> &#8220;In July 2007, IBM gave UMBC computer science professors <a href="http://ebiquity.umbc.edu/person/html/Milton/Halem/">Milton Halem</a> and <a href="http://ebiquity.umbc.edu/person/html/Yelena/Yesha/">Yelena Yesha</a> a grant to launch the center with cash and equipment that have totaled more than $1 million over the past three years. Supporting funding from NASA also helped the effort.</p>
<ul> “Not only are we ahead of the curve,” says Charles Nicholas, chair of the department of computer science and electrical engineering, “but we hope to stay ahead of the curve&#8230;. The partnerships with IBM will let us keep the technologies up to date.”</ul>
<p>Halem says that government and private enterprise are in dire need of “trained graduate students who know how to apply the new methods of parallel programming to the problems they face,” Halem says. “We’re one of the few schools in the nation that is teaching these courses.”  </p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/06/15/umbc-multicore-computational-center/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tutorial: Hadoop on Windows with Eclipse</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/04/09/hadoop-on-windows-with-eclipse/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/04/09/hadoop-on-windows-with-eclipse/#comments</comments>
		<pubDate>Thu, 09 Apr 2009 16:35:06 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>
		<category><![CDATA[Multicore Computation Center]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Semantic Web]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1821</guid>
		<description><![CDATA[Hadoop has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don&#8217;t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems. UMBC Ph.D student Vlad Korolev has written an excellent tutorial, Hadoop on Windows with Eclipse, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a> has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don&#8217;t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems.  </p>
<p>UMBC Ph.D student <a href="http://ebiquity.umbc.edu/person/html/Vladimir/Korolev/">Vlad Korolev</a> has written an excellent tutorial, <a href="http://ebiquity.umbc.edu/Tutorials/Hadoop/">Hadoop on Windows with Eclipse</a>, showing how to install and use Hadoop on a single computer running Microsoft Windows.  It also covers the Eclipse Hadoop plugin, which enables you to create and run Hadoop projects from Eclipse.  In addition to step by step instructions, the tutorial has short videos documenting the process.  </p>
<p>If you want to explore Hadoop and are comfortable developing Java programs in Eclipse on a Windows box, this tutorial will get you going.  Once you have mastered Hadoop and have developed your first project using it, you can go about finding a cluster to run it on.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/04/09/hadoop-on-windows-with-eclipse/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Map reduce on heterogeneous multicore clusters</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/04/07/map-reduce-on-heterogeneous-multi-core-clusters/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/04/07/map-reduce-on-heterogeneous-multi-core-clusters/#comments</comments>
		<pubDate>Tue, 07 Apr 2009 21:25:56 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1820</guid>
		<description><![CDATA[In tomorrow&#8217;s ebiquity meeting (10 am EDT Wed, April 8), PhD student David Chapman will talk about his work on Map Reduce on Heterogeneous Multi-Core Clusters. From the abstract: &#8220;We have extended the Map Reduce programming paradigm to clusters with multicore accelerators. Map Reduce is a simple programming model designed for parallel computations with [...]]]></description>
			<content:encoded><![CDATA[<p>In tomorrow&#8217;s ebiquity meeting (10 am EDT Wed, April 8), PhD student <a href="http://ebiquity.umbc.edu/person/html/David/Chapman/">David Chapman</a> will talk about his work on <a href="http://ebiquity.umbc.edu/event/html/id/290/Map-Reduce-on-Heterogeneous-Multi-Core-clusters">Map Reduce on Heterogeneous Multi-Core Clusters</a>.  From the abstract:</p>
<blockquote><p> &#8220;We have extended the Map Reduce programming paradigm to clusters with multicore accelerators. Map Reduce is a simple programming model designed for parallel computations with large distributed datasets. Google has reinforced the practical effectiveness of this approach with over 1000 commercial Map Reduce applications. Typical Map Reduce implementations, such as Apache Hadoop exploit parallel file systems for use in homogeneous clusters. Unfortunately, the multicore accelerators such as Cell B.E. used in modern supercomputers such as Roadrunner require additional layers of parallelism, which cannot be addressed from parallel file systems alone. Related work has explored Map Reduce on a single Cell B.E. accelerator machine using hash and sort based techniques. We are incorporating techniques from Apache Hadoop as well as early multicore Map Reduce research to produce an implementation optimized for a hybrid multicore cluster. We are evaluating our implementation on a cluster of 24 of Cell Q series nodes, and 48 multicore PowerPC J series nodes at the <a href="http://www.mc2.umbc.edu/">UMBC Multicore Computational Center</a>.&#8221;</p></blockquote>
<p>We will <a href="http://www.ustream.tv/channel/umbc-ebiquity-meeting">stream the talk</a> live and share the <a href="http://www.ustream.tv/recorded/1358142">raw recording</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/04/07/map-reduce-on-heterogeneous-multi-core-clusters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cloudera offers a simpler Hadoop distribution</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/03/18/cloudera-offers-a-simpler-hadoop-distribution/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/03/18/cloudera-offers-a-simpler-hadoop-distribution/#comments</comments>
		<pubDate>Wed, 18 Mar 2009 18:50:18 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>
		<category><![CDATA[Multicore Computation Center]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social media]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1811</guid>
		<description><![CDATA[We are early in the era of big data (including social and/or semantic) and more and more of us need the tools to handle it. Monday&#8217;s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup that is offering its own Hadoop distribution that is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cloudera.com/distribution" border="0"><img src="http://www.cloudera.com/sites/default/files/download-ch-box-large.png" title="Download Cloudera's Hadoop distribution" align="right" border="0" /></a> We are early in the era of <i>big data</i> (including social and/or semantic) and more and more of us need the tools to handle it.  Monday&#8217;s NYT had a story, <a href="http://www.cloudera.com/">Hadoop, a Free Software Program, Finds Uses Beyond Search</a>, on Hadoop and <a href="http://www.cloudera.com/">Cloudera</a>, a new startup that is offering its own Hadoop distribution that is designed to be easier to install and configure.</p>
<blockquote><p> &#8220;In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.&#8221;<br />
&#8230;<br />
Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance.  The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.  </p></blockquote>
<p>Cloudera&#8217;s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a Web-based configuration aid.  The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms and using Hive.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/03/18/cloudera-offers-a-simpler-hadoop-distribution/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>infochimps Amazon Machine Image for data analysis and viz</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/02/14/infochimps-amazon-machine-image-for-data-analysis-and-viz/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/02/14/infochimps-amazon-machine-image-for-data-analysis-and-viz/#comments</comments>
		<pubDate>Sat, 14 Feb 2009 14:05:34 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social media]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1777</guid>
		<description><![CDATA[Infochimps has registered a community image for Amazon&#8217;s Elastic Compute Cloud (EC2) designed for data processing, analysis, and visualization. Great idea! Doing experimental computer science research requires the right infrastructure &#8212; hardware, bandwidth, software environments and data &#8212; and tackling some interesting problems requires a lot. Cloud computing services, such as EC2, are a great [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://infochimps.org/">Infochimps</a> has registered a community image for Amazon&#8217;s <a href="http://aws.amazon.com/ec2">Elastic Compute Cloud</a> (EC2) designed for data processing, analysis, and visualization.  Great idea!</p>
<p>Doing experimental computer science research requires the right infrastructure &#8212; hardware, bandwidth, software environments and data &#8212; and tackling some interesting problems requires a lot.  Cloud computing services, such as EC2, are a great boon to researchers who aren&#8217;t part of a well-equipped lab already set up to support just the kind of research they want to do.</p>
<p>EC2 allows users to instantiate a virtual computer from a saved image, called an <a href="http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=171">Amazon Machine Image</a>, or AMI.  Users can configure a system with the operating system, software packages, and pre-loaded data they want and then save it as a shared community AMI, making it available to others.</p>
<p>The initial announcement, <a href="http://blog.infochimps.org/2009/01/28/hacking-through-the-amazon-with-a-shiny-new-machetec2/">Hacking through the Amazon with a shiny new MachetEC2</a>, says</p>
<blockquote><p> &#8220;MachetEC2 is an effort by a group of Infochimps to create an AMI for data processing, analysis, and visualization. If you create an instance of MachetEC2, you’ll have an environment with tools designed for working with data ready to go. You can load in your own data, grab one of our datasets, or try grabbing the data from one of Amazon’s <a href="http://aws.amazon.com/publicdatasets/">Public Data Sets</a>. No matter what, you’ll be hacking in minutes.<br />
&#8230;<br />
We’re taking suggestions for what software the community would be most interested in having installed on the image &#8230;  When we feel that the AMI is getting too bloated, we’ll split it up: MachetEC2-ML (machine learning), MachetEC2-viz, MachetEC2-lang, MachetEC2-bio, etc.&#8221;  </p></blockquote>
<p>And a second post gave some more details:</p>
<blockquote><p> &#8220;When you SSH into an instance of machetEC2 (brief instructions after the jump), check the <code>README</code> files: they describe what&#8217;s installed, how to deal with volumes and Amazon Public Datasets, and how to use X11-based applications.  You can also visit the <a href="http://github.com/infochimps/machetec2/tree/master">machetEC2 GitHub page</a> to see the full <a href="http://github.com/infochimps/machetec2/blob/master/config/packages.yaml">list of packages installed</a>, the <a href="http://github.com/infochimps/machetec2/blob/master/config/gems.yaml">list of gems</a>, and the list of <a href="http://github.com/infochimps/machetec2/tree/master/sources">programs installed from source</a>.</p>
<p>To launch an instance of machetEC2, log into the <a href="https://console.aws.amazon.com/">AWS Console</a>, click &#8220;AMIs&#8221;, search for &#8220;machetEC2&#8221; or <code>ami-29ef0840</code>, and click &#8220;Launch&#8221;.  If you&#8217;re on the command-line, simply run</p>
<ul><code> $ ec2-run-instances ami-29ef0840 -k [your-keypair-name]</code></ul>
<p>By the time you&#8217;ve grabbed some coffee, you&#8217;ll be able to access an EC2 instance with all the tools you need for working with data already installed, configured, and ready to hack.&#8221;  </p></blockquote>
<p>This is a valuable contribution to the data wrangling community and to the larger research community as an example of what can be done.  I can imagine similar community AMIs to support research on the Semantic Web, social network analysis, game development or multi-agent systems.</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/02/14/infochimps-amazon-machine-image-for-data-analysis-and-viz/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hadoop user group for the Baltimore-DC region</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/02/08/hadoop-user-group-for-the-baltimoredc-region/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/02/08/hadoop-user-group-for-the-baltimoredc-region/#comments</comments>
		<pubDate>Sun, 08 Feb 2009 15:10:17 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[High performance computing]]></category>
		<category><![CDATA[MC2]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1764</guid>
		<description><![CDATA[A Hadoop User Group (HUG) has formed for the Washington DC area via meetup.com. &#8220;We&#8217;re a group of Hadoop &#038; Cloud Computing technologists / enthusiasts / curious people who discuss emerging technologies, Hadoop &#038; related software development (HBase, Hypertable, PIG, etc). Come learn from each other, meet nice people, have some food/drink.&#8221; The group defines [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.meetup.com/Hadoop-DC/">Hadoop User Group</a> (HUG) has formed for the Washington DC area via meetup.com.</p>
<blockquote><p> &#8220;We&#8217;re a group of <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a> &#038; <a href="http://en.wikipedia.org/wiki/Cloud_computing">Cloud Computing</a> technologists / enthusiasts / curious people who discuss emerging technologies, Hadoop &#038; related software development (<a href="http://hadoop.apache.org/hbase/">HBase</a>, <a href="http://hypertable.org/">Hypertable</a>, <a href="http://hadoop.apache.org/pig/">PIG</a>, etc). Come learn from each other, meet nice people, have some food/drink.&#8221; </p></blockquote>
<p>The group defines its geographic location as Columbia, MD, and its first <a href="http://www.meetup.com/Hadoop-DC/messages/boards/thread/6218422">HUG meetup</a> was held last Wednesday at the BWI Hampton Inn.  In addition to informal social interactions, it featured two presentations:</p>
<ul>
<li> Amir Youssefi from Yahoo! presented an overview of Hadoop. Amir is a member of the Cloud Computing and Data Infrastructure group at Yahoo!, and discussed Multi-Dataset Processing (Joins) using Hadoop and Hadoop Table.</li>
<li> Introduction to complex, fault-tolerant data processing workflows using Cascading and Hadoop by Scott Godwin &#038; Bill Oley</li>
</ul>
<p>If you&#8217;re in Maryland and interested, you can join the group at <a href="http://www.meetup.com/Hadoop-DC/">meetup.com</a> and get announcements for future meetings.  It might be a good way to learn more about new software for exploiting computing clusters and cloud computing.</p>
<p>(Thanks to <a href="http://www.cpdiehl.org/">Chris Diehl</a> for alerting me to this)</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/02/08/hadoop-user-group-for-the-baltimoredc-region/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>octo.py: quick and easy MapReduce for Python</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/#comments</comments>
		<pubDate>Fri, 02 Jan 2009 17:34:37 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[MC2]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[distributed computing needs]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1718</guid>
		<description><![CDATA[The amount of free, interesting, and useful data is growing explosively. Luckily, computers are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever. That&#8217;s why everyone is interested in easy-to-use frameworks like MapReduce for everyday programmers to run their [...]]]></description>
			<content:encoded><![CDATA[<p>The amount of free, interesting, and useful data is growing explosively. Luckily, computers are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever.  That&#8217;s why everyone is interested in easy-to-use frameworks like <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> that let everyday programmers run their data crunching in parallel.</p>
<p><a href="http://code.google.com/p/octopy/">octo.py</a> is a very simple MapReduce-like system inspired by Ruby&#8217;s <a href="http://tech.rufy.com/2006/08/mapreduce-for-ruby-ridiculously-easy.html">Starfish</a>.</p>
<blockquote><p>
&#8220;<a href="http://code.google.com/p/octopy/">Octo.py</a> doesn&#8217;t aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop, there&#8217;s a good chance that you can make it distributed with just a few small changes. If you&#8217;re already using Python&#8217;s map() and reduce() functions, the changes needed are trivial!&#8221;
</p></blockquote>
<p>triangular.py is the simple example given in the documentation that is used with octo.py to compute the first 100 <a href="http://wikipedia.org/wiki/Triangular_number">triangular numbers</a>.</p>
<blockquote>
<pre>
# triangular.py compute first 100 triangular numbers. Do
# 'octo.py server triangular.py' on server with address IP
# and 'octo.py client IP' on each client. Server uses source
# &#038; final, sends tasks to clients, integrates results. Clients
# get tasks from server, use mapfn &#038; reducefn, return results.

source = dict(zip(range(100), range(100)))

def final(key, value):
    print key, value

def mapfn(key, value):
    for i in range(value + 1):
        yield key, i

def reducefn(key, value):
    return sum(value)
</pre>
</blockquote>
<p>Put <a href="http://ebiquity.umbc.edu/blogger/wp-content/uploads/2009/01/octo.py">octo.py</a> on all of the machines you want to use. On the machine you will use as a server (with IP address &lt;ip&gt;), also install <a href="http://ebiquity.umbc.edu/blogger/wp-content/uploads/2009/01/triangular.py">triangular.py</a>, and then execute:</p>
<pre>
     python octo.py server triangular.py &amp;
</pre>
<p>On each of your clients, run </p>
<pre>
     python octo.py client &lt;ip&gt; &amp;
</pre>
<p>You can try this out using the same machine to run the server process and one or more client processes, of course.</p>
<p>When the clients register with the server, they will get a copy of <em>triangular.py</em> and wait for tasks from the server.  The server accesses the data from <em>source</em> and distributes tasks to the clients. These in turn use <em>mapfn</em> and <em>reducefn</em> to complete the tasks, returning the results.  The server integrates these and, when all have completed, invokes <em>final</em>, which in this case just prints the answers, and halts.  The clients continue to run, waiting for more tasks to do. </p>
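<p>To see the data flow without any networking, here is a minimal single-process sketch of the same map, group and reduce steps. It is only an illustration: <em>run_local</em> is a hypothetical helper that is not part of octo.py, it does in one loop what octo.py splits between the server and its clients, and it is written for Python 3 rather than the Python 2 of the original example.</p>

```python
# Single-process sketch of the map/reduce flow described above.
# run_local is a hypothetical helper, NOT part of octo.py; the real
# system hands these steps out to clients over the network.
from collections import defaultdict

# Same input as triangular.py: keys 0..99, each mapped to itself.
source = dict(zip(range(100), range(100)))


def mapfn(key, value):
    # Emit every integer from 0 to value under the same key.
    for i in range(value + 1):
        yield key, i


def reducefn(key, values):
    # Summing 0..n gives the n-th triangular number.
    return sum(values)


def run_local(source, mapfn, reducefn):
    # Map phase: collect every emitted value under its key.
    grouped = defaultdict(list)
    for key, value in source.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: collapse each key's list into a single result.
    return dict((key, reducefn(key, values)) for key, values in grouped.items())


results = run_local(source, mapfn, reducefn)

# Plays the role of final(): print each key with its triangular number.
for key in sorted(results):
    print(key, results[key])
```

<p>Each key n ends up paired with n(n+1)/2, so this gives a quick way to check <em>mapfn</em> and <em>reducefn</em> on one machine before starting a server and clients.</p>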
<p>Octo.py is not a replacement for more sophisticated frameworks like Hadoop or Disco, but if you are working in Python, its <a href="http://en.wikipedia.org/wiki/KISS_principle">KISS</a> approach is a good way to get started with the MapReduce paradigm and might be all you need for small projects.</p>
<p>(Note: The package has not been updated since April 2008, so its status is not clear.  But further development would run the risk of making it more complex and would be self-defeating.)</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

