<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>UMBC ebiquity &#187; distributed computing needs</title>
	<atom:link href="http://ebiquity.umbc.edu/blogger/tag/distributed-computing-needs/feed/" rel="self" type="application/rss+xml" />
	<link>http://ebiquity.umbc.edu/blogger</link>
	<description>EBB is the ebiquity research group's blog at the University of Maryland, Baltimore County (UMBC).  We focus on technologies that facilitate the design, implementation and control of distributed, intelligent information systems -- mobile and pervasive computing, ad hoc networking, multiagent systems, knowledge representation and reasoning, and the semantic web.  As the tides of technology ebb and flow, we hope the good ideas wash up on our beach and the bad ones drift back out to sea.</description>
	<lastBuildDate>Fri, 20 Nov 2009 13:50:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>octo.py: quick and easy MapReduce for Python</title>
		<link>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/</link>
		<comments>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/#comments</comments>
		<pubDate>Fri, 02 Jan 2009 17:34:37 +0000</pubDate>
		<dc:creator>Tim Finin</dc:creator>
				<category><![CDATA[MC2]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[distributed computing needs]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://ebiquity.umbc.edu/blogger/?p=1718</guid>
		<description><![CDATA[The amount of free, interesting, and useful data is growing explosively. Luckily, computers are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever.  That&#8217;s why everyone is interested in easy-to-use frameworks like MapReduce for everyday programmers to run [...]]]></description>
			<content:encoded><![CDATA[<p>The amount of free, interesting, and useful data is growing explosively. Luckily, computers are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever.  That&#8217;s why everyone is interested in easy-to-use frameworks like <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> for everyday programmers to run their data crunching in parallel.</p>
<p><a href="http://code.google.com/p/octopy/">octo.py</a> is a very simple MapReduce-like system inspired by Ruby&#8217;s <a href="http://tech.rufy.com/2006/08/mapreduce-for-ruby-ridiculously-easy.html">Starfish</a>.</p>
<blockquote><p>
&#8220;<a href="http://code.google.com/p/octopy/">Octo.py</a> doesn&#8217;t aim to meet all your distributed computing needs, but its simple approach is amendable to a large proportion of parallelizable tasks. If your code has a for-loop, there&#8217;s a good chance that you can make it distributed with just a few small changes. If you&#8217;re already using Python&#8217;s map() and reduce() functions, the changes needed are trivial!&#8221;
</p></blockquote>
<p>triangular.py is the simple example given in the documentation that is used with octo.py to compute the first 100 <a href="http://wikipedia.org/wiki/Triangular_number">triangular numbers</a>.</p>
<blockquote>
<pre>
# triangular.py compute first 100 triangular numbers. Do
# 'octo.py server triangular.py' on server with address IP
# and 'octo.py client IP' on each client. Server uses source
# &#038; final, sends tasks to clients, integrates results. Clients
# get tasks from server, use mapfn &#038; reducefn, return results.

source = dict(zip(range(100), range(100)))

def final(key, value):
    print key, value

def mapfn(key, value):
    for i in range(value + 1):
        yield key, i

def reducefn(key, value):
    return sum(value)
</pre>
</blockquote>
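<p>As a quick sanity check on what the job should print, here is a minimal single-machine sketch that computes the same numbers with Python&#8217;s built-in map() and reduce(). It is not part of octo.py: the helper name <em>triangular</em> is made up for illustration, and the sketch assumes Python 3, where reduce() lives in functools. It only computes the first ten values to keep the output short.</p>

```python
from functools import reduce
from operator import add

def triangular(n):
    """Return the n-th triangular number, 0 + 1 + ... + n (hypothetical helper)."""
    return reduce(add, range(n + 1), 0)

# One independent call per input value; no call depends on another.
first_ten = list(map(triangular, range(10)))
print(first_ten)
```

<p>Each call to <em>triangular</em> is independent of the others, which is the property that lets a job like this be split across clients.</p>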
<p>Put <a href="http://ebiquity.umbc.edu/blogger/wp-content/uploads/2009/01/octo.py">octo.py</a> on all of the machines you want to use. On the machine you will use as a server (with IP address &lt;ip&gt;), also install <a href="http://ebiquity.umbc.edu/blogger/wp-content/uploads/2009/01/triangular.py">triangular.py</a>, and then execute:</p>
<pre>
     python octo.py server triangular.py &amp;
</pre>
<p>On each of your clients, run:</p>
<pre>
     python octo.py client &lt;ip&gt; &amp;
</pre>
<p>You can try this out using the same machine to run the server process and one or more client processes, of course.</p>
<p>When the clients register with the server, they will get a copy of <em>triangular.py</em> and wait for tasks from the server.  The server accesses the data from <em>source</em> and distributes tasks to the clients. These in turn use <em>mapfn</em> and <em>reducefn</em> to complete the tasks, returning the results.  The server integrates these and, when all have completed, invokes <em>final</em>, which in this case just prints the answers, and halts.  The clients continue to run, waiting for more tasks to do.</p>
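<p>The flow described above can be sketched in a single process. This is only an illustration under stated assumptions, not octo.py&#8217;s actual implementation: <em>run_locally</em> is a hypothetical helper, the grouping of mapped values by key is the usual MapReduce convention rather than something confirmed from the octo.py source, <em>final</em> stores results instead of printing them, and the code is written for Python 3 with a smaller <em>source</em>.</p>

```python
from collections import defaultdict

# Same callbacks as triangular.py, but with ten inputs instead of 100.
source = dict(zip(range(10), range(10)))
results = {}

def mapfn(key, value):
    # Emit every integer from 0 to value under the same key.
    for i in range(value + 1):
        yield key, i

def reducefn(key, values):
    # Sum the emitted integers, giving the triangular number for this key.
    return sum(values)

def final(key, value):
    # Collect the answer instead of printing it (a change made for this sketch).
    results[key] = value

def run_locally(source, mapfn, reducefn, final):
    """Hypothetical single-process stand-in for the server/client round trip."""
    grouped = defaultdict(list)
    # Map phase: each (key, value) pair from source is one task.
    for key, value in source.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    # Reduce phase: one task per intermediate key, then hand the result to final.
    for key in sorted(grouped):
        final(key, reducefn(key, grouped[key]))

run_locally(source, mapfn, reducefn, final)
print(results)
```

<p>In the real system the two loops are what would be farmed out to clients; here they simply run in order so the data flow is easy to follow.</p>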
<p>Octo.py is not a replacement for more sophisticated frameworks like Hadoop or Disco, but if you are working in Python, its <a href="http://en.wikipedia.org/wiki/KISS_principle">KISS</a> approach is a good way to get started with the MapReduce paradigm and might be all you need for a small project.</p>
<p>(Note: The package has not been updated since April 2008, so its status is not clear.  But further development would run the risk of making it more complex and would be self-defeating.)</p>
]]></content:encoded>
			<wfw:commentRss>http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
