<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tim Showers - Web Development, Design, and Data Visualization &#187; Data Mining</title>
	<atom:link href="http://www.timshowers.com/category/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.timshowers.com</link>
	<description>Tutorials, Polemics, and Discussion about all things web-nerdy</description>
	<lastBuildDate>Mon, 18 May 2009 21:30:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>O&#8217;Reilly on the Future of Massive Data Analysis</title>
		<link>http://www.timshowers.com/2008/11/future-of-data-analysis/</link>
		<comments>http://www.timshowers.com/2008/11/future-of-data-analysis/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 01:28:02 +0000</pubDate>
		<dc:creator>Tim Showers</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Data Visualization]]></category>

		<guid isPermaLink="false">http://www.timshowers.com/?p=269</guid>
		<description><![CDATA[There&#8217;s a post by Joseph Hellerstein worth a read over on O&#8217;Reilly Radar: The Commoditization of Massive Data Analysis.  It&#8217;s aimed more at the enterprise than at small-to-midsize businesses, but that&#8217;s just a consequence of the target audience.

His primary point is becoming especially pertinent to web companies and smaller developers: The convergence of dropping hardware prices and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html"><img class="aligncenter size-full wp-image-272" title="Comparison Chart of Data Analysis Methods" src="http://www.timshowers.com/wp-content/uploads/2008/11/datacomparison.png" alt="" width="500" height="192" /></a></p>
<p>There&#8217;s a post by <a href="http://radar.oreilly.com/joeh">Joseph Hellerstein</a> worth a read over on O&#8217;Reilly Radar: <a href="http://radar.oreilly.com/2008/11/the-commoditization-of-massive.html">The Commoditization of Massive Data Analysis</a>.  It&#8217;s aimed more at the enterprise than at small-to-midsize businesses, but that&#8217;s just a consequence of the target audience.</p>
<p><span id="more-269"></span></p>
<p>His primary point is becoming especially pertinent to web companies and smaller developers: The convergence of dropping hardware prices and machine-readable APIs is making the storage and processing of vast amounts of information practical.</p>
<blockquote><p>We are at the beginning of what I call The Industrial Revolution of Data. We&#8217;re not quite there yet, since most of the digital information available today is still individually &#8220;handmade&#8221;: prose on web pages, data entered into forms, videos and music edited and uploaded to servers. But we are starting to see the rise of automatic data generation &#8220;factories&#8221; such as software logs, UPC scanners, RFID, GPS transceivers, video and audio feeds.</p></blockquote>
<p>It&#8217;s already reasonable for a site on a commodity web host to store every user and search interaction, or a database of tens of millions of data points, and in the future it will only get easier. The question is, what tools will we use to make sense of all of this?</p>
<p>His analysis reduces the field to SQL (via <a href="http://www.oracle.com/">Oracle</a>) and MapReduce (via <a href="http://hadoop.apache.org/core/">Hadoop</a>), but once we look beyond the enterprise, tools like <a href="http://www.erlang.org/">Erlang</a> (or functional programming in general) and the emerging <a href="http://incubator.apache.org/couchdb/">CouchDB</a> show promise, not to mention some of the cloud computing entries from <a href="http://aws.amazon.com/">Amazon</a> and others.</p>
<p>On the visualization side, tools like <a href="http://processing.org/">Processing</a> and the <a href="http://prefuse.org/">Prefuse Toolkit</a> are seeing quick uptake, as are more focused commercial tools like <a href="http://www.fusioncharts.com/">FusionCharts</a>.</p>
<p>Whatever the toolchain turns out to be, those of us with an interest in understanding information have the opportunity to be at the forefront of the change. If we don&#8217;t gain expertise in the available options early, we risk being left behind.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.timshowers.com/2008/11/future-of-data-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
