hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Is this just bluster?
Date Thu, 16 Jun 2011 15:38:02 GMT
On 16/06/11 14:49, David Scott Williams wrote:
> http://gigaom.com/cloud/lexisnexis-open-sources-its-hadoop-killer/

It's interesting that they decided they had to become open source to 
survive. It's the Linux effect: it's not that it's better than Solaris 
was, it just got the momentum up.

A strength of Hadoop is that it has layers on top of it, and a lot of the 
interesting stuff sits above the basic layer: Mahout, Pig, Hive, Hama, 
etc. You can also plug things in: filesystems, schedulers, HDFS placers. 
While we can debate what "compatible" means, by implementing the APIs 
that the higher layers use, MapR and hence EMC's products can run those 
higher layers. HPCC looks to be a completely new ecosystem.

Oh, and the license is AGPL, which complicates any external-facing web 
app way more than even GPL does. Good for business models (you can pay 
for the alternate license), but not ideal for takeup.

HPCC do a good comparison page here; it seems quite unbiased.


Regarding performance, I haven't seen any new terasort numbers for a 
while. Whoever next brings up a 1000+ node cluster should publish them.

As HPCC say: "In practice, HPCC configurations require significantly 
fewer nodes to provide the same processing performance as a Hadoop 
cluster. Sizing of clusters may depend however on the overall storage 
requirements for the distributed file system."

That means if your cluster is driven by storage demands, that fixes node 
size more than CPU issues (though if you need fewer CPUs, that's capex 
and opex savings or the opportunity to do other things with CPU time).
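
To put rough numbers on that, here is a minimal sketch of the sizing 
arithmetic. It reads "node size" as the number of nodes, and every figure 
in it (data volume, replication factor, usable disk per node, CPU-driven 
node counts) is a made-up assumption for illustration, not a measurement 
of Hadoop or HPCC. The function name nodes_needed is hypothetical too.

```python
import math

def nodes_needed(data_tb, replication, usable_tb_per_node, cpu_nodes):
    """Return (storage_nodes, cluster_nodes).

    storage_nodes: nodes needed just to hold the replicated data.
    cluster_nodes: the larger of the storage-driven and CPU-driven counts.
    """
    storage_nodes = math.ceil(data_tb * replication / usable_tb_per_node)
    return storage_nodes, max(storage_nodes, cpu_nodes)

# Hypothetical workload: 600 TB of data, 3x replication, 12 TB usable per node.
# Assume one engine needs 120 nodes of CPU and a faster one needs only 60.
storage_nodes, baseline_nodes = nodes_needed(600, 3, 12, cpu_nodes=120)
_, faster_nodes = nodes_needed(600, 3, 12, cpu_nodes=60)

print(storage_nodes, baseline_nodes, faster_nodes)
```

With these assumed inputs the storage requirement is the larger term in 
both cases, so halving the CPU need does not shrink the cluster; it only 
frees up CPU time on nodes you had to buy anyway.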
