hadoop-general mailing list archives

From Amr Awadallah <...@cloudera.com>
Subject Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Date Fri, 10 Jul 2009 04:56:09 GMT
See this thread:

http://markmail.org/thread/wzekarj5vpylj3qc

Also, Hive and Pig are both official Apache Hadoop projects with larger 
user/developer communities than Cloudbase (which is licensed under GPLv2 
rather than the Apache License).

-- amr

Saurabh Nanda wrote:
> Hi,
>
> Does anyone have any pearls of wisdom around this?
>
> I'm spending part of my work time on developing a scalable weblog analysis
> system running on a 4 to 6 node cluster (standard desktop class machines). I
> don't have much time to try and benchmark all three tools (Cloudbase, Hive,
> and Pig) and would really appreciate it if someone could give me a heads-up
> on what to spend my time on. Some specifics:
>
> a) Which tool can give me the best performance for this problem? (e.g., best
> use of indexes, data partitioning, etc.)
> b) Which tool has the most efficient data storage so that I can store more
> days' worth of data in the cluster for ad-hoc analysis?
> c) Which tool is the most mature and will not crash? (For example, the
> disclaimer on the Hive Wiki really scared me --
> http://wiki.apache.org/hadoop/Hive/GettingStarted)
>
> Thanks,
> Saurabh.
>   
