hadoop-general mailing list archives

From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: Weblog analysis -- Cloudbase vs Hive vs Pig?
Date Fri, 10 Jul 2009 19:06:30 GMT
Also see some benchmarks run by the Hive team at
https://issues.apache.org/jira/browse/HIVE-396.

On Thu, Jul 9, 2009 at 9:56 PM, Amr Awadallah <aaa@cloudera.com> wrote:

> see this thread:
>
> http://markmail.org/thread/wzekarj5vpylj3qc
>
> Also, Hive and Pig are both official Apache Hadoop projects with larger
> user/developer communities than Cloudbase (which is licensed under GPL2
> rather than the Apache license).
>
> -- amr
>
>
> Saurabh Nanda wrote:
>
>> Hi,
>>
>> Does anyone have any pearls of wisdom around this?
>>
>> I'm spending part of my work time on developing a scalable weblog analysis
>> system running on a 4 to 6 node cluster (standard desktop-class machines).
>> I don't have much time to try and benchmark all three tools (Cloudbase,
>> Hive, and Pig) and would really appreciate it if someone could give me a
>> heads-up on what to spend my time on. Some specifics:
>>
>> a) Which tool can give me the best performance for this problem? (e.g. best
>> use of indexes, data partitioning, etc.)
>> b) Which tool has the most efficient data storage, so that I can store more
>> days' worth of data in the cluster for ad-hoc analysis?
>> c) Which tool is the most mature and will not crash? (For example, the
>> disclaimer on the Hive Wiki really scared me:
>> http://wiki.apache.org/hadoop/Hive/GettingStarted)
>>
>> Thanks,
>> Saurabh.
>>
>>
>
