hadoop-general mailing list archives

From Saurabh Nanda <saurabhna...@gmail.com>
Subject Weblog analysis -- Cloudbase vs Hive vs Pig?
Date Fri, 10 Jul 2009 04:45:31 GMT

Does anyone have any pearls of wisdom around this?

I'm spending part of my work time developing a scalable weblog analysis
system running on a 4 to 6 node cluster (standard desktop-class machines). I
don't have much time to try out and benchmark all three tools (Cloudbase, Hive,
and Pig) and would really appreciate it if someone could give me a heads-up on
what to spend my time on. Some specifics:

a) Which tool can give me the best performance for this problem? (e.g., best
use of indexes, data partitioning, etc.)
b) Which tool has the most efficient data storage, so that I can store more
days' worth of data on the cluster for ad-hoc analysis?
c) Which tool is the most mature and will not crash (for example, the disclaimer
on the Hive Wiki really scared me --

