hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Loading data, hbase slower than Hive?
Date Thu, 17 Jan 2013 17:00:26 GMT
In the case of Hive, data insertion means placing the file under the table path in
HDFS. HBase needs to read the data and convert it into its own format (HFiles),
and MR is doing this work. So it is clear that HBase will be slower.
:)  As Michael said, the read operation...
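
The difference between the two load paths can be shown with a toy, local-filesystem sketch in Python. It is not Hive or HBase code. The function names hive_style_load and hbase_style_load, the tab-separated input layout with the row key in the first field, and the column family "cf" are all hypothetical. The Hive-style path only moves the file into the table directory, while the HBase-style path has to read and parse every record and sort the resulting cells by key, which is how cells are ordered inside an HFile.

```python
import os
import shutil
import tempfile


def hive_style_load(src_path, table_dir):
    """Toy model of the Hive path: the file is moved under the table
    directory. No record is read or parsed, so on the same filesystem
    this is essentially a rename, whatever the file size."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    shutil.move(src_path, dest)
    return dest


def hbase_style_load(src_path, family="cf"):
    """Toy model of converting text into HBase's storage format.
    Hypothetical layout: tab-separated lines, first field = row key.
    Every line is parsed into (row, family, qualifier, value) cells,
    then all cells are sorted by key, as they are inside an HFile."""
    cells = []
    with open(src_path) as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            row, values = fields[0], fields[1:]
            for i, value in enumerate(values):
                # Hypothetical qualifier names c0, c1, ...
                cells.append((row, family, f"c{i}", value))
    cells.sort()  # work grows with the number of records
    return cells


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "data.tsv")
    with open(src, "w") as f:
        f.write("row2\tb\tx\nrow1\ta\ty\n")
    print(hbase_style_load(src))                              # parse + sort
    print(hive_style_load(src, os.path.join(tmp, "table")))   # just a move
```

The first function does a constant amount of work per file; the second does work proportional to the number of rows, which at 230 million records is what the MapReduce job spends its time on.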


On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <austincv@gmail.com> wrote:

>   Hi,
> Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins.
> It's a 20 GB data set, approx. 230 million records. The data is in HDFS,
> in a single text file. The cluster is 11 nodes, 8 cores.
> I loaded this in Hive, partitioned by date, bucketed into 32 and sorted.
> Time taken is 6 mins.
> I loaded the same data into HBase, in the same cluster, by writing MapReduce
> code. It took 1 hr 14 mins. The cluster wasn't running anything else,
> and assuming that the code I wrote is good enough, what is it that
> makes HBase slower than Hive in loading the data?
> Thanks,
> Austin
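
For scale, the figures in the question can be turned into rough throughput numbers. This sketch assumes exactly 230 million records and, for the per-node figure, an even spread of the HBase load across all 11 nodes; both are simplifications, and the helper name throughput is hypothetical.

```python
def throughput(records, minutes):
    """Records per second for a load of `records` rows taking `minutes`."""
    return records / (minutes * 60)


RECORDS = 230_000_000   # "approx 230 million records" (from the question)
HIVE_MIN = 6            # Hive load time in minutes
HBASE_MIN = 60 + 14     # "1 hr 14 mins"
NODES = 11              # cluster size; assumes every node shares the load evenly

hive_rate = throughput(RECORDS, HIVE_MIN)     # about 639,000 records/s
hbase_rate = throughput(RECORDS, HBASE_MIN)   # about 51,800 records/s
slowdown = HBASE_MIN / HIVE_MIN               # about 12.3x
hbase_per_node = hbase_rate / NODES           # about 4,700 records/s per node

print(f"Hive:  {hive_rate:,.0f} records/s")
print(f"HBase: {hbase_rate:,.0f} records/s ({hbase_per_node:,.0f} per node)")
print(f"HBase load took {slowdown:.1f}x as long")
```

That is roughly a 12x gap between the two loads.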
