hbase-user mailing list archives

From "Liu, Ming (Ming)" <ming....@esgyn.cn>
Subject What is a good way to bulk load a large amount of data into an HBase table?
Date Sat, 06 Feb 2016 08:29:55 GMT

I am trying to find a good way to import a large amount of data from HDFS into HBase. I have
a CSV file of about 135 GB. I put it into HDFS and then used HBase's importtsv utility to bulk
load it. For those 135 GB of original data, the bulk load took 40 minutes. My cluster has 10 nodes,
each with 128 GB of memory, all-SSD storage, and a 10 Gb network. In my humble opinion this speed
is not very good, since it took only 10 minutes to put the same 135 GB into HDFS. I expect Hive to
be much faster; with an external table, loading takes essentially no time at all. I will test it later.

So I would like to ask: does anyone have a better way to bulk load data into HBase, or is importtsv
already the best bulk-load tool in the HBase world? With really big data (say, more than 50 TB),
this does not seem like a practical loading speed, does it? Or is it? In practice, how do people
normally load data into HBase?
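To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes load time grows linearly with data size on the same 10-node cluster and that 1 TB = 1000 GB. Neither assumption was measured, and the constant and function names are purely illustrative.

```python
# Back-of-the-envelope throughput estimate using the figures quoted above.
# ASSUMPTIONS (not measured): load time scales linearly with data size on the
# same 10-node cluster, and 1 TB = 1000 GB.

DATA_GB = 135.0        # size of the CSV file
IMPORTTSV_MIN = 40.0   # observed importtsv bulk-load time, in minutes
HDFS_PUT_MIN = 10.0    # observed time to copy the file into HDFS, in minutes


def throughput_gb_per_min(size_gb: float, minutes: float) -> float:
    """Average throughput in GB per minute."""
    return size_gb / minutes


def projected_hours(target_gb: float, rate_gb_per_min: float) -> float:
    """Hours needed to process target_gb at a constant rate (linear extrapolation)."""
    return target_gb / rate_gb_per_min / 60.0


importtsv_rate = throughput_gb_per_min(DATA_GB, IMPORTTSV_MIN)  # 3.375 GB/min
hdfs_rate = throughput_gb_per_min(DATA_GB, HDFS_PUT_MIN)        # 13.5 GB/min

target_gb = 50 * 1000.0  # 50 TB, using the decimal-unit assumption above
hours = projected_hours(target_gb, importtsv_rate)

print(f"importtsv: {importtsv_rate:.3f} GB/min (~{importtsv_rate * 1000 / 60:.0f} MB/s)")
print(f"HDFS put:  {hdfs_rate:.3f} GB/min")
print(f"50 TB via importtsv: ~{hours:.0f} hours (~{hours / 24:.1f} days)")
```

Under those assumptions, importtsv averages 3.375 GB/min, a quarter of the 13.5 GB/min seen for the HDFS copy, and 50 TB would take roughly 247 hours, or about 10 days.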

Thanks in advance,
