hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: what is a good way to bulkload large amount of data into HBase table
Date Sat, 06 Feb 2016 13:46:58 GMT
Can you describe how you used importtsv?
Here is a related command-line option, quoted from the importtsv usage text:

      "By default importtsv will load data directly into HBase. To instead generate\n" +
      "HFiles of data to prepare for a bulk data load, pass the option:\n" +
      "  -D" + BULK_OUTPUT_CONF_KEY + "=/path/for/output\n" +
      "  Note: if you do not use this option, then the target table must already exist in HBase\n" +

See also http://hbase.apache.org/book.html#arch.bulk.load.complete
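
For illustration, the two-step flow described in the usage text above (generate HFiles, then complete the bulk load) might look roughly like the sketch below. The table name, HDFS paths, and column mapping are hypothetical placeholders, not taken from your setup. The comma separator is an assumption based on your input being CSV, and the LoadIncrementalHFiles class name is the one used in HBase 1.x, so check it against your version. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch of a two-step bulk load with ImportTsv.
# Every value below is a hypothetical placeholder; adjust to your cluster.
set -eu

TABLE="${TABLE:-mytable}"                          # hypothetical table name
INPUT="${INPUT:-/user/ming/data.csv}"              # hypothetical HDFS input path
HFILE_DIR="${HFILE_DIR:-/tmp/hfiles}"              # hypothetical HFile output directory
TSV_COLUMNS="${TSV_COLUMNS:-HBASE_ROW_KEY,d:c1,d:c2}"  # hypothetical column mapping

# Print the command by default; run it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: write HFiles instead of issuing Puts directly against the table.
# importtsv.bulk.output is the value of BULK_OUTPUT_CONF_KEY.
import_to_hfiles() {
  run hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    "-Dimporttsv.separator=," \
    "-Dimporttsv.columns=${TSV_COLUMNS}" \
    "-Dimporttsv.bulk.output=${HFILE_DIR}" \
    "${TABLE}" "${INPUT}"
}

# Step 2: hand the generated HFiles over to the region servers.
complete_bulk_load() {
  run hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    "${HFILE_DIR}" "${TABLE}"
}

import_to_hfiles
complete_bulk_load
```

If the first step was run without the bulk output option, the job writes into HBase directly, which would be worth ruling out before comparing timings.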

FYI

On Sat, Feb 6, 2016 at 12:29 AM, Liu, Ming (Ming) <ming.liu@esgyn.cn> wrote:

> Hello,
>
> I am trying to find a good way to import a large amount of data into HBase
> from HDFS. I have a CSV file of about 135G. I put it into HDFS and then
> used HBase's importtsv utility to do a bulk load; for that 135G of
> original data, it took 40 minutes. I have 10 nodes, each with 128G, all
> disks are SSDs, and the network is 10G. In my humble opinion this speed is
> not very good, since it took only 10 minutes to put that 135G of data into
> HDFS. I assume Hive will be much faster; for an external table, it takes
> no time at all to load. I will test it later.
> So I would like to ask whether anyone has better ideas for bulk loading
> into HBase, or whether importtsv is already the best tool for bulk
> loading in the HBase world.
> If I have really big data (say > 50T), this does not seem like a
> practical loading speed, does it? In practice, how do people normally
> load data into HBase?
>
> Thanks in advance,
> Ming
>
>
