hbase-user mailing list archives

From Rama Ramani <rama.ram...@live.com>
Subject HBase - bulk loading files
Date Fri, 19 Dec 2014 21:43:47 GMT
Hello,

I am bulk loading a set of files (about 400 MB each, with "|" as the delimiter) using
ImportTsv. The map phase takes a long time to complete on both a 4-node and a 16-node
cluster. I also tried the option to generate the output files (providing
-Dimporttsv.bulk.output), but that was slow as well, which suggests that the generation
of the output files itself needs improvement.

I am seeing about 8000 rows/sec for this dataset; ingesting 400 MB takes about 5-6 minutes.
How can I improve this? Is there an alternative tool I can use?
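
For scale, the figures above can be turned into rough throughput numbers. This is only a back-of-the-envelope sketch: it assumes 5.5 minutes (the midpoint of 5-6 minutes) and 1 MB = 1024 * 1024 bytes, and the function name `ingest_estimate` is made up for illustration.

```python
def ingest_estimate(file_mb, minutes, rows_per_sec):
    """Derive rough throughput figures from the reported numbers."""
    seconds = minutes * 60
    total_bytes = file_mb * 1024 * 1024      # assumption: binary megabytes
    rows = int(rows_per_sec * seconds)       # rows written over the whole run
    return {
        "mb_per_sec": file_mb / seconds,     # raw ingest rate
        "rows": rows,
        "bytes_per_row": total_bytes / rows, # implied average input row size
    }


# Reported figures: 400 MB, about 5-6 minutes (5.5 assumed), 8000 rows/sec.
est = ingest_estimate(file_mb=400, minutes=5.5, rows_per_sec=8000)
print(f"{est['mb_per_sec']:.2f} MB/s, {est['rows']} rows, "
      f"{est['bytes_per_row']:.0f} bytes/row")
```

Under those assumptions the job moves a little over 1 MB/s in total, across roughly 2.6 million rows of about 160 bytes each.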