hbase-user mailing list archives

From Nik Lam <nik.eb....@gmail.com>
Subject Is it safe to run multiple completebulkload jobs against the same table in parallel?
Date Wed, 24 Sep 2014 00:46:37 GMT

I have a handful of large HFiles that each span many regions of a table.

Splitting them to match the live regions is taking a very long time because
completebulkload seems to work serially through the HFiles and my regions
are undergoing splits relatively often due to organic growth - meaning the
region boundaries change while the completebulkload is in flight.

I'm wondering whether it's possible to speed up the overall bulk load of
these data by running one completebulkload job for each large HFile. I.e.
running several completebulkload jobs in parallel.
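
To make the idea concrete, here is a rough sketch of what I have in mind. It
is not something I have verified to be safe. It assumes each large HFile has
first been staged in its own bulk-load output directory (the directory paths,
the table name, and the helper names build_commands and run_parallel are all
hypothetical), and it launches one LoadIncrementalHFiles process (the class
behind completebulkload) per directory at the same time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# The class behind the completebulkload tool.
LOADER_CLASS = "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles"


def build_commands(staging_dirs, table):
    """Return one `hbase <loader> <dir> <table>` argv list per staging dir."""
    return [["hbase", LOADER_CLASS, d, table] for d in staging_dirs]


def run_parallel(commands, runner=subprocess.call, max_workers=4):
    """Run every command concurrently and return the results in input order.

    `runner` defaults to subprocess.call, which returns the exit code of each
    process; it is a parameter so the launch logic can be dry-run.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


# Hypothetical usage (assumed paths and table name, one directory per HFile):
# dirs = ["/staging/hfile1", "/staging/hfile2", "/staging/hfile3"]
# exit_codes = run_parallel(build_commands(dirs, "my_table"))
```

Whether the concurrent loads interfere with each other when regions split
underneath them is exactly the part I am unsure about.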

Has anyone tried this before or can anyone who is familiar with the way
completebulkload works comment on such an approach?


