hbase-user mailing list archives

From Sever Fundatureanu <fundatureanu.se...@gmail.com>
Subject Bulk loading disadvantages
Date Thu, 26 Jul 2012 16:39:09 GMT

For the bulk loading process, the HBase documentation says that in the
second stage "the appropriate Region Server adopts the HFile, moving it
into its storage directory and making the data available to clients."
In my experience, however, the files also remain in the original location
from which they are "adopted". So I guess the data is actually copied
into the HBase directory rather than moved, right? That would mean that,
compared to importing online, bulk loading essentially needs twice the
disk space on HDFS, right?
Another problem is data locality immediately after bulk loading
through MapReduce. I understand that locality is regained over time
through compactions and splits. However, you don't have this problem
when importing online, right?

Thanks in advance,
