hbase-user mailing list archives

From Sateesh Lakkarsu <lakka...@gmail.com>
Subject Re: Bulk loading disadvantages
Date Thu, 26 Jul 2012 16:47:16 GMT
> For the bulkloading process, the HBase documentation mentions that in
> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
> into its storage directory and making the data available to clients."
> But from my experience the files also remain in the original location
> from where they are "adopted". So I guess the data is actually copied
> into the HBase directory, right? This means that, compared to the
> online importing, when bulk loading you essentially need twice the
> disk space on HDFS, right?

Yes, if you are generating HFiles on one cluster and loading them into a
separate HBase cluster. If they are co-located, it's just an HDFS mv.
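
A minimal sketch of why the space cost differs, using a local filesystem as a stand-in for HDFS. Assumptions: os.rename plays the role of the HDFS mv that happens when the HFiles already live on the HBase cluster's filesystem, and shutil.copyfile plays the role of shipping HFiles to a separate cluster. The function names and directory names are hypothetical illustrations, not HBase API.

```python
import os
import shutil
import tempfile


def dir_bytes(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def adopt_hfile(src, region_dir, same_filesystem):
    """Hypothetical stand-in for a region server 'adopting' an HFile.

    same_filesystem=True  -> rename (analogous to an HDFS mv): no data copied.
    same_filesystem=False -> copy (analogous to loading into a separate
                             cluster): the original file stays where it was.
    """
    dest = os.path.join(region_dir, os.path.basename(src))
    if same_filesystem:
        os.rename(src, dest)
    else:
        shutil.copyfile(src, dest)
    return dest


def simulate(same_filesystem, size=1024 * 1024):
    """Return (bytes left in the staging dir, bytes in the region dir)."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = os.path.join(tmp, "staging")      # where the MR job wrote HFiles
        region = os.path.join(tmp, "hbase-region")  # the region's storage dir
        os.makedirs(staging)
        os.makedirs(region)
        hfile = os.path.join(staging, "hfile-0001")
        with open(hfile, "wb") as f:
            f.write(b"\0" * size)
        adopt_hfile(hfile, region, same_filesystem)
        return dir_bytes(staging), dir_bytes(region)


if __name__ == "__main__":
    print("co-located:", simulate(same_filesystem=True))
    print("separate:", simulate(same_filesystem=False))
```

In the co-located case the staging directory ends up empty, so the data occupies space only once; in the separate-cluster case the staging copy remains, which is where the "twice the disk space" comes from.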

> Another problem is with data locality immediately after bulk loading
> through MR. I understand that locality is regained over time through
> compactions and splits. However, you don't get this problem when
> importing online, right?

Yes.
