hbase-user mailing list archives

From Sever Fundatureanu <fundatureanu.se...@gmail.com>
Subject Re: Bulk loading disadvantages
Date Fri, 27 Jul 2012 11:17:43 GMT
Hi Anil,

I am using HBase 0.94.0 with Hadoop 1.0.0. The directories are indeed
the ones mentioned by Bijeet. I can also add that I am doing the 2nd
stage programmatically by calling doBulkLoad(org.apache.hadoop.fs.Path
sourceDir, HTable table) on a LoadIncrementalHFiles object.

Best,
Sever


On Fri, Jul 27, 2012 at 5:40 AM, Anil Gupta <anilgupta84@gmail.com> wrote:
> Hi Sever,
>
> That's very interesting. Which Hadoop and HBase versions are you using? I am going
> to run bulk loads tomorrow. If you can tell me which directories in HDFS you compared with
> /hbase/$table, then I will try to check the same.
>
> Best Regards,
> Anil
>
> On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <fundatureanu.sever@gmail.com> wrote:
>
>> On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <lakkarsu@gmail.com> wrote:
>>>>
>>>>
>>>> For the bulkloading process, the HBase documentation mentions that in
>>>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>>>> into its storage directory and making the data available to clients."
>>>> But from my experience the files also remain in the original location
>>>> from where they are "adopted". So I guess the data is actually copied
>>>> into the HBase directory, right? This means that, compared to the
>>>> online importing, when bulk loading you essentially need twice the
>>>> disk space on HDFS, right?
>>>>
>>>
>>> Yes, if you are generating HFiles on one cluster and loading into a
>>> separate HBase cluster. If they are co-located, it's just an hdfs mv.
>>
>> Hmm, both the HFile generation and the HBase cluster run on top of
>> the same HDFS cluster. I did a "du" on both the source HDFS directory
>> and the destination "/hbase" directory and I got the same sizes (+/-
>> a few bytes). I deleted the source directory from HDFS and then scanned
>> the table without any problems. Maybe there is a config parameter I'm
>> missing?
>>
>> Sever
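
The "du" comparison quoted above can be illustrated with a small sketch. It is only an analogy, under loud assumptions: it uses java.nio on the local filesystem as a stand-in for HDFS, the class, method, and file names (MoveVersusCopy, moveOneCopyOne, moved.hfile, copied.hfile) are hypothetical, and it says nothing about what doBulkLoad itself does. It only shows what the two outcomes look like from the outside: a moved file disappears from the source directory, while a copied file is counted in both places.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class MoveVersusCopy {

    // Total size in bytes of all regular files under dir
    // (a rough local analogue of "du -s").
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // Creates root/source and root/dest, moves one file and copies another,
    // then returns { bytes left under source, bytes now under dest }.
    static long[] moveOneCopyOne(Path root) throws IOException {
        Path source = Files.createDirectory(root.resolve("source"));
        Path dest = Files.createDirectory(root.resolve("dest"));

        // Hypothetical stand-ins for generated HFiles: 1 KiB and 2 KiB of zeros.
        Path moved = Files.write(source.resolve("moved.hfile"), new byte[1024]);
        Path copied = Files.write(source.resolve("copied.hfile"), new byte[2048]);

        // A move leaves nothing behind in the source directory.
        Files.move(moved, dest.resolve("moved.hfile"));
        // A copy leaves the original in place, so its bytes are counted twice.
        Files.copy(copied, dest.resolve("copied.hfile"));

        return new long[] { totalBytes(source), totalBytes(dest) };
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("bulkload-demo");
        long[] sizes = moveOneCopyOne(root);
        System.out.println("source bytes after: " + sizes[0]);
        System.out.println("dest bytes after:   " + sizes[1]);
    }
}
```

In this sketch only the copied file still contributes to the source total. Equal sizes on both sides after a load, with the source files still present, is the pattern a copy would produce rather than a move.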



-- 
Sever Fundatureanu

Vrije Universiteit Amsterdam
E-mail: fundatureanu.sever@gmail.com
