hbase-user mailing list archives

From Lekhnath <lbhu...@veriskhealth.com>
Subject Re: Issue with StoreFiles with bulk import.
Date Fri, 13 Aug 2010 06:01:34 GMT
  On 8/13/2010 1:58 AM, Jeremy Carroll wrote:
> I'm currently importing some files into HBase and am running into a problem with a large
> number of store files being created. We have some back data which is stored in very large
> sequence files (3-5 GB in size). When we import this data, the number of store files created
> does not get out of hand. When we switch to importing smaller sequence files, the number of
> store files rises quite dramatically. I do not know if this is happening because we are
> flushing the commits more frequently with smaller files. I'm wondering if anybody has any
> advice regarding this issue. My main concern is that during this process we do not finish
> flushing to disk (and we set WriteToWAL to false). We always hit the 90 second timeout due to
> heavy write load. As these store files pile up without being committed to disk, we could
> lose a lot of data if something were to crash.
>
> I have created screenshots of our monitoring application for HBase which show the spikes
> in activity.
>
> http://twitpic.com/photos/jeremy_carroll
>
>
>

We faced a similar problem while doing bulk imports. With a large number
of reducers, we got a large number of small files. Most probably, each
reducer creates at least one file. Choosing an appropriate number of
reducers and an appropriate input file size solved the issue.
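
As a rough illustration of picking the number of reducers, here is a minimal
sketch. It assumes each reducer writes at least one file and that total output
is about the same size as total input; the class name, method name, and sizes
are hypothetical and not taken from our actual job.

```java
public class ReducerSizing {
    /**
     * Returns how many reducers to request so that each one handles roughly
     * targetFileBytes of data. Assumes output size is close to input size
     * and that every reducer produces at least one file.
     */
    public static int reducersFor(long totalInputBytes, long targetFileBytes) {
        if (totalInputBytes < 0 || targetFileBytes <= 0) {
            throw new IllegalArgumentException(
                "totalInputBytes must be >= 0 and targetFileBytes must be > 0");
        }
        // Ceiling division, clamped to the range [1, Integer.MAX_VALUE].
        long n = (totalInputBytes + targetFileBytes - 1) / targetFileBytes;
        return (int) Math.max(1L, Math.min(n, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        long totalInput = 40 * gib;   // hypothetical size of one import batch
        long targetPerFile = 1 * gib; // hypothetical target size per output file
        int reducers = reducersFor(totalInput, targetPerFile);
        System.out.println("reducers = " + reducers);
        // In a Hadoop MapReduce driver, this value would then be passed to
        // Job.setNumReduceTasks(reducers).
    }
}
```

Fewer, larger reducers mean fewer, larger files; the right target size depends
on the cluster and the table, so treat the numbers above as placeholders.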

Lekhnath



