hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Issue with StoreFiles with bulk import.
Date Fri, 13 Aug 2010 15:13:02 GMT
On Fri, Aug 13, 2010 at 6:26 AM, Jeremy Carroll
<jeremy.carroll@networkedinsights.com> wrote:
> The main issue is that the stores get very fragmented. I've seen as many as 300 storeFiles
> for one region during this process.


How many column families do you have, and doesn't this number start to go down
after a while because they get compacted together?  How many regions
per server do you have?



> I'm concerned about performance with that many files to search
> through. We are seeing this in the UI for a regions list
> (storeFiles=?), and we also see it through our JMX monitoring.
> If there is no issue with performance I'm OK. But as it seems right
> now, the only way I can decrease the storeFile count is to do a major
> compaction.


You are doing manual intervention when you do that... hbase should be
compacting in the background, bringing down the overall number of
storefiles per region.
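(For reference, the manual intervention described here is typically done from the hbase shell; the table and region names below are hypothetical:)

```
# compact every region of a table (runs asynchronously on the cluster)
hbase> major_compact 'mytable'
# or compact a single region by its full region name
hbase> major_compact 'mytable,,1281711234567'
```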


> We only do one major compaction per day to minimize load on the
> system. So during the day am I going to see drastically reduced
> performance with 30,000 storeFiles versus 800 when it's major
> compacted? The 90 second timeout was the flush wait timeout. We are
> inserting pretty fast, so the flush timeout hits its 90-second limit
> and aborts.



Have you changed other configurations that disable the natural hbase
compacting facility?
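(For anyone following along, these are the settings that usually govern this behavior; property names and defaults below are from the hbase-default.xml of this era, so verify against your version. The 90000 ms here is the 90-second flush wait mentioned above.)

```
<!-- hbase-site.xml sketch: values shown are the shipped defaults -->
<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>      <!-- storefiles in a store before a minor compaction is triggered -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>7</value>      <!-- updates are blocked once a store has this many files -->
</property>
<property>
  <name>hbase.hstore.blockingWaitTime</name>
  <value>90000</value>  <!-- ms to block writers waiting on compaction before giving up -->
</property>
```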

St.Ack
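(On the compression question in the quoted thread below: compression is set per column family from the hbase shell, with the table disabled first. A sketch with hypothetical table and family names; GZ is built in, while LZO needs the native libraries installed separately:)

```
hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}
hbase> enable 'mytable'
```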

> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Friday, August 13, 2010 12:33 AM
> To: user@hbase.apache.org
> Subject: Re: Issue with StoreFiles with bulk import.
>
> On Thu, Aug 12, 2010 at 1:13 PM, Jeremy Carroll
> <jeremy.carroll@networkedinsights.com> wrote:
>> I'm currently importing some files into HBase and am running into a problem with
>> a large number of store files being created.
>
> Where do you see this, Jeremy?  In the UI?  What kinda numbers are you seeing?
>
>
>> We have some back data which is stored in very large sequence files (3-5 GB in size).
>> When we import this data the number of stores created does not get out of hand.
>
>
> So when you mapreduce using these big files as source and insert into
> hbase, it's not an issue?
>
>
>> When we switch to smaller sequence files being imported, we see that the number of
>> stores rises quite dramatically.
>
>
> Why do you need to change?
>
>
>> I do not know if this is happening because we are flushing the commits more frequently
>> with smaller files.
>
> Probably.  Have you tinkered with hbase default settings in any way?
>
> Perhaps you are getting better parallelism when there are lots of small files to
> chomp on?  More concurrent maps/clients?  So the rate of upload goes up?
>
>
>> I'm wondering if anybody has any advice regarding this issue. My main concern is that
>> during this process we do not finish flushing to disk (and we set writeToWAL false). We always
>> hit the 90 second timeout due to heavy write load. As these store files pile up and they
>> do not get committed to disk, we run into issues where we could lose a lot of data if something
>> were to crash.
>>
>
>
> The 90 second timeout is the regionserver timing out against
> zookeeper?  Or is it something else?
>
> Storefiles are on the filesystem so what do you mean by the above fear
> of their not being committed to disk?
>
>
>> I have created screen shots of our monitoring application for HBase which shows the
>> spikes in activity.
>>
>> http://twitpic.com/photos/jeremy_carroll
>>
>
>
> Nice pictures.
>
> 30k storefiles is a lot.  They will go up as you are doing a
> bulk load, as the compactor is probably overrun.  HBase will usually
> catch up though, especially after the upload completes.
>
> Do you have compression enabled?
>
> I see regions growing steadily rather than spiking as the comment on
> the graph says.  500 regions ain't too many...
>
> How many servers in your cluster?
>
> St.Ack
>
>
>>
>>
>
