hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: Put errors via thrift
Date Tue, 15 Feb 2011 22:48:09 GMT
Thanks for the help. It definitely looks like the move to 0.90 would resolve many of these
issues.

-chris

On Feb 15, 2011, at 2:33 PM, Jean-Daniel Cryans wrote:

> That would make sense... although in my testing, the more store files
> you have to split, the longer it takes to create the reference files,
> so the longer the split takes. Now that I think of it, with your high
> blocking store files setting, you may be running into an extreme case
> of https://issues.apache.org/jira/browse/HBASE-3308
> 
> J-D
> 
> On Tue, Feb 15, 2011 at 2:27 PM, Chris Tarnas <cft@email.com> wrote:
>> No swapping, and about 30% of the total CPU is idle. Looking through Ganglia I do see a spike in cpu_wio at that time - but only to 2%. My suspicion, though, is that GZ compression is just taking a while.
>> 
>> 
>> 
>> On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:
>> 
>>> Yeah if it's the same key space that splits, it could explain the
>>> issue... 65 seconds is a long time! Is there any swapping going on?
>>> CPU or IO starvation?
>>> 
>>> In that context I don't see any problem setting the pausing time higher.
>>> 
>>> J-D
>>> 
>>> On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <cft@email.com> wrote:
>>>> Hi JD,
>>>> 
>>>> Two splits happened within 90 seconds of each other on one server - one took 65 seconds, the next took 43 seconds. With only a 10 second timeout (10 tries, 1 second between) I think that was the issue. Are there any hidden issues with raising those retry parameters so I can withstand a 120 second pause?
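[For what it's worth, the pause and retry count mentioned here map to two client-side settings. A rough sketch of hbase-site.xml values that would ride out a ~120 second split, assuming roughly flat pauses between retries - actual backoff behavior varies by HBase version, so treat these numbers as illustrative, not tested:]

```xml
<!-- hbase-site.xml on the client side (here, the machine running the Thrift server) -->
<property>
  <name>hbase.client.pause</name>
  <value>2000</value> <!-- ms between retries; default is 1000 -->
</property>
<property>
  <name>hbase.client.retries.number</name>
  <value>60</value> <!-- 60 retries x 2 s flat pause ~ 120 s; versions with
                         exponential backoff need far fewer retries -->
</property>
```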
>>>> 
>>>> thanks,
>>>> -chris
>>>> 
>>>> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>>>> 
>>>>> 
>>>>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>>>> 
>>>>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <cft@email.com> wrote:
>>>>>>> We are definitely considering writing a bulk loader, but as this fits into an existing processing pipeline that is not Java, and our data does not fit the importtsv tool (we use column names as data as well), we have not done it yet. I do foresee a Java bulk loader in our future, though.
>>>>>> 
>>>>>> Well I was referring to THE bulk loader: http://hbase.apache.org/bulk-loads.html
>>>>>> 
>>>>> 
>>>>> It really has the same problem for us. Also - does that need 0.92 for multi-column support? I'm pretty sure we will be moving to a bulk loader soon.
>>>>> 
>>>>>>> 
>>>>>>> Does the shell expose the createTable method that pre-defines the regions? (Or I suppose I'll probably need to brush up on my JRuby...) Splits were definitely happening then. Currently I'm using 1GB regions; I'll probably go larger (~5GB) and salt my keys to distribute them better.
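[Key salting, as mentioned here, can be sketched like this - a minimal illustration only; the bucket count and the two-digit prefix format are made up for the example, not from the thread:]

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; often chosen near the number of region servers

def salt_key(row_key: str) -> str:
    """Prefix the row key with a deterministic salt bucket so that
    sequential keys spread across regions instead of hotspotting one.
    The salt is derived from the key itself, so reads can recompute it."""
    bucket = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket:02d}-{row_key}"
```

[The trade-off: single-key reads can recompute the salt, but scans over the original key order now have to fan out across all buckets.]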
>>>>>> 
>>>>>> I don't think that method is in the shell; it'd be weird anyway to
>>>>>> write down hundreds of bytes in the shell IMO... Do you see region
>>>>>> hotspots? If so, definitely solve the key distribution, as it's going
>>>>>> to kill your performance. Bigger regions won't really help if you're
>>>>>> still always writing to the same few ones.
>>>>>> 
>>>>> 
>>>>> We use schema files that we redirect into the shell like DDL. My other reason to go to large regions is that we are going to have lots of older data as well. The top few loads will be hot and used most often, but we do need access to the older data too. I foresee up to about 2-4 billion rows a week, so at the rate we are creating these tables that would be quite a few regions per server at 1GB regions.
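[One way to avoid writing always to the same few regions is to pre-create the table with split points. Assuming row keys carry a fixed-width, zero-padded numeric salt prefix ("00-", "01-", ... - a made-up convention for this sketch), computing the split keys is trivial:]

```python
def bucket_split_keys(num_buckets: int) -> list:
    """Split points for a table whose row keys start with a zero-padded
    salt bucket prefix. One split per bucket boundary, so num_buckets
    regions exist up front instead of one hot one."""
    return [f"{b:02d}" for b in range(1, num_buckets)]
```

[These strings could then be fed to the Java client's createTable(desc, splitKeys) overload, since, as noted above, the 0.90 shell does not expose it.]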
>>>>> 
>>>>>>> 
>>>>>>> The reason I had thought it might be compaction-related is that I saw we had hit the hbase.hstore.blockingStoreFiles limit as well as having the timeout expire.
>>>>>>> 
>>>>>> 
>>>>>> Well, the writes would block on flushing, so unless all the handlers
>>>>>> are filled you shouldn't see retries exhausted. You could grep
>>>>>> your logs to see how long the splits took, btw, but the total locking
>>>>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>>>>> definitely help here.
>>>>>> 
>>>>> 
>>>>> Most splits look to be about 5-7 seconds. I'll investigate more around the error times and see if any were longer.
>>>>> 
>>>>> We'll be upgrading next week.
>>>>> 
>>>>> Thanks again!
>>>>> -chris
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 

