hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: HBase scan performance decreases over time.
Date Mon, 05 Nov 2012 18:49:54 GMT
hdfs-site.xml

It's an HDFS setting that may impact the balancing of HBase as well.
(I'm sure someone can give a better response by looking at the code.)
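
For reference, a sketch of what such an entry in hdfs-site.xml could look like. It assumes the setting meant here is the balancer bandwidth cap, which in Hadoop 2.x-based releases is named dfs.datanode.balance.bandwidthPerSec (older 1.x releases use dfs.balance.bandwidthPerSec). The thread does not name the property, so check the hdfs-default.xml shipped with your version. The value is in bytes per second, and the 50 MB/s below is only an illustrative number, not a recommendation.

```xml
<!-- hdfs-site.xml: assumed property name, illustrative value -->
<property>
  <name>dfs.datanode.balance.bandwidthPerSec</name>
  <!-- 50 MB/s expressed in bytes per second (50 * 1024 * 1024) -->
  <value>52428800</value>
</property>
```

DataNodes read this value at startup, so a change in the file normally needs a DataNode restart. On versions that support it, `hdfs dfsadmin -setBalancerBandwidth <bytes-per-second>` adjusts the cap at runtime without persisting it.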


On Nov 5, 2012, at 12:14 PM, Asaf Mesika <asaf.mesika@gmail.com> wrote:

> Where is this setting located?
> 
> Sent from my iPhone
> 
> On Nov 5, 2012, at 15:05, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> There's an HDFS bandwidth setting which is set to 10MB/s.
>> 
>> Way too low for even 1 GbE.
>> 
>> Have you modified this setting yet?
>> 
>> -Mike
>> 
>> On Nov 3, 2012, at 2:50 PM, David Koch <ogdude@googlemail.com> wrote:
>> 
>>> Hello Ted,
>>> 
>>> We never initiate major compaction manually. I have not looked at I/O
>>> balance between nodes in detail. We have noticed that after running for a
>>> couple of weeks HBase seems to spend hours pushing blocks between nodes in
>>> order to optimize things. We add data daily in one ~30 GB push to several
>>> tables. Sometimes nodes get added to the running system.
>>> 
>>> Where can I get more information on how to carry out performance related
>>> HBase administrative tasks?
>>> 
>>> Thank you,
>>> 
>>> /David
>>> 
>>> 
>>> On Sat, Nov 3, 2012 at 4:42 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> 
>>>> Can you tell us how often you run major compaction after the import?
>>>> Have you noticed imbalanced read/write requests in the cluster, meaning
>>>> a subset of region servers receives the bulk of the writes?
>>>> 
>>>> We do some manual movement of regions when the above happens.
>>>> 
>>>> Cheers
>>>> 
>>>> On Sat, Nov 3, 2012 at 8:12 AM, David Koch <ogdude@googlemail.com> wrote:
>>>> 
>>>>> Hello,
>>>>> 
>>>>> Every now and then we need to flatten our cluster and re-import all data
>>>>> from log files (changes in data format, etc.). Afterwards we notice a
>>>>> significant increase in scan performance. As data is added and shuffled
>>>>> around between region servers, performance goes down again over time
>>>>> (say, a couple of weeks). Are there any routine operations that one should
>>>>> run manually, or settings to activate in the HBase configuration, to keep
>>>>> the data well distributed? We use HBase 0.92 as part of a Cloudera4 cluster.
>>>>> 
>>>>> Thank you,
>>>>> 
>>>>> /David
>> 
> 

