hbase-user mailing list archives

From Lars George <l...@worldlingo.com>
Subject Re: Settings
Date Thu, 27 Aug 2009 10:11:55 GMT
Hi Schubert,

See my comments inline below.

>>  HBase:
>> ---------
>> - fs.default.name => hdfs://<master-hostname>:9000/
>>
>> This is usually in core-site.xml in Hadoop. Does the client or server need
>> this key at all? Did I copy it into the hbase site file by mistake?
> [schubert] I think it's better not to copy it into the HBase conf file. I
> suggest you modify your hbase-env.sh to add the Hadoop conf path to your
> HBASE_CLASSPATH, e.g. export
> HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf.
> Apart from that, we should also configure the GC options here.
>   

I agree, but I think this is a remnant from older versions and can be 
removed. I do not think I ever needed to add the Hadoop conf to HBase. But 
it may be necessary, for example, for the replication factor: I am thinking 
of lowering it to 2 on smaller clusters, and for the DFSClient to pick up 
that default the value needs to be available.

Another option would be to symlink the Hadoop site files into hbase/conf, 
which explicitly wires up only the site files. With the approach above you 
are also pulling in the second log4j.properties, the metrics file, etc. 
Not sure if that could have a side effect?
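
A minimal sketch of what Schubert describes, assuming the layout from his 
example (the relative Hadoop path and the GC flags are illustrative, not a 
recommendation):

```shell
# hbase-env.sh (sketch) -- put the Hadoop conf directory on the HBase
# classpath so the DFSClient sees settings like dfs.replication.
# The relative path is taken from Schubert's example; adjust to your layout.
export HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf

# Example GC options (assumption: CMS to keep pauses short); tune these
# for your heap size and workload.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```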

>> - hbase.cluster.distributed => true
>>
>> For true replication and stand alone ZK installations.
>>     
> [schubert] we should also export HBASE_MANAGES_ZK=false in hbase-env.sh to
> keep things consistent.
>   

Agreed, I have that set, but did not mention it.
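
For completeness, the two settings pair up roughly like this (a sketch of 
what we are both describing):

```xml
<!-- hbase-site.xml: run HBase fully distributed -->
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
```

together with export HBASE_MANAGES_ZK=false in hbase-env.sh, so that HBase 
does not try to manage the external ZooKeeper quorum itself.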

>> - dfs.datanode.socket.write.timeout => 0
>>     
> [schubert] This parameter belongs to Hadoop/HDFS; it should be in
> hadoop-0.20.0/conf/hdfs-site.xml. But I think it is no longer needed.
>   

See Stack's and Andrew's replies.

>> - hbase.regionserver.lease.period => 600000
>>
>> Default was changed from 60 to 120 seconds. Over time I had issues and have
>> set it to 10 minutes. Good or bad?
> [schubert] I think if you select the right JVM GC options, the default 60000
> is OK.
>   

OK.

>> - hbase.hregion.memstore.block.multiplier => 4
>>
>> This is up from the default 2. Good or bad?
>>     
> [schubert] I do not think it is necessary; can you describe your reason?
>   

I got that recommended but wanted to make sure I understand its 
implications. Stack has it described nicely.

>> - hbase.hregion.max.filesize => 536870912
>>
>> Again twice as much as the default. Opinions?
>>     
> [schubert] If you want a bigger region size, I think it's fine. We
> even tried 1GB in some tests.
>   

OK.

>> - hbase.regions.nobalancing.count => 20
>>
>> This seems to be missing from hbase-default.xml but is set to 4 in the
>> code if not specified. The above I got from Ryan to improve the startup of
>> HBase. It means that while a RS is still opening up to 20 regions it can
>> start rebalancing regions. Handled by the ServerManager during message
>> processing. Opinions?
> [schubert] I think it makes sense.
>   

OK.

>> - hbase.regions.percheckin => 20
>>
>> This is the count of regions assigned in one go. Handled in RegionManager
>> and the default is 10. Here we tell it to assign regions in larger batches
>> to speed up the cluster start. Opinions?
>>     
> [schubert] I have no strong opinion on it. I think region assignment will
> incur some CPU and memory overhead on the region server if there are too
> many HLogs to be processed.
>   

OK.

>> - hbase.regionserver.handler.count => 30
>>
>> Up from 10 as I often had the problem that the UI was not responsive while
>> an import MR job was running. All handlers were busy doing the inserts. JD
>> mentioned it may be set to a higher default value.
> [schubert] It makes sense. In my small 5-node cluster, I set it to 20.
>   

OK.
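
To summarize the HBase side, the settings discussed in this thread would 
look roughly like the following hbase-site.xml fragment (a sketch with the 
values under discussion, not a blessed recommendation):

```xml
<!-- hbase-site.xml fragment (sketch; values as discussed above) -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>60000</value> <!-- default; OK with suitable GC options -->
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value> <!-- up from the default 2 -->
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>536870912</value> <!-- 512MB regions, twice the default -->
</property>
<property>
  <name>hbase.regions.nobalancing.count</name>
  <value>20</value> <!-- not in hbase-default.xml; code default is 4 -->
</property>
<property>
  <name>hbase.regions.percheckin</name>
  <value>20</value> <!-- assign regions in larger batches at startup -->
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>30</value> <!-- keep the UI responsive during import jobs -->
</property>
```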

>> Hadoop:
>> ----------
>>
>> - dfs.block.size => 134217728
>>
>> Up from the default 64MB. I have done this in the past as my data size per
>> "cell" is larger than the usual few bytes. I can have a few KB up to just
>> above 1 MB per value. Does this still make sense?
>>     
> [schubert] I think your reasoning makes sense.
>   

OK.

>> - dfs.namenode.handler.count => 20
>>
>> This was upped from the default 10 quite some time ago (more than a year
>> ago). So is this still required?
>>
>>     
> [schubert] I also set it to 20.
>   

OK.

>> - dfs.datanode.socket.write.timeout => 0
>>
>> This is the matching entry to the one above, I suppose. This time for the
>> DataNode. Still required?
> [schubert]  I think it is not necessary now.
>   

OK, yes, as Andrew notes.

>> - dfs.datanode.max.xcievers => 4096
>>
>> Default is 256 and often way too low. What is a good value you would use?
>> What is the drawback setting it high?
>>
>>     
> [schubert] It should make sense. I use 3072 in my small cluster.
>   

OK.
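
And for the Hadoop side, the matching hdfs-site.xml fragment (again a 
sketch with the values discussed, not a recommendation):

```xml
<!-- hdfs-site.xml fragment (sketch; values as discussed above) -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value> <!-- 128MB blocks for larger cell values -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>20</value> <!-- up from the default 10 -->
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value> <!-- note the key's historical misspelling -->
</property>
```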

Thanks Schubert!

Lars
