hbase-user mailing list archives

From Lars George <l...@worldlingo.com>
Subject Settings
Date Wed, 26 Aug 2009 14:40:26 GMT
Hi,

Over the years I have tried various settings in both Hadoop and HBase, 
and when redoing a cluster it is always a question whether we should 
keep a given setting or not, since the issue it "suppressed" has already 
been fixed. Maybe we should have a wiki page with the current settings 
and the more advanced ones, and when and how to use them. I often find 
that the descriptions in the various default files are as ambiguous as 
the setting keys themselves.

Here is a list of the not-so-obvious settings and what I set them to. 
Please help me identify which are useful and which are actually obsolete.

HBase:
---------

- fs.default.name => hdfs://<master-hostname>:9000/

This usually lives in core-site.xml in Hadoop. Does the client or server 
need this key at all? Did I copy it into the HBase site file by mistake?

- hbase.cluster.distributed => true

For true replication and standalone ZK installations.

- dfs.datanode.socket.write.timeout => 0

This is used in DataNode but, more importantly here, in DFSClient. Its 
default is apparently hard-coded to 8 minutes; no default file (I would 
have expected hdfs-default.xml) lists it.

We set it to 0 to avoid the socket timing out during periods of low use, 
because the DFSClient reconnect is not handled gracefully. I trust that 
setting it to 0 is what we recommend for HBase and is still valid?

- hbase.regionserver.lease.period => 600000

The default was changed from 60 to 120 seconds. Over time I had issues 
and have set it to 10 minutes. Good or bad?

- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?

- hbase.hregion.max.filesize => 536870912

Again twice as much as the default. Opinions?

- hbase.regions.nobalancing.count => 20

This seems to be missing from hbase-default.xml but is set to 4 in the 
code if not specified. I got the value above from Ryan to improve HBase 
startup. It means that while an RS is still opening up to 20 regions, 
rebalancing of regions can already start. This is handled by the 
ServerManager during message processing. Opinions?

- hbase.regions.percheckin => 20

This is the number of regions assigned in one go. It is handled in 
RegionManager and the default is 10. Here we tell it to assign regions 
in larger batches to speed up the cluster start. Opinions?

- hbase.regionserver.handler.count => 30

Up from 10, as I often had the problem that the UI was not responsive 
while an import MR job was running. All handlers were busy doing the 
inserts. JD mentioned that the default may be raised?
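
As a rough sketch only, the HBase values listed above could be written 
out as hbase-site.xml property entries with a few lines of Python. The 
helper name render_site_xml is hypothetical, and fs.default.name is left 
out because its value contains a hostname placeholder. The values are 
copied from the list and are not recommendations.

```python
import xml.etree.ElementTree as ET

# Values copied from the list above; nothing here is a recommendation.
HBASE_SETTINGS = {
    "hbase.cluster.distributed": "true",
    "dfs.datanode.socket.write.timeout": "0",
    "hbase.regionserver.lease.period": "600000",
    "hbase.hregion.memstore.block.multiplier": "4",
    "hbase.hregion.max.filesize": "536870912",
    "hbase.regions.nobalancing.count": "20",
    "hbase.regions.percheckin": "20",
    "hbase.regionserver.handler.count": "30",
}


def render_site_xml(settings):
    """Build a Hadoop-style <configuration> document from a dict (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(root, encoding="unicode")


site_xml = render_site_xml(HBASE_SETTINGS)
print(site_xml)
```

Keeping the overrides in one dict makes it easy to diff them against the 
shipped defaults when redoing a cluster.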


Hadoop:
----------

- dfs.block.size => 134217728

Up from the default 64 MB. I have done this in the past as my data size 
per "cell" is larger than the usual few bytes. I can have a few KB up to 
just above 1 MB per value. Does this still make sense?

- dfs.namenode.handler.count => 20

This was upped from the default of 10 quite some time ago (more than a 
year ago). Is this still required?

- dfs.datanode.socket.write.timeout => 0

This is the matching entry to the one above, I suppose, this time for 
the DataNode. Still required?

- dfs.datanode.max.xcievers => 4096

The default is 256, which is often way too low. What is a good value you 
would use? What is the drawback of setting it high?
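
Since several of the values in both lists are raw byte or millisecond 
counts, a small conversion sketch may help when comparing them with the 
defaults. The helper names to_mib and to_minutes are hypothetical, and 
the lease period is assumed to be in milliseconds, which matches the 
"10 minutes" stated above.

```python
# Raw values as quoted in this message.
BYTE_SETTINGS = {
    "hbase.hregion.max.filesize": 536870912,
    "dfs.block.size": 134217728,
}
# Assumption: the lease period is expressed in milliseconds.
MILLISECOND_SETTINGS = {
    "hbase.regionserver.lease.period": 600000,
}

MIB = 1024 * 1024


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return num_bytes / MIB


def to_minutes(millis):
    """Convert milliseconds to minutes (hypothetical helper)."""
    return millis / 60000


for key, value in BYTE_SETTINGS.items():
    print(f"{key}: {to_mib(value):g} MiB")
for key, value in MILLISECOND_SETTINGS.items():
    print(f"{key}: {to_minutes(value):g} min")
# prints:
# hbase.hregion.max.filesize: 512 MiB
# dfs.block.size: 128 MiB
# hbase.regionserver.lease.period: 10 min
```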


Thanks,
Lars

