hbase-user mailing list archives

From "Jim Kellerman (POWERSET)" <Jim.Keller...@microsoft.com>
Subject RE: Settings
Date Thu, 27 Aug 2009 01:21:24 GMT
Lars,

Good stuff. Want to add it to the wiki?

-----Original Message-----
From: Lars George [mailto:lars@worldlingo.com] 
Sent: Wednesday, August 26, 2009 7:40 AM
To: hbase-user@hadoop.apache.org
Subject: Settings

Hi,

Over the years I have tried various settings in both Hadoop and HBase, and when redoing
a cluster it is always a question whether we should keep a setting or not, since the issue
it "suppressed" may have been fixed already. Maybe we should have a wiki page with the current
settings and the more advanced ones, and when and how to use them. I often find that the
descriptions in the various default files are as ambiguous as the setting keys themselves.

Here is a list of the not-so-obvious settings and what I set them to. Please help me identify
which are useful and which are actually obsolete.

HBase:
---------

- fs.default.name => hdfs://<master-hostname>:9000/

This is usually in core-site.xml in Hadoop. Does the client or the server need this key at all?
Did I copy it into the hbase site file by mistake?

- hbase.cluster.distributed => true

For true replication and standalone ZK installations.

- dfs.datanode.socket.write.timeout => 0

This is used in the DataNode but, more importantly here, in the DFSClient. Its default is
apparently fixed at 8 minutes, and no default file (I would have assumed hdfs-default.xml) lists it.

We set it to 0 to avoid the socket timing out during periods of low use etc., because the
DFSClient does not handle reconnects gracefully. I trust that setting it to 0 is what we
recommend for HBase. Is that still valid?

- hbase.regionserver.lease.period => 600000

The default was changed from 60 to 120 seconds. Over time I had issues and have set it to 10 minutes.
Good or bad?
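For comparison, the values above expressed in one unit. This is a small sketch assuming the lease period is given in milliseconds in the site file (consistent with 600000 being 10 minutes); the helper name is hypothetical.

```python
MS_PER_SECOND = 1000

def lease_period_ms(seconds: int) -> int:
    """Convert a lease period given in seconds to milliseconds (hypothetical helper)."""
    return seconds * MS_PER_SECOND

old_default = lease_period_ms(60)    # earlier default, 60000 ms
new_default = lease_period_ms(120)   # current default, 120000 ms
custom = 600000                      # value set in hbase-site.xml

print(custom // MS_PER_SECOND // 60, "minutes")     # 10 minutes
print(custom // new_default, "x the new default")   # 5 x the new default
```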

- hbase.hregion.memstore.block.multiplier => 4

This is up from the default 2. Good or bad?
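As I understand the hbase-default.xml description, updates to a region are blocked once its memstore reaches this multiplier times hbase.hregion.memstore.flush.size. A minimal sketch of that arithmetic, assuming a 64 MB flush size (an assumption; check your own setting):

```python
# Assumption: hbase.hregion.memstore.flush.size is 64 MB (67108864 bytes).
FLUSH_SIZE = 64 * 1024 * 1024

def blocking_threshold(multiplier: int, flush_size: int = FLUSH_SIZE) -> int:
    """Memstore size in bytes at which updates to the region are blocked."""
    return multiplier * flush_size

for m in (2, 4):
    print(m, blocking_threshold(m) // (1024 * 1024), "MB")
# 2 128 MB
# 4 256 MB
```

So raising the multiplier from 2 to 4 lets a memstore grow to roughly 256 MB instead of 128 MB before writes stall, at the cost of more heap in use per busy region.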

- hbase.hregion.max.filesize => 536870912

Again, twice the default. Opinions?
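As I read the hbase-default.xml description, a region is split once any one of its store files grows beyond this size. A simplified sketch of that rule (the store file sizes are hypothetical, and the real split logic has more conditions):

```python
MB = 1024 * 1024
DEFAULT_MAX_FILESIZE = 256 * MB        # 268435456, half of the value set above
CONFIGURED_MAX_FILESIZE = 536870912    # 512 MB

def needs_split(store_file_sizes, max_filesize):
    """Simplified rule: split once any store file exceeds max_filesize."""
    return any(size > max_filesize for size in store_file_sizes)

sizes = [300 * MB, 120 * MB]  # hypothetical store files of one region
print(needs_split(sizes, DEFAULT_MAX_FILESIZE))     # True
print(needs_split(sizes, CONFIGURED_MAX_FILESIZE))  # False
```

Doubling the value roughly halves the number of regions for the same data, which means fewer regions to assign at startup but larger units to move around.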

- hbase.regions.nobalancing.count => 20

This seems to be missing from hbase-default.xml but is set to 4 in the code if not specified.
I got the above from Ryan to improve HBase startup. It means that while an RS is still opening
up to 20 regions, it can start rebalancing regions. This is handled by the ServerManager during
message processing. Opinions?

- hbase.regions.percheckin => 20

This is the number of regions assigned in one go. It is handled in RegionManager and the default
is 10. Here we tell it to assign regions in larger batches to speed up cluster start.
Opinions?
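The effect of the batch size on startup can be roughed out as follows. This is a simplified model, not RegionManager's actual logic: it assumes every region server checks in once per round and receives up to hbase.regions.percheckin regions each time, and the cluster size (1000 regions on 10 servers) is hypothetical.

```python
def checkin_rounds(regions: int, servers: int, per_checkin: int) -> int:
    """Rough number of check-in rounds to hand out all regions,
    assuming each server gets up to per_checkin regions per round."""
    per_round = servers * per_checkin
    return (regions + per_round - 1) // per_round  # integer ceiling

# Hypothetical cluster: 1000 regions on 10 region servers.
print(checkin_rounds(1000, 10, 10))  # 10
print(checkin_rounds(1000, 10, 20))  # 5
```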

- hbase.regionserver.handler.count => 30

Up from 10, as I often had the problem that the UI was not responsive while an import MR job
was running: all handlers were busy doing the inserts. JD mentioned it may be given a higher
default value?
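For reference, the HBase settings listed above would go into hbase-site.xml using the usual Hadoop `<configuration>`/`<property>`/`<name>`/`<value>` layout. A small sketch that renders them; the helper `to_site_xml` is hypothetical and not part of HBase, and `<master-hostname>` is left as the placeholder from the list:

```python
import xml.etree.ElementTree as ET

# The HBase-side settings from the list above.
HBASE_SETTINGS = {
    "fs.default.name": "hdfs://<master-hostname>:9000/",
    "hbase.cluster.distributed": "true",
    "dfs.datanode.socket.write.timeout": "0",
    "hbase.regionserver.lease.period": "600000",
    "hbase.hregion.memstore.block.multiplier": "4",
    "hbase.hregion.max.filesize": "536870912",
    "hbase.regions.nobalancing.count": "20",
    "hbase.regions.percheckin": "20",
    "hbase.regionserver.handler.count": "30",
}

def to_site_xml(settings: dict) -> str:
    """Render name/value pairs as a Hadoop-style site file (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # '<' and '>' get escaped
    return ET.tostring(root, encoding="unicode")

print(to_site_xml(HBASE_SETTINGS))
```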


Hadoop:
----------

- dfs.block.size => 134217728

Up from the default 64 MB. I did this in the past because my data size per "cell" is larger
than the usual few bytes: I can have a few KB up to just above 1 MB per value. Does this still
make sense?
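One way to see the effect is to count HDFS blocks per file. A minimal sketch, assuming a store file that has grown to the 512 MB set via hbase.hregion.max.filesize above; the helper name is hypothetical:

```python
MB = 1024 * 1024

def hdfs_blocks(file_size: int, block_size: int) -> int:
    """Number of HDFS blocks needed for a file (integer ceiling)."""
    return (file_size + block_size - 1) // block_size

store_file = 512 * MB  # assumed store file at the max.filesize set above
print(hdfs_blocks(store_file, 64 * MB))    # 8 blocks with the 64 MB default
print(hdfs_blocks(store_file, 134217728))  # 4 blocks with 128 MB
```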

- dfs.namenode.handler.count => 20

This was raised from the default of 10 quite some time ago (more than a year ago). Is this
still required?

- dfs.datanode.socket.write.timeout => 0

This is the matching entry to the above, I suppose, this time for the DataNode. Still required?

- dfs.datanode.max.xcievers => 4096

The default is 256, which is often way too low. What is a good value to use?
What is the drawback of setting it high?


Thanks,
Lars