hbase-user mailing list archives

From Schubert Zhang <zson...@gmail.com>
Subject Re: Settings
Date Thu, 27 Aug 2009 03:46:39 GMT
>
>  HBase:
> ---------
> - fs.default.name => hdfs://<master-hostname>:9000/
>
> This is usually in core-site.xml in Hadoop. Is the client or server needing
> this key at all? Did I copy it in the hbase site file by mistake?
>

[schubert] I think it's better not to copy it into the HBase conf file. I
suggest you modify your hbase-env.sh to add the Hadoop conf path to your
HBASE_CLASSPATH, e.g. export
HBASE_CLASSPATH=${HBASE_HOME}/../hadoop-0.20.0/conf.
Besides that, we should also configure the GC options there.
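
For illustration, the two hbase-env.sh changes suggested above might look
roughly like the sketch below. The HBASE_HOME fallback path is hypothetical,
and the GC flag is only an assumed example (the message does not say which GC
options to use), so treat both as placeholders to adapt.

```shell
# Sketch of hbase-env.sh additions; adjust to your installation.
# Hypothetical fallback so the snippet also works when HBASE_HOME is unset.
: "${HBASE_HOME:=/opt/hbase}"

# Put the Hadoop conf directory on the HBase classpath so that HBase reads
# fs.default.name from the Hadoop config files instead of a copied key.
export HBASE_CLASSPATH="${HBASE_HOME}/../hadoop-0.20.0/conf"

# GC options: assumed example flag (CMS collector), not taken from the message.
export HBASE_OPTS="${HBASE_OPTS:-} -XX:+UseConcMarkSweepGC"
```

With this in place, fs.default.name would not need to be duplicated in the
HBase site file.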


>
> - hbase.cluster.distributed => true
>
> For true replication and stand alone ZK installations.
>

[schubert] You should also export HBASE_MANAGES_ZK=false in hbase-env.sh to
be consistent with a standalone ZK installation.
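
As a minimal sketch, the corresponding line in hbase-env.sh would be the
following. It assumes the ZooKeeper ensemble is started and stopped
separately from the HBase scripts.

```shell
# hbase-env.sh: HBase should not start or stop ZooKeeper itself,
# because a standalone ZK installation is used.
export HBASE_MANAGES_ZK=false
```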


>
>
> - dfs.datanode.socket.write.timeout => 0
>

[schubert] This parameter is for Hadoop (HDFS). It should be in
hadoop-0.20.0/conf/hdfs-site.xml. But I think it is no longer useful.


>
>
> This is used in DataNode but here more importantly in DFSClient. Its
> default is fixed to apparently 8 minutes, no default file (I would have
> assumed hdfs-default.xml) has it listed.
>
> We set it to 0 to avoid the socket timing out on low use etc. because the
> DFSClient reconnect is not handled gracefully. I trust setting it to 0 is
> what we recommend for HBase and is still valid?
>
> - hbase.regionserver.lease.period => 600000
>
> Default was changed from 60 to 120 seconds. Over time I had issues and have
> set it to 10mins. Good or bad?
>

[schubert] I think if you select the right JVM GC options, the default 60000
is OK.


>
>
> - hbase.hregion.memstore.block.multiplier => 4
>
> This is up from the default 2. Good or bad?
>

[schubert] I do not think it is necessary. Can you describe your reason?


>
>
> - hbase.hregion.max.filesize => 536870912
>
> Again twice as much as the default. Opinions?
>

[schubert] If you want a bigger region size, I think it's fine. We
have even tried 1GB in some tests.
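
To make the byte values easier to read, here is a small shell arithmetic
sketch. It only converts the numbers mentioned in this thread; the variable
names are made up for the example.

```shell
# Convert the hbase.hregion.max.filesize values discussed above.
mb=$((1024 * 1024))
configured=536870912                    # value from the quoted settings

echo "configured: $((configured / mb)) MB"
echo "half of it: $((configured / 2 / mb)) MB"   # the default, per the quote
echo "1GB in bytes: $((1024 * mb))"              # value to use for a 1GB test
```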


>
>
> - hbase.regions.nobalancing.count => 20
>
> This seems to be missing from the hbase-default.xml but is set to 4 in the
> code if not specified. The above I got from Ryan to improve startup of
> HBase. It means that while a RS is still opening up to 20 regions it can
> start rebalance regions. Handled by the ServerManager during message
> processing. Opinions?
>

[schubert] I think it makes sense.


>
>
> - hbase.regions.percheckin => 20
>
> This is the count of regions assigned in one go. Handled in RegionManager
> and the default is 10. Here we tell it to assign regions in larger batches
> to speed up the cluster start. Opinions?
>

[schubert] I have no idea about it. I think region assignment will incur
some CPU and memory overhead on the regionserver if there are too many
HLogs to be processed.


>
>
> - hbase.regionserver.handler.count => 30
>
> Up from 10 as I often had the problem that the UI was not responsive while
> an import MR job would run. All handlers were busy doing the inserts. JD
> mentioned it may be set to a higher default value?
>

[schubert] It makes sense. In my small 5-node cluster, I set it to 20.


>
> Hadoop:
> ----------
>
> - dfs.block.size => 134217728
>
> Up from the default 64MB. I have done this in the past as my data size per
> "cell" is larger than the usual few bytes. I can have a few KB up to just
> above 1 MB per value. Still making sense?
>


[schubert] I think your reasoning makes sense.


>
> - dfs.namenode.handler.count => 20
>
> This was upped from the default 10 quite some time ago (more than a year
> ago). So is this still required?
>

[schubert] I also set it to 20.


>
>
> - dfs.datanode.socket.write.timeout => 0
>
> This is the matching entry to the above I suppose. This time for the
> DataNode. Still required?
>

[schubert] I think it is not necessary now.


>
>
> - dfs.datanode.max.xcievers => 4096
>
> Default is 256 and often way too low. What is a good value you would use?
> What is the drawback of setting it high?
>

[schubert] It should make sense. I use 3072 in my small cluster.


>
>

>
> Thanks,
> Lars
>
