From: Alex Vollmer <alex@evri.com>
To: HBase User List <hbase-user@hadoop.apache.org>
Subject: Need some help diagnosing poor performance (long)
Date: Fri, 12 Sep 2008 15:34:05 -0700

We've been evaluating HBase by running various tasks on EC2 using
x-large instances. We've been surprised at how slow some M/R jobs seem
to run. I chatted with a few folks yesterday on the IRC channel, but
wanted to post something here to include all of the details.

Our test case is:

- one table with ~5 million rows
- each row has two column families with LongWritable values
- the entire contents of the table reside on a single region server
- only one mapper ever runs, regardless of configuration. Note that
  this is *not* running on the same box as the aforementioned region
  server (!!)
- the M/R job counts the number of cells each row has
- the total runtime for the job is anywhere from 25-50 minutes, nearly
  all of it spent in the map phase (<1 minute for reduce)

We were expecting to be able to crank through this data faster than
this (<10 min). Is that a reasonable expectation? The EC2 x-large
instances have 4 cores & 16GB RAM.

Our casual observation of the running nodes is that the machines
running the mapper and the relevant region server take on some load
but are ~98% idle. I/O wait time (checked via iostat) is negligible.
Network traffic holds steady at about 3.8Mb/s while the M/R job is
running.
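One guess about the single mapper: my understanding is that
TableInputFormat produces one input split per table region, so a table
served entirely by one region could only ever drive one map task no
matter what we request. A quick check along these lines should confirm
it -- this is just a sketch; the HTable constructor and getStartKeys()
call are my reading of the 0.2 client API, so treat the exact
signatures as assumptions:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionCount {
        public static void main(String[] args) throws Exception {
            // Connect with the same cluster config the M/R job uses.
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, args[0]);  // e.g. "unknown_person"
            // getStartKeys() should return one entry per region, which
            // is also the number of splits TableInputFormat can hand out.
            System.out.println("regions: " + table.getStartKeys().length);
        }
    }

If that prints 1 for our table, it would at least explain why
setNumMapTasks has no effect.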
Below are (what I hope to be) the relevant config and source code
files. Any help or insight would be greatly appreciated.

Job Config
----------
name  value
hbase.mapred.tablecolumns  index_count: hit_data:
dfs.client.buffer.dir  ${hadoop.tmp.dir}/dfs/tmp
mapred.task.cache.levels  2
hadoop.tmp.dir  /mnt/hadoop
hadoop.native.lib  true
map.sort.class  org.apache.hadoop.mapred.MergeSorter
ipc.client.idlethreshold  4000
mapred.system.dir  ${hadoop.tmp.dir}/mapred/system
hbase.regionserver.nbreservationblocks  4
mapred.job.tracker.persist.jobstatus.hours  0
dfs.namenode.logging.level  info
dfs.datanode.address  0.0.0.0:50010
io.skip.checksum.errors  false
fs.default.name  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001
mapred.child.tmp  ./tmp
dfs.safemode.threshold.pct  0.999f
dfs.namenode.handler.count  10
hbase.client.retries.number  10
mapred.tasktracker.dns.nameserver  default
io.sort.factor  10
mapred.task.timeout  600000
mapred.max.tracker.failures  4
hadoop.rpc.socket.factory.class.default  org.apache.hadoop.net.StandardSocketFactory
mapred.mapoutput.key.class  org.apache.hadoop.io.Text
fs.hdfs.impl  org.apache.hadoop.dfs.DistributedFileSystem
ipc.client.maxidletime  120000
dfs.safemode.extension  30000
tasktracker.http.threads  80
user.name  root
mapred.output.compress  false
io.bytes.per.checksum  512
topology.node.switch.mapping.impl  org.apache.hadoop.net.ScriptBasedMapping
hbase.master.meta.thread.rescanfrequency  60000
hbase.master.info.bindAddress  0.0.0.0
mapred.reduce.max.attempts  4
hbase.regionserver.handler.count  10
fs.ramfs.impl  org.apache.hadoop.fs.InMemoryFileSystem
hbase.hregion.memcache.flush.size  67108864
hbase.regionserver.class  org.apache.hadoop.hbase.ipc.HRegionInterface
hadoop.job.ugi  root,root,bin,daemon,sys,adm,disk,wheel
mapred.job.tracker.persist.jobstatus.dir  /jobtracker/jobsInfo
mapred.jar  /mnt/hadoop/mapred/local/jobTracker/job_200809121135_0009.jar
fs.s3.buffer.dir  ${hadoop.tmp.dir}/s3
dfs.block.size  67108864
job.end.retry.attempts  0
fs.file.impl  org.apache.hadoop.fs.LocalFileSystem
mapred.local.dir.minspacestart  0
mapred.output.compression.type  BLOCK
dfs.permissions  true
hbase.regionserver.globalMemcacheLimitlowMark  256435456
topology.script.number.args  20
hbase.io.index.interval  32
mapred.task.profile.maps  0-2
dfs.datanode.https.address  0.0.0.0:50475
mapred.userlog.retain.hours  24
dfs.secondary.http.address  0.0.0.0:50090
dfs.replication.max  512
mapred.job.tracker.persist.jobstatus.active  false
hbase.regionserver.maxlogentries  30000
local.cache.size  10737418240
mapred.min.split.size  0
mapred.map.tasks  1
mapred.child.java.opts  -Xmx200m
dfs.https.address  0.0.0.0:50470
dfs.balance.bandwidthPerSec  1048576
ipc.server.listen.queue.size  128
hbase.master  10.254.42.47:60000
mapred.inmem.merge.threshold  1000
job.end.retry.interval  30000
hbase.master.info.port  60010
fs.checkpoint.dir  ${hadoop.tmp.dir}/dfs/namesecondary
hbase.rootdir  file:///tmp/hbase-${user.name}/hbase
hbase.regionserver.info.bindAddress  0.0.0.0
mapred.reduce.tasks  1
hbase.client.pause  10000
mapred.userlog.limit.kb  0
webinterface.private.actions  false
dfs.max.objects  0
io.sort.spill.percent  0.80
mapred.job.split.file  /mnt/hadoop/mapred/system/job_200809121135_0009/job.split
mapred.job.name  CellCount unknown_person
mapred.map.tasks.speculative.execution  true
dfs.datanode.dns.nameserver  default
dfs.blockreport.intervalMsec  3600000
mapred.map.max.attempts  4
hbase.hstore.blockCache.blockSize  65536
mapred.job.tracker.handler.count  10
dfs.client.block.write.retries  3
hbase.regionserver  0.0.0.0:60020
mapred.input.format.class  org.apache.hadoop.hbase.mapred.TableInputFormat
mapred.tasktracker.expiry.interval  600000
ipc.client.timeout  60000
keep.failed.task.files  false
https.keystore.info.rsrc  sslinfo.xml
hbase.hstore.compactionThreshold  3
ipc.client.tcpnodelay  false
mapred.task.profile.reduces  0-2
mapred.output.compression.codec  org.apache.hadoop.io.compress.DefaultCodec
io.map.index.skip  0
mapred.working.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/user/root
ipc.server.tcpnodelay  false
mapred.reducer.class  com.evri.zeitgeist.hadoop.mapred.CellCount$CellCountReduce
hbaseshell.jline.bell.enabled  true
dfs.default.chunk.view.size  32768
hadoop.logfile.size  10000000
dfs.datanode.du.pct  0.98f
mapred.reduce.tasks.speculative.execution  true
fs.inmemory.size.mb  75
dfs.datanode.du.reserved  0
fs.checkpoint.period  3600
dfs.web.ugi  webuser,webgroup
mapred.jobtracker.completeuserjobs.maximum  100
dfs.df.interval  60000
hbase.regionserver.info.port  60030
dfs.data.dir  ${hadoop.tmp.dir}/dfs/data
fs.s3.maxRetries  4
dfs.datanode.dns.interface  default
mapred.mapoutput.value.class  org.apache.hadoop.io.LongWritable
hbase.regionserver.lease.period  30000
mapred.local.dir  ${hadoop.tmp.dir}/mapred/local
fs.hftp.impl  org.apache.hadoop.dfs.HftpFileSystem
dfs.permissions.supergroup  supergroup
mapred.mapper.class  com.evri.zeitgeist.hadoop.mapred.CellCount$CellCountMap
fs.trash.interval  0
fs.s3.sleepTimeSeconds  10
dfs.replication.min  1
mapred.submit.replication  10
mapred.map.output.compression.codec  org.apache.hadoop.io.compress.DefaultCodec
mapred.tasktracker.dns.interface  default
dfs.namenode.decommission.interval  300
dfs.http.address  0.0.0.0:50070
hbase.master.lease.period  120000
mapred.job.tracker  domU-12-31-39-00-25-C1.compute-1.internal:50002
dfs.heartbeat.interval  3
io.seqfile.sorter.recordlimit  1000000
hbase.regionserver.thread.splitcompactcheckfrequency  20000
dfs.name.dir  ${hadoop.tmp.dir}/dfs/name
dfs.datanode.http.address  0.0.0.0:50075
mapred.local.dir.minspacekill  0
dfs.replication.interval  3
io.sort.record.percent  0.05
fs.kfs.impl  org.apache.hadoop.fs.kfs.KosmosFileSystem
mapred.map.output.compression.type  RECORD
mapred.temp.dir  ${hadoop.tmp.dir}/mapred/temp
mapred.tasktracker.reduce.tasks.maximum  8
dfs.replication  3
hadoop.logfile.count  10
io.seqfile.compress.blocksize  1000000
hbase.regionserver.optionalcacheflushinterval  1800000
fs.s3.block.size  67108864
hbase.regionserver.globalMemcacheLimit  536870912
fs.hsftp.impl  org.apache.hadoop.dfs.HsftpFileSystem
mapred.task.tracker.http.address  0.0.0.0:50060
hbase.hbasemaster.maxregionopen  120000
mapred.reduce.parallel.copies  5
io.seqfile.lazydecompress  true
mapred.output.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/usr/cell-count
io.sort.mb  100
ipc.client.connection.maxidletime  1000
hbase.hregion.max.filesize  268435456
mapred.compress.map.output  false
mapred.task.tracker.report.address  127.0.0.1:0
ipc.client.kill.max  10
ipc.client.connect.max.retries  10
hbase.regionserver.msginterval  3000
fs.s3.impl  org.apache.hadoop.fs.s3.S3FileSystem
mapred.input.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/user/root/unknown_person
mapred.job.tracker.http.address  0.0.0.0:50030
io.file.buffer.size  4096
io.serializations  org.apache.hadoop.io.serializer.WritableSerialization
hbase.hregion.memcache.block.multiplier  2
hbase.io.seqfile.compression.type  NONE
mapred.reduce.copy.backoff  300
mapred.task.profile  false
hbase.server.thread.wakefrequency  10000
dfs.replication.considerLoad  true
jobclient.output.filter  FAILED
mapred.tasktracker.map.tasks.maximum  8
io.compression.codecs  org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
fs.checkpoint.size  67108864
hbase.regionserver.dns.interface  default
hbase.regionserver.optionallogrollinterval  1800000
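One thing that jumps out of the dump above: mapred.map.tasks is 1 even
though we've tried larger values via setNumMapTasks. As I understand
it, that value is only a hint in the old mapred API -- the real map
count comes from the InputFormat's getSplits(). For reference, we
launch the job with something along these lines (the jar name is
approximate; the table, output dir, and mapper/reducer counts match
the dumped run above):

    hadoop jar cellcount.jar com.evri.zeitgeist.hadoop.mapred.CellCount unknown_person /usr 1 1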
The Code
--------
package com.evri.zeitgeist.hadoop.mapred;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.evri.zeitgeist.hadoop.util.Utils;

public class CellCount extends Configured implements Tool {

    public static class CellCountMap extends TableMap<Text, LongWritable> {
        @Override
        public void map(ImmutableBytesWritable key, RowResult row,
                OutputCollector<Text, LongWritable> collector,
                Reporter reporter) throws IOException {
            // Emit one record per row: the row key and its cell count.
            collector.collect(new Text(row.getRow()),
                new LongWritable(row.entrySet().size()));
        }
    }

    public static class CellCountReduce extends MapReduceBase
            implements Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterator<LongWritable> results,
                OutputCollector<Text, LongWritable> collector,
                Reporter reporter) throws IOException {
            // Sum the per-row counts (each key gets one value here, so
            // this is effectively a pass-through).
            long total = 0;
            while (results.hasNext()) {
                total += results.next().get();
            }
            collector.collect(key, new LongWritable(total));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args == null || args.length != 4) {
            System.err.println("ARGS: [table] [output dir] [mappers] [reducers]");
            System.exit(1);
        }
        JobConf conf = new JobConf(getConf(), IndexCountWeekReport.class);
        conf.setJobName("CellCount " + args[0]);
        // Requested task counts; the map count is only a hint, since the
        // InputFormat ultimately decides how many splits exist.
        conf.setNumMapTasks(Integer.parseInt(args[2]));
        conf.setNumReduceTasks(Integer.parseInt(args[3]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1] + "/cell-count"));
        TableMap.initJob(args[0], "index_count: hit_data:",
            CellCountMap.class, Text.class, LongWritable.class, conf);
        conf.setReducerClass(CellCountReduce.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.set("hbase.master", Utils.getMyIp() + ":60000");
        int result = ToolRunner.run(conf, new CellCount(), args);
        System.exit(result);
    }
}

hadoop-site.xml
---------------
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>domU-12-31-39-00-25-C1.compute-1.internal:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
</configuration>

hbase-site.xml
--------------
<configuration>
  <property>
    <name>hbase.master</name>
    <value>domU-12-31-39-00-25-C1.compute-1.internal:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/hbase</value>
  </property>
</configuration>

--
Alex Vollmer
Evri -- Helping users make sense of the world's information
Senior Software Engineer
alex@evri.com
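P.S. To separate raw scanner throughput from M/R overhead, I'm
planning to time a plain scan of the same table outside MapReduce with
something like the sketch below. The getScanner() signature, the
Scanner iteration over RowResult, and HConstants.EMPTY_START_ROW are
my reading of the 0.2 client API, so treat them as assumptions:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scanner;
    import org.apache.hadoop.hbase.io.RowResult;

    public class ScanTimer {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            // Same table and column families the M/R job reads.
            HTable table = new HTable(conf, "unknown_person");
            byte[][] columns = {
                "index_count:".getBytes(), "hit_data:".getBytes() };
            long start = System.currentTimeMillis();
            long rows = 0;
            // Full-table scan from the first row; each scanner next()
            // is a round trip to the region server, which should
            // roughly mirror what the single mapper does.
            Scanner scanner =
                table.getScanner(columns, HConstants.EMPTY_START_ROW);
            try {
                for (RowResult row : scanner) {
                    rows++;
                }
            } finally {
                scanner.close();
            }
            System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
        }
    }

If that alone takes tens of minutes, the bottleneck is the scan itself
rather than anything MapReduce adds.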