From: Alex Vollmer <alex@evri.com>
To: HBase User List <hbase-user@hadoop.apache.org>
Subject: Need some help diagnosing poor performance (long)
Date: Fri, 12 Sep 2008 15:34:05 -0700

We've been evaluating HBase by running various tasks on EC2 using
x-large instances. We've been surprised at how slow some M/R jobs seem
to run. I chatted with a few folks yesterday on the IRC channel, but
wanted to post something here to include all of the details.

Our test case is:

- one table with ~5 million rows
- each row has two column families with LongWritable values
- the entire contents of the table reside on a single region server
- only one mapper ever runs, regardless of configuration. Note that
  this is *not* running on the same box as the aforementioned region
  server (!!)
- the M/R job counts the number of cells each row has
- the total runtime for the job is anywhere from 25-50 minutes, nearly
  all of it spent in the map phase (<1 minute for reduce)

We were expecting to be able to crank through this data faster than
this (<10 min). Is that a reasonable expectation? The EC2 x-large
instances have 4 cores & 16GB RAM.

Our casual observation of the running nodes is that the machines
running the mapper and the relevant region server take on some load
but are ~98% idle. I/O wait time (checked via iostat) is negligible.
Network traffic holds steady at about 3.8Mb/s while the M/R job is
running.
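One guess about the single mapper: my understanding is that
TableInputFormat produces one input split per table region, so a table
served entirely by one region could only ever drive one map task no
matter what we request. A quick check along these lines should confirm
it -- this is just a sketch; the HTable constructor and getStartKeys()
call are my reading of the 0.2 client API, so treat the exact
signatures as assumptions:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class RegionCount {
        public static void main(String[] args) throws Exception {
            // Connect with the same cluster config the M/R job uses.
            HBaseConfiguration conf = new HBaseConfiguration();
            HTable table = new HTable(conf, args[0]);  // e.g. "unknown_person"
            // getStartKeys() should return one entry per region, which
            // is also the number of splits TableInputFormat can hand out.
            System.out.println("regions: " + table.getStartKeys().length);
        }
    }

If that prints 1 for our table, it would at least explain why
setNumMapTasks has no effect.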
Below are (what I hope to be) the relevant config and source code
files. Any help or insight would be greatly appreciated.

Job Config
----------
name  value
hbase.mapred.tablecolumns  index_count: hit_data:
dfs.client.buffer.dir  ${hadoop.tmp.dir}/dfs/tmp
mapred.task.cache.levels  2
hadoop.tmp.dir  /mnt/hadoop
hadoop.native.lib  true
map.sort.class  org.apache.hadoop.mapred.MergeSorter
ipc.client.idlethreshold  4000
mapred.system.dir  ${hadoop.tmp.dir}/mapred/system
hbase.regionserver.nbreservationblocks  4
mapred.job.tracker.persist.jobstatus.hours  0
dfs.namenode.logging.level  info
dfs.datanode.address  0.0.0.0:50010
io.skip.checksum.errors  false
fs.default.name  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001
mapred.child.tmp  ./tmp
dfs.safemode.threshold.pct  0.999f
dfs.namenode.handler.count  10
hbase.client.retries.number  10
mapred.tasktracker.dns.nameserver  default
io.sort.factor  10
mapred.task.timeout  600000
mapred.max.tracker.failures  4
hadoop.rpc.socket.factory.class.default  org.apache.hadoop.net.StandardSocketFactory
mapred.mapoutput.key.class  org.apache.hadoop.io.Text
fs.hdfs.impl  org.apache.hadoop.dfs.DistributedFileSystem
ipc.client.maxidletime  120000
dfs.safemode.extension  30000
tasktracker.http.threads  80
user.name  root
mapred.output.compress  false
io.bytes.per.checksum  512
topology.node.switch.mapping.impl  org.apache.hadoop.net.ScriptBasedMapping
hbase.master.meta.thread.rescanfrequency  60000
hbase.master.info.bindAddress  0.0.0.0
mapred.reduce.max.attempts  4
hbase.regionserver.handler.count  10
fs.ramfs.impl  org.apache.hadoop.fs.InMemoryFileSystem
hbase.hregion.memcache.flush.size  67108864
hbase.regionserver.class  org.apache.hadoop.hbase.ipc.HRegionInterface
hadoop.job.ugi  root,root,bin,daemon,sys,adm,disk,wheel
mapred.job.tracker.persist.jobstatus.dir  /jobtracker/jobsInfo
mapred.jar  /mnt/hadoop/mapred/local/jobTracker/job_200809121135_0009.jar
fs.s3.buffer.dir  ${hadoop.tmp.dir}/s3
dfs.block.size  67108864
job.end.retry.attempts  0
fs.file.impl  org.apache.hadoop.fs.LocalFileSystem
mapred.local.dir.minspacestart  0
mapred.output.compression.type  BLOCK
dfs.permissions  true
hbase.regionserver.globalMemcacheLimitlowMark  256435456
topology.script.number.args  20
hbase.io.index.interval  32
mapred.task.profile.maps  0-2
dfs.datanode.https.address  0.0.0.0:50475
mapred.userlog.retain.hours  24
dfs.secondary.http.address  0.0.0.0:50090
dfs.replication.max  512
mapred.job.tracker.persist.jobstatus.active  false
hbase.regionserver.maxlogentries  30000
local.cache.size  10737418240
mapred.min.split.size  0
mapred.map.tasks  1
mapred.child.java.opts  -Xmx200m
dfs.https.address  0.0.0.0:50470
dfs.balance.bandwidthPerSec  1048576
ipc.server.listen.queue.size  128
hbase.master  10.254.42.47:60000
mapred.inmem.merge.threshold  1000
job.end.retry.interval  30000
hbase.master.info.port  60010
fs.checkpoint.dir  ${hadoop.tmp.dir}/dfs/namesecondary
hbase.rootdir  file:///tmp/hbase-${user.name}/hbase
hbase.regionserver.info.bindAddress  0.0.0.0
mapred.reduce.tasks  1
hbase.client.pause  10000
mapred.userlog.limit.kb  0
webinterface.private.actions  false
dfs.max.objects  0
io.sort.spill.percent  0.80
mapred.job.split.file  /mnt/hadoop/mapred/system/job_200809121135_0009/job.split
mapred.job.name  CellCount unknown_person
mapred.map.tasks.speculative.execution  true
dfs.datanode.dns.nameserver  default
dfs.blockreport.intervalMsec  3600000
mapred.map.max.attempts  4
hbase.hstore.blockCache.blockSize  65536
mapred.job.tracker.handler.count  10
dfs.client.block.write.retries  3
hbase.regionserver  0.0.0.0:60020
mapred.input.format.class  org.apache.hadoop.hbase.mapred.TableInputFormat
mapred.tasktracker.expiry.interval  600000
ipc.client.timeout  60000
keep.failed.task.files  false
https.keystore.info.rsrc  sslinfo.xml
hbase.hstore.compactionThreshold  3
ipc.client.tcpnodelay  false
mapred.task.profile.reduces  0-2
mapred.output.compression.codec  org.apache.hadoop.io.compress.DefaultCodec
io.map.index.skip  0
mapred.working.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/user/root
ipc.server.tcpnodelay  false
mapred.reducer.class  com.evri.zeitgeist.hadoop.mapred.CellCount$CellCountReduce
hbaseshell.jline.bell.enabled  true
dfs.default.chunk.view.size  32768
hadoop.logfile.size  10000000
dfs.datanode.du.pct  0.98f
mapred.reduce.tasks.speculative.execution  true
fs.inmemory.size.mb  75
dfs.datanode.du.reserved  0
fs.checkpoint.period  3600
dfs.web.ugi  webuser,webgroup
mapred.jobtracker.completeuserjobs.maximum  100
dfs.df.interval  60000
hbase.regionserver.info.port  60030
dfs.data.dir  ${hadoop.tmp.dir}/dfs/data
fs.s3.maxRetries  4
dfs.datanode.dns.interface  default
mapred.mapoutput.value.class  org.apache.hadoop.io.LongWritable
hbase.regionserver.lease.period  30000
mapred.local.dir  ${hadoop.tmp.dir}/mapred/local
fs.hftp.impl  org.apache.hadoop.dfs.HftpFileSystem
dfs.permissions.supergroup  supergroup
mapred.mapper.class  com.evri.zeitgeist.hadoop.mapred.CellCount$CellCountMap
fs.trash.interval  0
fs.s3.sleepTimeSeconds  10
dfs.replication.min  1
mapred.submit.replication  10
mapred.map.output.compression.codec  org.apache.hadoop.io.compress.DefaultCodec
mapred.tasktracker.dns.interface  default
dfs.namenode.decommission.interval  300
dfs.http.address  0.0.0.0:50070
hbase.master.lease.period  120000
mapred.job.tracker  domU-12-31-39-00-25-C1.compute-1.internal:50002
dfs.heartbeat.interval  3
io.seqfile.sorter.recordlimit  1000000
hbase.regionserver.thread.splitcompactcheckfrequency  20000
dfs.name.dir  ${hadoop.tmp.dir}/dfs/name
dfs.datanode.http.address  0.0.0.0:50075
mapred.local.dir.minspacekill  0
dfs.replication.interval  3
io.sort.record.percent  0.05
fs.kfs.impl  org.apache.hadoop.fs.kfs.KosmosFileSystem
mapred.map.output.compression.type  RECORD
mapred.temp.dir  ${hadoop.tmp.dir}/mapred/temp
mapred.tasktracker.reduce.tasks.maximum  8
dfs.replication  3
hadoop.logfile.count  10
io.seqfile.compress.blocksize  1000000
hbase.regionserver.optionalcacheflushinterval  1800000
fs.s3.block.size  67108864
hbase.regionserver.globalMemcacheLimit  536870912
fs.hsftp.impl  org.apache.hadoop.dfs.HsftpFileSystem
mapred.task.tracker.http.address  0.0.0.0:50060
hbase.hbasemaster.maxregionopen  120000
mapred.reduce.parallel.copies  5
io.seqfile.lazydecompress  true
mapred.output.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/usr/cell-count
io.sort.mb  100
ipc.client.connection.maxidletime  1000
hbase.hregion.max.filesize  268435456
mapred.compress.map.output  false
mapred.task.tracker.report.address  127.0.0.1:0
ipc.client.kill.max  10
ipc.client.connect.max.retries  10
hbase.regionserver.msginterval  3000
fs.s3.impl  org.apache.hadoop.fs.s3.S3FileSystem
mapred.input.dir  hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/user/root/unknown_person
mapred.job.tracker.http.address  0.0.0.0:50030
io.file.buffer.size  4096
io.serializations  org.apache.hadoop.io.serializer.WritableSerialization
hbase.hregion.memcache.block.multiplier  2
hbase.io.seqfile.compression.type  NONE
mapred.reduce.copy.backoff  300
mapred.task.profile  false
hbase.server.thread.wakefrequency  10000
dfs.replication.considerLoad  true
jobclient.output.filter  FAILED
mapred.tasktracker.map.tasks.maximum  8
io.compression.codecs  org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
fs.checkpoint.size  67108864
hbase.regionserver.dns.interface  default
hbase.regionserver.optionallogrollinterval  1800000
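One thing that jumps out of the dump above: mapred.map.tasks is 1 even
though we've tried larger values via setNumMapTasks. As I understand
it, that value is only a hint in the old mapred API -- the real map
count comes from the InputFormat's getSplits(). For reference, we
launch the job with something along these lines (the jar name is
approximate; the table, output dir, and mapper/reducer counts match
the dumped run above):

    hadoop jar cellcount.jar com.evri.zeitgeist.hadoop.mapred.CellCount unknown_person /usr 1 1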
The Code
--------
package com.evri.zeitgeist.hadoop.mapred;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.evri.zeitgeist.hadoop.util.Utils;

public class CellCount extends Configured implements Tool {

    public static class CellCountMap extends TableMap<Text, LongWritable> {
        @Override
        public void map(ImmutableBytesWritable key, RowResult row,
                OutputCollector<Text, LongWritable> collector,
                Reporter reporter) throws IOException {
            // Emit one record per row: the row key and its cell count.
            collector.collect(new Text(row.getRow()),
                new LongWritable(row.entrySet().size()));
        }
    }

    public static class CellCountReduce extends MapReduceBase
            implements Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        public void reduce(Text key, Iterator<LongWritable> results,
                OutputCollector<Text, LongWritable> collector,
                Reporter reporter) throws IOException {
            // Sum the per-row counts (each key gets one value here, so
            // this is effectively a pass-through).
            long total = 0;
            while (results.hasNext()) {
                total += results.next().get();
            }
            collector.collect(key, new LongWritable(total));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args == null || args.length != 4) {
            System.err.println("ARGS: [table] [output dir] [mappers] [reducers]");
            System.exit(1);
        }
        JobConf conf = new JobConf(getConf(), IndexCountWeekReport.class);
        conf.setJobName("CellCount " + args[0]);
        // Requested task counts; the map count is only a hint, since the
        // InputFormat ultimately decides how many splits exist.
        conf.setNumMapTasks(Integer.parseInt(args[2]));
        conf.setNumReduceTasks(Integer.parseInt(args[3]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1] + "/cell-count"));
        TableMap.initJob(args[0], "index_count: hit_data:",
            CellCountMap.class, Text.class, LongWritable.class, conf);
        conf.setReducerClass(CellCountReduce.class);
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration();
        conf.set("hbase.master", Utils.getMyIp() + ":60000");
        int result = ToolRunner.run(conf, new CellCount(), args);
        System.exit(result);
    }
}

hadoop-site.xml
---------------
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/hadoop</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>domU-12-31-39-00-25-C1.compute-1.internal:50002</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>dfs.client.block.write.retries</name>
    <value>3</value>
  </property>
</configuration>

hbase-site.xml
--------------
<configuration>
  <property>
    <name>hbase.master</name>
    <value>domU-12-31-39-00-25-C1.compute-1.internal:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://domU-12-31-39-00-25-C1.compute-1.internal:50001/hbase</value>
  </property>
</configuration>

--
Alex Vollmer
Evri -- Helping users make sense of the world's information
Senior Software Engineer
alex@evri.com
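P.S. To separate raw scanner throughput from M/R overhead, I'm
planning to time a plain scan of the same table outside MapReduce with
something like the sketch below. The getScanner() signature, the
Scanner iteration over RowResult, and HConstants.EMPTY_START_ROW are
my reading of the 0.2 client API, so treat them as assumptions:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scanner;
    import org.apache.hadoop.hbase.io.RowResult;

    public class ScanTimer {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            // Same table and column families the M/R job reads.
            HTable table = new HTable(conf, "unknown_person");
            byte[][] columns = {
                "index_count:".getBytes(), "hit_data:".getBytes() };
            long start = System.currentTimeMillis();
            long rows = 0;
            // Full-table scan from the first row; each scanner next()
            // is a round trip to the region server, which should
            // roughly mirror what the single mapper does.
            Scanner scanner =
                table.getScanner(columns, HConstants.EMPTY_START_ROW);
            try {
                for (RowResult row : scanner) {
                    rows++;
                }
            } finally {
                scanner.close();
            }
            System.out.println(rows + " rows in "
                + (System.currentTimeMillis() - start) + " ms");
        }
    }

If that alone takes tens of minutes, the bottleneck is the scan itself
rather than anything MapReduce adds.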