From: stack
To: hbase-user@hadoop.apache.org
Subject: Re: Table Upload Optimization
Date: Wed, 21 Oct 2009 08:43:10 -0700

On Wed, Oct 21, 2009 at 8:22 AM, Mark Vigeant wrote:

> Ok, so first in response to St. Ack, nothing fishy appears to be happening
> in the logs: data is being written to all regionservers.
>
> And it's not hovering around 100% done, it just has sent about 118 map
> jobs, or "Task attempts"
>

I saw this in your first posting: 10/21/09 10:22:52 INFO mapred.JobClient: map 100% reduce 0%.

Is your job writing to HBase in the map task or in the reducer? Are you using TableOutputFormat?

> I'm using Hadoop 0.20.1 and HBase 0.20.0
>
> Each node is a virtual machine with 2 CPU, 4 GB host memory and 100 GB
> storage.
>

Are you running DN, TT, HBase, and ZK all on the above? One disk shared by all?

> I don't know what you meant by slots per TT...
>

Children running at any one time on a TaskTracker. You should start with only one, since you have such an anemic platform.

> And the heapsize is the default of 1000 MB. That is probably a huge
> problem, now that I think about it, heh.
>
> And there is absolutely no special configuration that I'm using. I have
> HBase running my ZooKeeper quorum on 2 machines, but that's about it.
>

Have you upped file descriptors and xceivers, all the stuff in 'Getting Started'?

St.Ack

> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Wednesday, October 21, 2009 11:04 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Table Upload Optimization
>
> Well the XMLStreamingInputFormat lets you map XML files which is neat
> but it has a problem and always needs to be patched. I wondered if
> that was missing but in your case it's not the problem.
>
> Did you check the logs of the master and region servers? Also I'd like to
> know
>
> - Version of Hadoop and HBase
> - Nodes' hardware
> - How many map slots per TT
> - HBASE_HEAPSIZE from conf/hbase-env.sh
> - Special configuration you use
>
> Thx,
>
> J-D
>
> On Wed, Oct 21, 2009 at 7:57 AM, Mark Vigeant wrote:
> > No. Should I?
> >
> > -----Original Message-----
> > From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> > Jean-Daniel Cryans
> > Sent: Wednesday, October 21, 2009 10:55 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: Table Upload Optimization
> >
> > Are you using the Hadoop Streaming API?
> >
> > J-D
> >
> > On Wed, Oct 21, 2009 at 7:52 AM, Mark Vigeant wrote:
> >> Hey
> >>
> >> So I want to upload a lot of XML data into an HTable. I have a class
> >> that successfully maps up to about 500 MB of data or so (on one
> >> regionserver) into a table, but if I go for much bigger than that it
> >> takes forever and eventually just stops. I tried uploading a big XML
> >> file into my 4 regionserver cluster (about 7 GB) and it's been a day
> >> and it's still going at it.
> >>
> >> What I get when I run the job on the 4 node cluster is:
> >> 10/21/09 10:22:35 INFO mapred.LocalJobRunner:
> >> 10/21/09 10:22:38 INFO mapred.LocalJobRunner:
> >> (then it does that for a while until...)
> >> 10/21/09 10:22:52 INFO mapred.TaskRunner: Task attempt_local_0001_m_000117_0 is done. And is in the process of committing
> >> 10/21/09 10:22:52 INFO mapred.LocalJobRunner:
> >> 10/21/09 10:22:52 mapred.TaskRunner: Task 'attempt_local_0001_m_000117_0' is done.
> >> 10/21/09 10:22:52 INFO mapred.JobClient: map 100% reduce 0%
> >> 10/21/09 10:22:58 INFO mapred.LocalJobRunner:
> >> 10/21/09 10:22:59 INFO mapred.JobClient: map 99% reduce 0%
> >>
> >>
> >> I'm convinced I'm not configuring hbase or hadoop correctly. Any
> >> suggestions?
> >>
> >> Mark Vigeant
> >> RiskMetrics Group, Inc.
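
For reference, here is a rough sketch of where two of the settings asked about in this thread (map slots per TaskTracker, and xceivers) would go on a Hadoop 0.20 install. The property names are the stock Hadoop ones. The slot count of 1 follows the "start with only one" advice above. The xceiver value is only an illustrative assumption, not a number anyone in this thread gave, so check 'Getting Started' for the figure recommended for your version.

```xml
<!-- conf/mapred-site.xml: children (map slots) running at any one time
     on a TaskTracker. 1 follows the advice in this thread. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<!-- conf/hdfs-site.xml: upper bound on concurrent DataNode transfer threads.
     Hadoop really does spell the property name "xcievers".
     The value below is an assumed, illustrative figure. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```

Each `<property>` element goes inside the `<configuration>` element of the file named in its comment, and the daemons need a restart to pick up the change. The other two items are not XML settings: HBASE_HEAPSIZE (in MB) is exported from conf/hbase-env.sh, and the file descriptor limit is an operating system setting that `ulimit -n` will report for the user running the daemons.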