Date: Tue, 21 Dec 2010 05:58:12 -0800
Subject: Re: Slow MR data load to table
From: Lars George
To: user@hbase.apache.org

Hi Bradford,

I heard this before recently and one of the things that bit the person
in question in the butt was swapping. Could you check that all
machines are positively healthy and not swapping etc. - just to rule
out the (not so) obvious stuff.

Lars

On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens wrote:
> Aaaand, LZO is not enabled.
>
> On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens wrote:
>> FYI, here is the hbase-site: http://pastebin.com/z9aqy3dQ
>>
>> Also, in hbase-env:
>>
>> export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>
>> Hrm, that seems suboptimal....
>>
>> On Mon, Dec 20, 2010 at 7:55 PM, Bradford Stephens wrote:
>>> Greetings HBase Homies,
>>>
>>> I'm running the .89 dev release (though I had this problem in .20.6 as
>>> well). Trying to load 10 x 8.5 CSV files from HDFS into an empty
>>> HBase table.
>>>
>>> Getting pretty slow loads ... 85,000 records/minute/node. I'd expect
>>> this to be at least 5x faster based on past experience. Cluster has 5
>>> RSs, on AWS, 7 GB RAM x 8 "cores". c1.xlarge. Occasionally I'm getting
>>> "Failed to report status for 601 seconds. Killing!" on maptasks. WAL
>>> is disabled.
>>>
>>> What's odd is, I could have sworn it used to be *much* faster last
>>> week.
>>> I don't remember the code changing. Could it be environmental?
>>> top isn't displaying anything interesting.
>>>
>>> The schema is pretty simple. Each record is maybe 1k:
>>> id_set:id, id_set:mid, id_set:aguid, id_set:sid
>>> metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
>>> event:event
>>> data_set:ts, data_set:data, data_set:geo
>>>
>>> The code is simple (didn't write it):
>>> (Main): http://pastebin.com/vmPgeqNj
>>> (Mapper): http://pastebin.com/T2BQjs0k
>>>
>>> The logs are quite boring:
>>> HMaster: http://pastebin.com/zvyvNc3k
>>> Regionserver: http://pastebin.com/QvJ4J7Ps
>>>
>>> Any ideas?
>>>
>>> --
>>> Bradford Stephens,
>>> Founder, Drawn to Scale
>>> drawntoscalehq.com
>>> 727.697.7528
>>>
>>> http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
>>> solution. Process, store, query, search, and serve all your data.
>>>
>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>> Media, and Computer Science
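
The swapping check suggested at the top of this reply could be done on each node roughly as follows. This is only a sketch, assuming the region servers run Linux and expose the kernel's pswpin/pswpout counters in /proc/vmstat. The function names and the 10 second sampling interval are made up for illustration and are not part of HBase or Hadoop.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) parsed from the text of /proc/vmstat.

    Both counters are cumulative page counts since boot; a missing
    counter is treated as 0.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def is_swapping(before, after):
    """True if either counter grew between two (pswpin, pswpout) samples."""
    return after[0] > before[0] or after[1] > before[1]


def sample(path="/proc/vmstat", interval_s=10):
    """Read the swap counters twice, interval_s seconds apart (assumed interval)."""
    with open(path) as f:
        before = parse_swap_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_swap_counters(f.read())
    return before, after


def report():
    """Print whether this node swapped during one sampling interval."""
    before, after = sample()
    state = "swapping" if is_swapping(before, after) else "not swapping"
    print("%s (pswpin %d -> %d, pswpout %d -> %d)"
          % (state, before[0], after[0], before[1], after[1]))
```

Calling report() on every node while the MR job is running would show whether any machine is paging during the load. The counters are cumulative since boot, so only growth between the two samples matters, not the absolute values.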