Subject: Re: Constant error when putting large data into HBase
From: Lars George <lars.george@gmail.com>
Date: Thu, 1 Dec 2011 17:24:51 +0100
To: user@hbase.apache.org
Message-Id: <98CD2427-F4AC-46F2-8C90-B8E7E0B507EF@gmail.com>

Hi Ed,

There is not much you can do on the HBase side; too much is simply too much. In the past I have lowered the number of slots per MR node so that fewer threads hit HBase. Sorry that I misread the already-hashed keys. In that case, all you can try is bulk loading, which will give you much better performance in a bulk-load scenario. If you have to trickle data in, this will not help. But if you have a job that needs to complete, and part of that job is to insert something into HBase, you could just as well output to HFiles and then bulk load them (which is very fast).

Lars

On Dec 1, 2011, at 2:58 PM, edward choi wrote:

> Thanks Lars,
> I am already familiar with the sequential key problem.
> That is why I am using a hash-generated random string as the document id.
> But I guess I was still pushing the cluster too hard.
>
> Maybe I am inserting tweet documents too fast?
> Since a single tweet is only 140 bytes, puts are performed really fast.
> So I am guessing maybe random keys alone are not cutting it..?
>
> I am counting 20,000 requests per region when I perform MapReduce loading.
> Is that too much to handle?
>
> Is there a way to deliberately slow down the input process?
> I am reading from a 21-node HDFS cluster and writing to a 21-node HBase
> cluster, so the processing speed and the sheer volume of data transferred
> are enormous.
> Can I set a limit on the requests per region? Say, 5,000 requests maximum?
> I really want to know just how far I can push HBase.
> But I guess the developers would say everything depends on the use case.
>
> I thought about using the bulk loading feature, but I kind of got lazy and
> just went with the random string rowid.
> If parameter meddling doesn't pan out, I'll have no choice but to try the
> bulk-loading feature.
>
> Thanks for the reply.
>
> Regards,
> Ed
>
>
> 2011/12/1 Lars George
>
>> Hi Ed,
>>
>> Without having looked at the logs, this sounds like the common case of
>> overloading a single region due to your sequential row keys. Either hash
>> the keys, or salt them - but the best bet here is to use the bulk loading
>> feature of HBase (http://hbase.apache.org/bulk-loads.html). That bypasses
>> this problem and lets you continue to use sequential keys.
>>
>> Lars
>>
>>
>> On Dec 1, 2011, at 12:21 PM, edward choi wrote:
>>
>>> Hi Lars,
>>>
>>> Okay, here are some details.
>>> There are 21 tasktrackers/datanodes/regionservers.
>>> There is one jobtracker/namenode/master.
>>> Three ZooKeepers.
>>>
>>> There are about 200 million tweets in HBase.
>>> My MapReduce code aggregates tweets by their generation date.
>>> So in the map stage, I write out the tweet date as the key and the
>>> document id as the value (the document id is randomly generated by a
>>> hash algorithm).
>>> In the reduce stage, I put the data into a table. The key (the tweet
>>> date) is the table rowid, and the values (the document ids) are the
>>> column values.
>>>
>>> Now, the map stage is fine. I get to 100% map. But during the reduce
>>> stage, one of my regionservers fails.
>>> I don't know what the exact symptom is.
>>> I just get:
>>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>>> Failed 1 action: servers with issues: lp171.etri.re.kr:60020,
>>>
>>> About "some node always dies" <== scratch this.
>>>
>>> To be precise,
>>> I narrowed down the range of data that I wanted to process.
>>> I tried to put only the tweets that were generated on 2011/11/22.
>>> Now the reduce code will produce a row with "20111122" as the rowid, and
>>> a bunch of document ids as the column value. (I use a 32-byte string as
>>> the document id, and I append 1,000 document ids per column.)
>>> So the region my data will be inserted into will have "20111122" between
>>> its start key and end key.
>>> The regionserver that contains that specific region fails. That is the
>>> point. If I move that region to another regionserver using the hbase
>>> shell, then that regionserver fails,
>>> with the same log output.
>>> After 4 failures, the job is force-cancelled and the put operation is
>>> not done.
>>>
>>> Now, even with the failure, the regionserver is still online. It is not
>>> dead (sorry for my use of the word 'die').
>>>
>>> I have pasted the jobtracker log, the tasktracker (the one that failed)
>>> log, and the regionserver (the one that failed) log using Pastebin.
>>> The job started at 2011-12-01 17:14:43 and was killed at 2011-12-01
>>> 17:20:07.
>>>
>>> JobTracker Log
>>>
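[Editor's note] Lars's earlier suggestion to "either hash the keys, or salt them" can be sketched in plain Java. This is an illustrative standalone example, not code from the thread: the bucket count (21, matching the cluster size mentioned above), the `NN-` prefix format, and the choice of MD5 are all assumptions, and a real job would need the HBase client API on top of this.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKey {
    // Hypothetical bucket count: one bucket per region server in the thread's
    // 21-node cluster, so writes spread across regions instead of hammering one.
    static final int BUCKETS = 21;

    // Prefix a sequential key (e.g. a date like "20111122") with a stable,
    // hash-derived bucket number. The mapping is deterministic, so a reader
    // can reconstruct all possible row keys for a given logical key.
    static String salt(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            int bucket = (digest[0] & 0xFF) % BUCKETS;
            return String.format("%02d-%s", bucket, key);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on all standard JREs, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same logical key always lands in the same bucket.
        System.out.println(salt("20111122"));
        System.out.println(salt("20111123"));
    }
}
```

Note the trade-off Lars alludes to: salting spreads the write load, but a scan over one logical key now requires up to BUCKETS lookups, which is why he recommends bulk loading when sequential keys must be kept.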