Subject: Re: Add Columnsize Filter for Scan Operation
From: Ted Yu <yuzhihong@gmail.com>
To: user@hbase.apache.org, Dhaval Shah
Date: Thu, 24 Oct 2013 10:06:02 -0700

For streaming responses, there is this JIRA:

HBASE-8691 High-Throughput Streaming Scan API

On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah wrote:

> Jean, if we don't add setBatch to the scan, the MR job does cause HBase to
> crash due to an OOME. We have run into this in the past as well. Basically
> the problem is: say I have a region server with 12 GB of RAM and a row of
> size 20 GB (an extreme example; in practice, HBase runs out of memory well
> before 20 GB). If I query the entire row, HBase does not have enough memory
> to hold/process it for the response.
>
> In practice, if your setCaching > 1, then the aggregate of all rows
> growing too big can also cause the same issue.
>
> I think one way we can solve this issue is to make the HBase server serve
> responses in a streaming fashion somehow (I'm not exactly sure about the
> details of how this can work, but if it has to hold the entire row in
> memory, it's going to be bound by the HBase heap size).
>
> Regards,
> Dhaval
>
>
> ________________________________
> From: Jean-Marc Spaggiari
> To: user
> Sent: Thursday, 24 October 2013 12:37 PM
> Subject: Re: Add Columnsize Filter for Scan Operation
>
>
> If the MR job crashes because of the number of columns, then we have an
> issue that we need to fix ;) Please open a JIRA and provide details if you
> are facing that.
>
> Thanks,
>
> JM
>
>
> 2013/10/24 John
>
> > @Jean-Marc: Sure, I can do that, but that's a little complicated because
> > the rows sometimes have millions of columns and I have to handle them in
> > different batches, because otherwise HBase crashes. Maybe I will try it
> > later, but first I want to try the API version. It works okay so far,
> > but I want to improve it a little.
> >
> > @Ted: I tried to modify it, but I have no idea how exactly to do this.
> > I have to count the number of columns in that filter (that works,
> > obviously, with the count field). But there is no method that is called
> > after iterating over all elements, so I cannot return the Drop
> > ReturnCode in the filterKeyValue method, because I don't know when the
> > last one was reached. Any ideas?
> >
> > regards
> >
> >
> > 2013/10/24 Ted Yu
> >
> > > Please take a look at
> > > src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java:
> > >
> > > * Simple filter that returns first N columns on row only.
> > >
> > > You can modify the filter to suit your needs.
> > >
> > > Cheers
> > >
> > >
> > > On Thu, Oct 24, 2013 at 7:52 AM, John wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently writing an HBase Java program which iterates over
> > > > every row in a table.
> > > > I have to modify some rows if the column size (the number of
> > > > columns in that row) is bigger than 25,000.
> > > >
> > > > Here is my source code: http://pastebin.com/njqG6ry6
> > > >
> > > > Is there any way to add a filter to the scan operation and load
> > > > only rows where the size is bigger than 25k?
> > > >
> > > > Currently I check the size at the client, but for that I have to
> > > > load every row to the client side. It would be better if the wrong
> > > > rows were already filtered out on the "server" side.
> > > >
> > > > thanks
> > > >
> > > > John
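John's question above — whether any filter method runs after the last cell of a row — maps in the HBase filter API to `filterRow()`, which is invoked once per row after `filterKeyValue` has seen every cell (provided `hasFilterRow()` returns true), letting a filter veto the whole row. A minimal standalone sketch of that count-then-decide pattern, in plain Java with no HBase dependency (the class and method names here are illustrative, not HBase's):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a "wide row" filter: count cells as they stream past
 * (analogous to Filter.filterKeyValue), then decide once per row
 * whether to keep it (analogous to Filter.filterRow).
 */
public class WideRowFilterSketch {
    private final int threshold;
    private int count;

    public WideRowFilterSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Called for every cell of the current row. */
    public void filterCell() {
        count++;
    }

    /** Called after the last cell of the row: drop rows at or under the threshold. */
    public boolean filterRow() {
        return count <= threshold;   // true = exclude this row
    }

    /** Called between rows, mirroring Filter.reset(). */
    public void reset() {
        count = 0;
    }

    /** Feed rows (given as column counts) and return the indices that survive. */
    public static List<Integer> survivingRows(int[] columnsPerRow, int threshold) {
        WideRowFilterSketch f = new WideRowFilterSketch(threshold);
        List<Integer> kept = new ArrayList<>();
        for (int r = 0; r < columnsPerRow.length; r++) {
            f.reset();
            for (int c = 0; c < columnsPerRow[r]; c++) {
                f.filterCell();
            }
            if (!f.filterRow()) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Rows with 10, 30000, 25000 and 25001 columns; threshold 25000.
        System.out.println(survivingRows(new int[]{10, 30000, 25000, 25001}, 25000));
        // only the rows with more than 25000 columns remain: [1, 3]
    }
}
```

In a real HBase filter this logic would live in a subclass of `FilterBase`, and the filter would also have to be deployed to the region servers' classpath before it can run server-side.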
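Dhaval's point about `setBatch` can be made concrete: with `Scan.setBatch(n)`, a row with many columns comes back as several partial `Result`s of at most n cells each, so no single response has to materialize the whole row. A toy model of that chunking, in plain Java (the real behavior lives inside HBase's scanner; this only illustrates the arithmetic):

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of Scan.setBatch: split one wide row into partial results. */
public class BatchModel {
    /** Sizes of the partial Results a row of totalColumns yields under a given batch size. */
    public static List<Integer> partialResultSizes(int totalColumns, int batch) {
        List<Integer> sizes = new ArrayList<>();
        int remaining = totalColumns;
        while (remaining > 0) {
            int chunk = Math.min(batch, remaining); // at most `batch` cells per Result
            sizes.add(chunk);
            remaining -= chunk;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 25,000-column row with setBatch(10000) arrives as three partial results.
        System.out.println(partialResultSizes(25_000, 10_000)); // [10000, 10000, 5000]
    }
}
```

This also shows why batching interacts awkwardly with a whole-row column-count filter: no single batch ever sees all 25,000 columns, so the counting has to happen wherever the full row is reassembled.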