From: Dhaval Shah
To: "user@hbase.apache.org"
Subject: Re: Controlling TableMapReduceUtil table split points
Date: Wed, 23 Jan 2013 07:10:43 +0800 (SGT)

Hi David,

We successfully use the "logical" schema approach and have not seen issues yet. Of course it all depends on the use case, and saying it would work for you because it works for us would be naive. However, if it does work, it will make your life much easier, because with a logical schema other problems become simpler. For example, you can be sure that one map function will process an entire row rather than a row going to multiple mappers, and if you are using filters that restrict queries to only a small subset of the data, even setBatch won't be needed for those use cases.

I did run into issues where I did not use setBatch and my mappers ran out of memory, but that was a simpler one to solve. By the way, if you are on CDH4, the HBase export utility also does not use setBatch, and your mapper will run out of memory if you have a large row. It's easy to put that line in as a config param, though, and this feature is available in future releases of HBase trunk.

Regards,
Dhaval


________________________________
From: David Koch
To: user@hbase.apache.org
Sent: Sunday, 6 January 2013 12:53 PM
Subject: Re: Controlling TableMapReduceUtil table split points

Hi Dhaval,

Good call on the setBatch. I had forgotten about it. Just like changing the
schema, it would involve changing map(...) to reflect the fact that only
part of the user's data is returned in each call, but I would not have to
manipulate table splits.

The HBase book does suggest that it's bad practice to use the "logical"
schema of lumping all user data into a single row(*), but I'll do some
testing to see what works.

Thank you,

/David

(*) Chapter 9, section "Tall-Narrow Versus Flat-Wide Tables", 3rd ed., page
359


On Sun, Jan 6, 2013 at 6:29 PM, Dhaval Shah wrote:

> Another option to avoid the timeout/OOME issues is to use scan.setBatch()
> so that the scanner would function normally for small rows but would break
> up large rows into multiple Result objects, which you can now use in
> conjunction with scan.setCaching() to control how much data you get back.
>
> This approach would not need a change in your schema design and would
> ensure that only 1 mapper processes the entire row (but in multiple calls
> to the map function).
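
For anyone following the thread, here is a rough plain-Java model of the behaviour discussed above: a batch limit splits a wide row into several partial results that share a row key, so the map function is called several times for the same row and has to keep running state. This is only a sketch. `SetBatchSketch`, `PartialResult`, `scanRow` and `countCellsPerRow` are made-up names for illustration and use no HBase classes; in a real job the limit would be set with `scan.setBatch(...)` on the `Scan` passed to TableMapReduceUtil, as described in the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of scanner batching; hypothetical names, no HBase dependency. */
class SetBatchSketch {

    /** Stand-in for an HBase Result: a row key plus some of that row's cells. */
    static final class PartialResult {
        final String rowKey;
        final List<String> cells;

        PartialResult(String rowKey, List<String> cells) {
            this.rowKey = rowKey;
            this.cells = cells;
        }
    }

    /**
     * Splits one row's cells into chunks of at most `batch` cells, the way a
     * batch limit breaks up a wide row. A batch of 0 or less means "no limit".
     */
    static List<PartialResult> scanRow(String rowKey, List<String> cells, int batch) {
        List<PartialResult> out = new ArrayList<>();
        if (batch <= 0) {
            out.add(new PartialResult(rowKey, new ArrayList<>(cells)));
            return out;
        }
        for (int i = 0; i < cells.size(); i += batch) {
            int end = Math.min(i + batch, cells.size());
            out.add(new PartialResult(rowKey, new ArrayList<>(cells.subList(i, end))));
        }
        return out;
    }

    /**
     * Models map() being invoked once per partial result: the per-row total
     * is accumulated across calls instead of assuming one call per row.
     */
    static Map<String, Integer> countCellsPerRow(List<PartialResult> results) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (PartialResult r : results) {
            counts.merge(r.rowKey, r.cells.size(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> wideRow = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            wideRow.add("col" + i);
        }
        List<PartialResult> results = new ArrayList<>();
        results.addAll(scanRow("user1", wideRow, 4));               // wide row, split up
        results.addAll(scanRow("user2", wideRow.subList(0, 2), 4)); // small row, untouched
        System.out.println(results.size() + " map() calls");
        System.out.println(countCellsPerRow(results));
    }
}
```

With a batch of 4, the 10-cell row arrives as three partial results while the 2-cell row still arrives as one, which is the "normal for small rows, broken up for large rows" behaviour mentioned in the quoted message. No single call ever holds more than 4 cells, which is why the out-of-memory problem goes away.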