Subject: Re: multiple reads from a Map - optimization question
From: Jean-Daniel Cryans
To: user@hbase.apache.org
Date: Wed, 23 Jun 2010 09:40:14 -0700

I'm still confused by 2 things:

- Are the Gets done on the same row that is mapped? Or on the same
table? Or another table?
- Can you give a real example of what you are trying to achieve? (with
fake data)

Thx

J-D

On Tue, Jun 22, 2010 at 10:51 AM, Raghava Mutharaju wrote:
>>>> This is not super clear, some comments inline.
> I will try & explain better this time.
>
> The overall objective -- from the complete dataset, obtain a subset of it to
> work on. This subset would be obtained by making use of the 2-3
> conditions (filters). The setting up of one filter depends on the output of
> the previous filter. It is as follows:
>
> Filter-1: Set up with the scan that is used for the map.
> Filter-2: From the row that is coming into the map, extract some fields and
> create a ColumnFilter/ValueFilter out of it. The row would be a delimited set
> of values.
> Filter-3: Apply Filter-2 and, from its output, extract the required fields
> and do some processing. Then write it back to an HBase table.
>
> Filters 2 and 3 are used within the map. So I am using 1-2 Gets per row that
> the map receives. I cannot apply all the filters beforehand because the
> subsequent filters have to be created based on the previous filter's output.
>
> Yes, there would be more data. But currently, I am testing on data which
> occupies only a single region. So only 1 map would be running on the cluster
> and it is taking in all the data.
>
> This approach is slow and it shows in the results. Is there any way in which
> this can be achieved with much improved performance?
>
> Thank you.
>
> Regards,
> Raghava.
>
> On Tue, Jun 22, 2010 at 12:57 PM, Jean-Daniel Cryans wrote:
>
>> This is not super clear, some comments inline.
>>
>> J-D
>>
>> On Tue, Jun 22, 2010 at 12:49 AM, Raghava Mutharaju wrote:
>> > Hello all,
>> >
>> > In the data, I have to check for multiple conditions and then work
>> > with the data that satisfies all the conditions. I am doing this as an MR
>> > job with no reduce, and the conditions are translated to a set of filters.
>> > Among the multiple conditions (2 or 3 max), data that satisfies one of them
>> > would come as input to the Map (the initial filter is set in the scan given
>> > to the mappers). Now, from among the dataset that comes through to each map,
>> > I would check for the other conditions (1 or 2 remaining conditions). Since
>> > map() is called for each row of data, it would mean 1 or 2 read calls (with
>> > a filter) to HBase tables. This setup, even for small data (data would fit in
>>
>> Here you talk about checking 1-2 conditions... are they checked on
>> the row that was mapped? Else that means that you are doing 1-2 Gets
>> per row? If so, this is definitely going to be slow!
>>
>> > a region, and so only 1 map is taking in all the data) is very slow.
>>
>> What do you mean? That currently your test is done on 1 region but you
>> expect more? If not, then don't use MR since that would give you
>> nothing more than more code to write and more processing time.
>>
>> >
>> > Here, note that I shouldn't be filtering the incoming data to the map, but
>> > based on that data, the next set of filtering conditions would be formed.
>>
>> Can you give an example?
>>
>> >
>> > Can this be improved? Would constructing secondary indexes help (it would
>> > need to be a dramatic improvement, actually)? Or is this type of problem
>> > not suitable for HBase?
>> >
>> > Thank you.
>> >
>> > Regards,
>> > Raghava.
>> >
>>
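For concreteness, here is a minimal sketch of the kind of job setup being described, written against the old-style HBase mapreduce client API that was current around the time of this thread. None of this is code from the thread: the table, column family, qualifier, and value names ("source_table", "cf", "cond1", "x") are invented, and exact class and method names may differ in other HBase versions. Filter-1 is attached to the Scan that feeds the mappers, so only rows that already pass the first condition ever reach map(); the SubsetMapper class it refers to is sketched after this.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SubsetJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HBaseConfiguration();  // newer clients: HBaseConfiguration.create()
    Job job = new Job(conf, "subset-extraction");
    job.setJarByClass(SubsetJob.class);

    // Filter-1: only rows whose cf:cond1 column equals "x" ever reach map().
    Scan scan = new Scan();
    scan.setCaching(500);  // fetch more rows per RPC while scanning
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("cond1"),
        CompareOp.EQUAL, Bytes.toBytes("x")));

    // Map-only job: Filter-2 and Filter-3 happen inside the mapper.
    TableMapReduceUtil.initTableMapperJob(
        "source_table", scan, SubsetMapper.class,
        NullWritable.class, NullWritable.class, job);
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}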
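And a sketch of the mapper itself, showing the Filter-2/Filter-3 pattern under discussion: extract fields from the mapped row, build a Get with a ValueFilter from them, and write the processed result back to HBase. Again, every table, family, qualifier, and delimiter here is invented for illustration.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

public class SubsetMapper extends TableMapper<NullWritable, NullWritable> {

  private static final byte[] CF = Bytes.toBytes("cf");

  private HTable lookupTable;  // table the per-row Gets go against
  private HTable outputTable;  // table the processed result is written to

  @Override
  protected void setup(Context context) throws IOException {
    // Open the tables once per task, not once per map() call.
    lookupTable = new HTable("lookup_table");
    outputTable = new HTable("output_table");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Filter-2: pull fields out of the mapped row (a delimited string here)
    // and build the next lookup from them.
    String delimited = Bytes.toString(value.getValue(CF, Bytes.toBytes("data")));
    if (delimited == null) {
      return;  // mapped row has no cf:data cell, nothing to look up
    }
    String[] fields = delimited.split(",");
    if (fields.length < 2) {
      return;  // not enough fields to build the second filter
    }

    // One Get per mapped row: a synchronous RPC for every row the mapper sees.
    // The ValueFilter only trims what the Get returns, it does not avoid the trip.
    Get get = new Get(Bytes.toBytes(fields[0]));
    get.setFilter(new ValueFilter(CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes(fields[1]))));
    Result lookup = lookupTable.get(get);

    // Filter-3: process whatever survived and write it back to HBase.
    if (!lookup.isEmpty()) {
      Put put = new Put(row.get());
      put.add(CF, Bytes.toBytes("derived"),
          lookup.getValue(CF, Bytes.toBytes("qual")));
      outputTable.put(put);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    outputTable.flushCommits();  // push any buffered Puts before the task exits
  }
}

The thing to notice is that the Get inside map() is a synchronous round trip to a region server for every mapped row, no matter what filter is attached to it, and with the data in a single region there is only one mapper, so all of those round trips run serially in one task. That combination, rather than the filters themselves, is what makes the job slow.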