Subject: Re: Using Hadoop's MultipleInputs with AccumuloInputFormat in a MR job
From: Aaron
To: user@accumulo.apache.org
Date: Mon, 16 Sep 2013 22:10:51 -0400

Sorry about that, I should have been clearer. My original question did involve scanning one table. In our use case, we ingest a number of text files into one table (we could use multiple tables; we just went with one for now). After the ingest runs, we run some MR jobs on that table. One idea was to use multiple Mappers on this table (to do some simple joins between rows) for later processing. As part of that MR job, we wanted to add some iterators to the scans to cut down on the records returned before the reduce phase.

I still need to look into how AccumuloInputFormat works, so take everything I say as just a stream of thoughts. I wonder if one way to approach this is to have AccumuloInputFormat "hold multiple scanners," somehow linking RecordReaders to Scanners. I need to think that through more, but the idea is to mimic Hadoop's MultipleInputs: a MultipleAccumuloInputs. I need to look at the patches in ACCUMULO-391.

Cheers,
Aaron

On Mon, Sep 16, 2013 at 9:06 PM, Corey Nolet wrote:

> Adding to my previous response: when you say you are setting different
> iterators on a scan, are you referring to a single table with different
> iterators? Are the sets of iterators tied to different ranges? The changes
> we are making to the current InputFormat will still not allow different
> iterators on a single table, but the use case sounds interesting.
>
> On Mon, Sep 16, 2013 at 3:55 PM, Corey Nolet wrote:
>
>> Aaron,
>>
>> We are currently re-working the AccumuloInputFormat for Accumulo 1.6 to
>> provide inputs from multiple tables (each with its own set of configured
>> iterators, ranges, columns). Check out ACCUMULO-391.
>>
>> On Mon, Sep 16, 2013 at 11:41 AM, Aaron wrote:
>>
>>> I was curious whether this is possible (I am thinking it isn't) from the
>>> Java API, with Accumulo 1.5 and Hadoop 1.2.1.
>>>
>>> I want to set 2 different iterators on a scan, and send those results
>>> to 2 different Mappers.
>>>
>>> With files as inputs, I'd do this by using the MultipleInputs class,
>>> with 2 different Paths and 2 different Mapper classes, maybe with the
>>> same InputFormat (e.g. Text or Sequence).
>>>
>>> Since I'm using AccumuloInputFormat, I would think I'd be OK, maybe with
>>> a null Path in MultipleInputs.addInputPath(), but it's the static
>>> addIterator() on AccumuloInputFormat that I think is where I lose.
>>>
>>> Can I have 2 different AccumuloInputFormats, with different iterators?
>>> I think the answer is no, and from briefly looking at the source I
>>> believe that to be correct, but I was curious whether others have done
>>> something like this.
>>>
>>> Cheers,
>>> Aaron
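
The concern raised in the thread, that a static addIterator() writes into configuration shared by the whole job, can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: it uses no Hadoop or Accumulo classes, and every name in it (SharedConfigSketch, the "sketch.input.iterators" key, the "inputA"/"inputB" labels, the filter names) is hypothetical rather than part of either library. It simply contrasts one job-wide key with keys namespaced per input, which is roughly the shape a MultipleAccumuloInputs-style approach might take.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedConfigSketch {
    // Hypothetical key standing in for wherever an input format stores its iterators.
    static final String ITERATORS_KEY = "sketch.input.iterators";

    // Models a static setter that appends to a single job-wide key:
    // every input configured on the job ends up with the same iterator list.
    static void addIteratorShared(Map<String, String> jobConf, String iteratorName) {
        jobConf.merge(ITERATORS_KEY, iteratorName, (existing, added) -> existing + "," + added);
    }

    // Models a per-input setter: the key is namespaced by an input label,
    // so each input (and therefore each Mapper) can get its own iterators.
    static void addIteratorForInput(Map<String, String> jobConf, String inputLabel, String iteratorName) {
        jobConf.merge(ITERATORS_KEY + "." + inputLabel, iteratorName,
                (existing, added) -> existing + "," + added);
    }

    // Returns the iterators recorded for one labelled input, or "" if none were set.
    static String iteratorsForInput(Map<String, String> jobConf, String inputLabel) {
        return jobConf.getOrDefault(ITERATORS_KEY + "." + inputLabel, "");
    }

    public static void main(String[] args) {
        // Shared key: both iterators collapse into one list seen by every input.
        Map<String, String> shared = new HashMap<>();
        addIteratorShared(shared, "filterA");
        addIteratorShared(shared, "filterB");
        System.out.println("shared: " + shared.get(ITERATORS_KEY));

        // Namespaced keys: each input keeps its own iterator list.
        Map<String, String> perInput = new HashMap<>();
        addIteratorForInput(perInput, "inputA", "filterA");
        addIteratorForInput(perInput, "inputB", "filterB");
        System.out.println("inputA: " + iteratorsForInput(perInput, "inputA"));
        System.out.println("inputB: " + iteratorsForInput(perInput, "inputB"));
    }
}
```

In the shared case there is no way to tell which iterator was meant for which Mapper, which is the limitation described above; whether the ACCUMULO-391 patches address it this way is not something this sketch claims.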