Date: Sat, 2 Mar 2013 13:05:10 -0800
Subject: Re: Reset column iterator while using AccumuloRowInputFormat
From: Billie Rinaldi
To: user@accumulo.apache.org

On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo wrote:

> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat? We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose. Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?

Currently, no. It's not immediately obvious how we could change the
InputFormat to accomplish this. The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping together rows as appropriate. Going
back to the beginning of a row would require us to seek the scanner again,
and replace the old iterator with a new one. We could make a special
RecordReader with a reset method, but I don't know how we could call the
method. Interactions with the RecordReader are handled by the MapContext,
and I don't know if you can use a custom MapContext. Maybe we could have an
InputFormat that gives you a Scanner directly that you could reseek in the
Mapper, but we'd have to spend some time thinking about it to make sure it
would work.

Billie

> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>>
>         columnIterator, Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
> }
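To illustrate the "replace the old iterator with a new one" idea from the reply, here is a minimal sketch in plain Java. ResettableIterator and readTwice are hypothetical names, not part of Accumulo or Hadoop, and a List stands in for the row's columns so the example runs on its own. In a real Mapper the supplier would presumably be backed by a separate Scanner limited to the current row (the workaround mentioned in the question); that wiring is an assumption and is not shown or tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class ResettableIteratorDemo {

    /** Hypothetical helper (not an Accumulo class): an iterator that can start over. */
    static final class ResettableIterator<T> implements Iterator<T> {
        // Opens a fresh pass over the same data each time it is called.
        private final Supplier<Iterator<T>> source;
        private Iterator<T> current;

        ResettableIterator(Supplier<Iterator<T>> source) {
            this.source = source;
            this.current = source.get();
        }

        @Override
        public boolean hasNext() {
            return current.hasNext();
        }

        @Override
        public T next() {
            return current.next();
        }

        /** Drops the old iterator and replaces it with a new one from the source. */
        void reset() {
            current = source.get();
        }
    }

    /** Runs two passes over the columns and records what each pass saw (demo only). */
    static List<String> readTwice(List<String> columns) {
        // Asking the list for a new iterator stands in for re-seeking a scanner.
        ResettableIterator<String> it = new ResettableIterator<>(columns::iterator);
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        it.reset(); // back to the beginning of the "row"
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [cf:a, cf:b, cf:a, cf:b]
        System.out.println(readTwice(Arrays.asList("cf:a", "cf:b")));
    }
}
```

The point of the design is that nothing is cached between passes: each reset asks the source for a new iterator, so a large row never has to fit in memory, at the cost of reading it again.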