Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAGFNOZSiY-jYoMkDeMrs=R8U===3P1SRsgrUDUgp92Gn5B=jQw@mail.gmail.com>
References: 
 <CAGFNOZSiY-jYoMkDeMrs=R8U===3P1SRsgrUDUgp92Gn5B=jQw@mail.gmail.com>
Date: Sat, 2 Mar 2013 13:05:10 -0800
Message-ID: 
 <CAF1jEfD_8ZfyN4_6k7pfZyET_Y_b_KiWV-TGJ0LGP8HNmdww7g@mail.gmail.com>
Subject: Re: Reset column iterator while using AccumuloRowInputFormat
From: Billie Rinaldi <billie@apache.org>
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=f46d04448157af2b4804d6f77a24

--f46d04448157af2b4804d6f77a24
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo <mike@piragua.com> wrote:

> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat?  We have a case in which we need to
> iterate over the columns for a row at least twice, and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose.  Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?
>

Currently, no.  It's not immediately obvious how we could change the
InputFormat to accomplish this.  The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping consecutive entries into rows.  Going
back to the beginning of a row would require us to seek the scanner again
and replace the old iterator with a new one.  We could make a special
RecordReader with a reset method, but I don't know how we would call that
method: interactions with the RecordReader are handled by the MapContext,
and I don't know whether you can use a custom MapContext.  Maybe we could
have an InputFormat that gives you a Scanner directly that you could
reseek in the Mapper, but we'd have to spend some time thinking about it
to make sure it would work.
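To make the single-pass behavior concrete, here is a plain-Java model (not Accumulo code) of a reader that seeks once and then hands out the sorted entries one row at a time. The names `Cell`, `RowGrouper`, and `nextRow` are hypothetical. Because every per-row iterator advances the same forward-only position, a second loop over an exhausted row iterator sees nothing, which is the behavior described in the question.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java model (hypothetical names, not Accumulo classes) of a reader that
// seeks once and then hands out the scan's sorted entries one row at a time.
public class RowGroupingSketch {

    // Stand-in for one key/value entry; only the row matters for grouping.
    public static final class Cell {
        final String row;
        final String col;
        final String value;

        public Cell(String row, String col, String value) {
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    public static final class RowGrouper {
        private final List<Cell> scan;    // entries sorted by row, as the single seek returns them
        private int pos = 0;              // one shared, forward-only read position
        private String currentRow = null; // row most recently handed out

        public RowGrouper(List<Cell> sortedScan) {
            this.scan = sortedScan;
        }

        // Skip any entries of the current row that the caller left unread.
        private void skipRestOfRow() {
            while (pos < scan.size() && scan.get(pos).row.equals(currentRow)) {
                pos++;
            }
        }

        public boolean hasNextRow() {
            skipRestOfRow();
            return pos < scan.size();
        }

        // Iterator over the next row's entries. It advances the shared position,
        // so once it is exhausted there is nothing to rewind to.
        public Iterator<Cell> nextRow() {
            if (!hasNextRow()) {
                throw new NoSuchElementException();
            }
            final String row = scan.get(pos).row;
            currentRow = row;
            return new Iterator<Cell>() {
                @Override
                public boolean hasNext() {
                    return pos < scan.size() && scan.get(pos).row.equals(row);
                }

                @Override
                public Cell next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return scan.get(pos++);
                }
            };
        }
    }

    public static void main(String[] args) {
        List<Cell> scan = new ArrayList<>();
        scan.add(new Cell("r1", "a", "1"));
        scan.add(new Cell("r1", "b", "2"));
        scan.add(new Cell("r2", "a", "3"));

        RowGrouper reader = new RowGrouper(scan);
        Iterator<Cell> columns = reader.nextRow();
        int firstPass = 0;
        while (columns.hasNext()) { columns.next(); firstPass++; }
        int secondPass = 0; // same iterator again: already exhausted
        while (columns.hasNext()) { columns.next(); secondPass++; }
        System.out.println(firstPass + " " + secondPass); // prints "2 0"
        System.out.println(reader.hasNextRow() + " " + reader.nextRow().next().row); // prints "true r2"
    }
}
```

In this model a `reset` would have to move `pos` back, which in the real reader corresponds to seeking the scanner again and swapping in a new iterator, as described above.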

Billie


> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>> columnIterator,
>         Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
> }
>
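A sketch of the workaround from the first message: take a fresh iterator for the row instead of rewinding the one the InputFormat supplies. This is a plain-Java model under stated assumptions: the "table" is an in-memory `TreeMap` whose keys (built by the hypothetical helper `key`, assuming rows never contain `'\0'`) sort like row/column pairs, and the hypothetical `scanRow` stands in for seeking a new scanner to a single row. Against a real cluster the analogous step would presumably be a second `Scanner` with its range set to the current row (for example `new Range(key)`), assuming the mapper can obtain a `Connector`.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java model of "reset by re-seeking": each pass gets its own iterator
// over one row, so nothing about the row is buffered between passes.
public class ReseekSketch {

    // Hypothetical composite key: row, a '\0' separator, then column.
    // Sorting these strings groups each row's columns together (assumes no '\0' in rows).
    public static String key(String row, String col) {
        return row + '\0' + col;
    }

    // "Seek" to one row: a fresh iterator over exactly that row's entries.
    // The bounds [row + '\0', row + '\1') exclude longer rows such as "r10" when row is "r1".
    public static Iterator<Map.Entry<String, String>> scanRow(NavigableMap<String, String> table, String row) {
        return table.subMap(row + '\0', true, row + '\1', false).entrySet().iterator();
    }

    // Two passes over one row: the first computes the mean value, the second
    // counts values above it. The "reset" is simply a new scanRow call.
    public static int countAboveMean(NavigableMap<String, String> table, String row) {
        long sum = 0;
        int n = 0;
        Iterator<Map.Entry<String, String>> it = scanRow(table, row);
        while (it.hasNext()) {
            sum += Long.parseLong(it.next().getValue());
            n++;
        }
        if (n == 0) {
            return 0; // empty or missing row
        }
        double mean = (double) sum / n;

        int above = 0;
        it = scanRow(table, row); // replace the exhausted iterator with a new one
        while (it.hasNext()) {
            if (Long.parseLong(it.next().getValue()) > mean) {
                above++;
            }
        }
        return above;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(key("r1", "a"), "1");
        table.put(key("r1", "b"), "5");
        table.put(key("r1", "c"), "6");
        table.put(key("r10", "a"), "100"); // different row: must not leak into r1's scans
        System.out.println(countAboveMean(table, "r1")); // prints 2 (mean is 4.0; 5 and 6 exceed it)
    }
}
```

Re-seeking trades memory for a second read of the row: between passes this sketch keeps only a running sum and count, which is the point when a row may not fit in memory.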

--f46d04448157af2b4804d6f77a24--