mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sean Owen <>
Subject Re: Interesting iterator issue
Date Mon, 06 Sep 2010 10:02:22 GMT
I'm going for #2. My reasoning is that the current Iterator's point is
to be reusable, but is not reusable at the moment. It is only used in
one place, and "inlining" its logic results in a correct and more
efficient implementation than is possible by patching up the Iterator,
since it needs more particular behavior than a general Iterator can
provide here.

Should we want an Iterator over key-value pairs in a SequenceFile --
and that's interesting -- it can be rewritten (by me if you like) in a
more correct way later.

On Sun, Sep 5, 2010 at 7:59 PM, Sean Owen <> wrote:
> Having an interesting issue fixing up a technical point in the class
> SequenceFileVectorIterable. Its Iterator is wrong in that hasNext()
> advances the iteration and next() doesn't. There's a way to fix this
> easily: the Iterator just needs to always read one item ahead to know
> whether a next one exists.
> However doing this the straightforward way, the Iterator doesn't know
> the current key/value it's on -- always the next one. This is an issue
> since in the one usage of this class, in VectorDumper, the current key
> is accessed for printing.
> The Iterator could just store the last key/value it saw. However, the
> key can be an arbitrary Writable. To do this correctly, the Writable
> would have to be Cloneable and be clone()-ed, which is not guaranteed
> and maybe undesirable.
> 1. We can remove the option in VectorDumper to print keys to fix this,
> since that's the only thing that wants to read the current key. How
> bad is that?
> 2. We can iterate directly over the SequenceFile in VectorDumper to
> get desired behavior. OK? Then actually SequenceFileVectorIterable
> goes away.
> Sean

View raw message