accumulo-user mailing list archives

From "Marc P." <marc.par...@gmail.com>
Subject Re: EXTERNAL: Re: Custom Iterators
Date Thu, 23 Aug 2012 15:14:51 GMT
Thanks for catching that! I did indeed write that down incorrectly. I
apologize. I'll fix that tonight.

Iterators are stacked according to their priority, which you set through
an IteratorSetting on the scanner, for example, or on the input format.
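
To make the ordering concrete, here is a toy model in plain Java. It uses no
Accumulo classes: Setting and buildStack are hypothetical names made up for
this sketch. In Accumulo itself the priority is the first argument to
IteratorSetting, the setting is attached with Scanner.addScanIterator, and
lower priority numbers are applied first (closest to the data).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: Setting and buildStack are hypothetical, not Accumulo classes.
class IteratorStackSketch {

    /** Stand-in for a (priority, name, iterator) registration. */
    static final class Setting {
        final int priority;
        final String name;
        final UnaryOperator<Stream<String>> wrap; // wraps a source, returns the new top

        Setting(int priority, String name, UnaryOperator<Stream<String>> wrap) {
            this.priority = priority;
            this.name = name;
            this.wrap = wrap;
        }
    }

    /** Sorts by priority, then lets each stage wrap the one below it. */
    static List<String> buildStack(List<String> data, List<Setting> settings) {
        List<Setting> sorted = new ArrayList<>(settings);
        sorted.sort(Comparator.comparingInt((Setting s) -> s.priority));
        Stream<String> top = data.stream();
        for (Setting s : sorted) {
            top = s.wrap.apply(top); // the previous top becomes this stage's source
        }
        return top.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("apple", "banana", "avocado");
        Setting upper = new Setting(20, "upper", in -> in.map(String::toUpperCase));
        Setting onlyA = new Setting(10, "onlyA", in -> in.filter(v -> v.startsWith("a")));
        // Registration order does not matter; priority 10 runs before priority 20.
        System.out.println(buildStack(data, Arrays.asList(upper, onlyA)));
        // prints [APPLE, AVOCADO]
    }
}
```

Neither stage knows about the other, so each stays independently reusable.
Swapping the two priorities would uppercase first, and the lowercase "a"
filter would then match nothing.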

The comment about the init method is a general guideline; it applies, for
example, when the iterator is used within a scan session.

The OrIterator (as in the wikisearch example) is created by the
BooleanLogicIterator, and the sources are added through the addTerm
method. This is, apparently, its expected use. You will also note
that the BooleanLogicIterator (or any iterator that uses the OrIterator)
has an implemented init method.
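
The pattern can be sketched in plain Java. This is not the wikisearch
OrIterator code: OrSketch, its TermSource, and the String keys are
hypothetical stand-ins. It only illustrates an iterator that rejects init
because it has no single source, and instead merges whatever sorted sources
its parent hands it through addTerm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Toy model only: not the wikisearch OrIterator, just the same shape.
class OrSketch {

    /** One sorted source plus its current head, ordered by that head. */
    private static final class TermSource implements Comparable<TermSource> {
        final Iterator<String> iter;
        String head;

        TermSource(Iterator<String> iter) {
            this.iter = iter;
            this.head = iter.next();
        }

        @Override
        public int compareTo(TermSource other) {
            return head.compareTo(other.head);
        }
    }

    private final PriorityQueue<TermSource> sources = new PriorityQueue<>();

    /** Rejected on purpose: there is no single source to initialize from. */
    void init(Iterator<String> source) {
        throw new UnsupportedOperationException("add sources with addTerm instead");
    }

    /** Called by the parent iterator once per OR'ed term; input must be sorted. */
    void addTerm(Iterator<String> sortedSource) {
        if (sortedSource.hasNext()) {
            sources.add(new TermSource(sortedSource));
        }
    }

    boolean hasTop() {
        return !sources.isEmpty();
    }

    /** Returns the smallest head across all sources, skipping duplicates of it. */
    String next() {
        if (sources.isEmpty()) {
            throw new NoSuchElementException();
        }
        TermSource smallest = sources.poll();
        String result = smallest.head;
        advance(smallest);
        while (!sources.isEmpty() && sources.peek().head.equals(result)) {
            advance(sources.poll());
        }
        return result;
    }

    /** Moves a source forward and re-queues it if it still has data. */
    private void advance(TermSource ts) {
        if (ts.iter.hasNext()) {
            ts.head = ts.iter.next();
            sources.add(ts);
        }
    }

    static List<String> drain(OrSketch or) {
        List<String> out = new ArrayList<>();
        while (or.hasTop()) {
            out.add(or.next());
        }
        return out;
    }

    public static void main(String[] args) {
        OrSketch or = new OrSketch();
        or.addTerm(Arrays.asList("doc1", "doc3", "doc5").iterator());
        or.addTerm(Arrays.asList("doc2", "doc3").iterator());
        System.out.println(drain(or)); // prints [doc1, doc2, doc3, doc5]
    }
}
```

The API implication is that such an iterator cannot be configured directly
on a scanner; something above it that does implement init has to build it
and feed it sources.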

On Thu, Aug 23, 2012 at 10:59 AM, Cardon, Tejay E
<tejay.e.cardon@lmco.com> wrote:
> Marc,
>
> Thanks for the writeup.  It is by far the most comprehensive info I’ve seen
> on iterators, and was very helpful to me.  A couple notes/questions:
>
>
>
> You mention that SortedKeyValueIterator implements FileSKVIterator.  I’ve
> only looked at the 1.4.1 source, but it appears that the opposite is true.
>
>
>
> You also mention that iterators get their source from the init method, but
> some (like OrIterator) seem to throw exceptions on that method.  Where do
> they get their source data, and what are the API implications of having
> iterators that reject init (or deep copy for that matter).
>
>
>
> Final thought.  If I want to stack several iterators, what’s the best way to
> go about that?  In other words, I’d like an iterator that I write to be the
> source to another iterator that I’ve written, which in turn may feed yet
> another that I’ve written.  Preferably, I’d like each to be independently
> re-useable, so I don’t want to build that stacking into the source of any of
> the iterators themselves.  Is that possible, or would I need some sort of
> iterator factory that builds the stacks and then acts as an interface to the
> fully formed stack?
>
>
>
> Thanks,
>
> Tejay
>
> From: Marc Parisi [mailto:marc@accumulo.net]
> Sent: Wednesday, August 22, 2012 5:33 PM
>
>
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: Custom Iterators
>
>
>
> Here's a quick write up
>
>
>
>     http://www.accumulo.net/node/1
>
> On Wed, Aug 22, 2012 at 8:03 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
> Err, double (triple) reply:
>
> No, you are incorrect. The wikisearch example can handle any arbitrary
> boolean expression containing NOT, AND, and OR. As always, I'll preface it
> the same as Bill did: it *should* be able to handle them :).
>
> I know that cleaning-up/reworking the Wikisearch code is in the works. I'm
> just not positive about the timeframe.
>
> As far as examples, I'd push you to the write-up Eric did after benchmarking
> the wikisearch example: http://accumulo.apache.org/example/wikisearch.html
>
> He has some example queries that give the basic idea behind what's supported
> (minus the NOTs)
>
> On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
>
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy reply to
> William that the comment on OrIterator.TermSource.compareTo indicates that
> implementations with more than one row per tablet need to compare row key
> first (and that is not being done in this code). It may be that it’s not an
> issue and I’m simply misunderstanding something. As for the wikisearch
> example, as I understood it, it could only handle searches for “anded”
> terms. If that’s not the case, then an example of an or search would be
> helpful. In any case, I’d love a deeper dive on the wikisearch somewhere. I
> get the source code and a high level explanation of what’s happening, but
> I’d love a tutorial or something that walks through the classes and explains
> how each one contributes to the functionality. Don’t consider that a request
> (that would be a lot more to ask than I’m willing to ask), but I would
> certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> *From:*Josh Elser [mailto:josh.elser@gmail.com]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* user@accumulo.apache.org
> *Subject:* EXTERNAL: Re: Custom Iterators
>
>
>
> What makes you say that the OrIterator cannot handle more than one row per
> tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g. regardless of
> row distribution in a tablet). Any issues in the code that prevent it from
> doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple attributes (and
> I believe indexes document metadata in addition to the tokenized document).
> Is there something unclear that could be better documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
>     All,
>
>     I’m interested in writing a custom iterator, and I’ve been looking
>     for documentation on how to do so. Thus far, I’ve not been able to
>     find anything beyond the java docs in SortedKeyValueIterator and a
>     few other sub-classes. A few of the examples use Iterators, but
>     provide no real info on how to properly implement one. Is there
>     anywhere to find general guidance on the iterator stack?
>
>     (If you’re interested)
>
>     Specifically, for those that are curious, I’m trying to implement
>     something similar to the wikisearch example, but with some key
>     differences. In my case, I’ve got a file with various attributes
>     that are being indexed. So for each file there are 5 attributes, and
>     each attribute has a fixed number of possible values. For example
>     (totally made up):
>
>     personID, gender, hair color, country, race, personRecord
>
>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>
>     AND
>     Row:binID; ColFam:”D”; ColQ:personID; value:personRecord
>
>     A typical query would be:
>
>     Give me the personRecord for all people with:
>
>     Gender: male &
>
>     Hair color: blond or brown &
>
>     Country: USA or England or China or Korea &
>
>     Race: white or oriental
>
>     The existing Iterators used in the wikisearch example are unable
>     to handle the “or” clauses in each attribute.
>
>     The OrIterator doesn’t appear to handle the possibility of more than
>     one row per tablet.
>
>     Thanks,
>
>     Tejay Cardon
>
>
