accumulo-user mailing list archives

From "Cardon, Tejay E" <tejay.e.car...@lmco.com>
Subject RE: EXTERNAL: Re: Custom Iterators
Date Wed, 22 Aug 2012 22:27:31 GMT
Josh,
Thanks for getting back to me so quickly.  I explained in my lengthy reply to William that
the comment on OrIterator.TermSource.compareTo indicates that implementations with more than
one row per tablet need to compare row key first (and that is not being done in this code).
It may be that it's not an issue and I'm simply misunderstanding something.  As for the wikisearch
example, as I understood it, it could only handle searches for "ANDed" terms.  If that's not
the case, then an example of an "or" search would be helpful.  In any case, I'd love a deeper
dive on the wikisearch somewhere.  I get the source code and a high level explanation of what's
happening, but I'd love a tutorial or something that walks through the classes and explains
how each one contributes to the functionality.  Don't consider that a request (that would
be a lot more to ask than I'm willing to ask), but I would certainly find it useful if it
does exist.
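
To make the ordering concern concrete, here is a minimal sketch of a row-first comparison.
It is not the OrIterator code: Position, RowFirstOrdering and mergeOrder are hypothetical
names, and plain Strings stand in for the row and the document id (column qualifier).  The
only point is that the row is compared before anything else, so a source sitting in an
earlier row is always picked ahead of one in a later row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class RowFirstOrdering {

    // Hypothetical stand-in for the position of one OR term: the row it is
    // currently in and the document id (column qualifier) it points at.
    static final class Position implements Comparable<Position> {
        final String row;
        final String docId;

        Position(String row, String docId) {
            this.row = row;
            this.docId = docId;
        }

        @Override
        public int compareTo(Position other) {
            // Compare the row first, so entries from an earlier row always
            // sort ahead of entries from a later row.
            int byRow = row.compareTo(other.row);
            if (byRow != 0) {
                return byRow;
            }
            // Only within the same row does the document id decide the order.
            return docId.compareTo(other.docId);
        }
    }

    // Drains a priority queue of positions, the way an OR merge would pick the
    // smallest source each time, and returns "row/docId" strings in that order.
    static List<String> mergeOrder(List<Position> positions) {
        PriorityQueue<Position> queue = new PriorityQueue<>(positions);
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            Position next = queue.poll();
            order.add(next.row + "/" + next.docId);
        }
        return order;
    }
}
```

With positions bin2/p1, bin1/p9 and bin1/p3, mergeOrder returns them as bin1/p3, bin1/p9,
bin2/p1.  A comparison on the document id alone would have put bin2/p1 first, which is the
situation I am worried about when a tablet holds more than one row.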

Thanks,
Tejay

From: Josh Elser [mailto:josh.elser@gmail.com]
Sent: Wednesday, August 22, 2012 2:53 PM
To: user@accumulo.apache.org
Subject: EXTERNAL: Re: Custom Iterators

What makes you say that the OrIterator cannot handle more than one row per tablet? Can you
provide details?

AFAIK, the OrIterator should work correctly in all cases (e.g. regardless of row distribution
in a tablet). Any issues in the code that prevent it from doing so would be a bug that should
be fixed.

Also, the wikisearch example supports indexing over multiple attributes (and I believe indexes
document metadata in addition to the tokenized document). Is there something unclear that
could be better documented?

On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
All,
I'm interested in writing a custom iterator, and I've been looking for documentation on how
to do so.  Thus far, I've not been able to find anything beyond the Javadoc for SortedKeyValueIterator
and a few other sub-classes.  A few of the examples use Iterators, but provide no real info
on how to properly implement one.  Is there anywhere to find general guidance on the iterator
stack?

(If you're interested)
Specifically, for those that are curious, I'm trying to implement something similar to the
wikisearch example, but with some key differences.  In my case, I've got a file with various
attributes that are being indexed.  So for each file there are 5 attributes, and each attribute
has a fixed number of possible values.  For example (totally made up):
personID, gender, hair color, country, race, personRecord

Row:binID; ColFam:Attribute_AttributeValue; ColQ:personID; Value:blank
AND
Row:binID; ColFam:"D"; ColQ:personID; Value:personRecord
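
As a rough sketch of that layout, the following builds the entries for one person as plain
string arrays of {row, column family, column qualifier, value}.  PersonIndexLayout and
entriesFor are hypothetical names and nothing here touches the Accumulo API; it only shows
one index entry per attribute plus one "D" entry holding the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PersonIndexLayout {

    // Returns one {row, colFam, colQ, value} array per index entry, followed by
    // the single "D" entry that carries the full person record.
    static List<String[]> entriesFor(String binId, String personId,
                                     Map<String, String> attributes,
                                     String personRecord) {
        List<String[]> entries = new ArrayList<>();
        for (Map.Entry<String, String> attribute : attributes.entrySet()) {
            // Index entry: ColFam is Attribute_AttributeValue, value is blank.
            String colFam = attribute.getKey() + "_" + attribute.getValue();
            entries.add(new String[] {binId, colFam, personId, ""});
        }
        // Document entry: ColFam is the literal "D", value is the record itself.
        entries.add(new String[] {binId, "D", personId, personRecord});
        return entries;
    }
}
```

For a person with two attributes this yields three entries, all in the same binID row and
all keyed by the same personID in the column qualifier.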

A typical query would be:
Give me the personRecord for all people with:
Gender: male &
Hair color: blond or brown &
Country: USA or England or China or Korea &
Race: white or oriental
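
In other words, the query is an AND of OR groups.  Here is a minimal in-memory sketch of
that logic for a single bin, assuming the index is available as a map from column family
(Attribute_AttributeValue) to the sorted personIDs under it.  AndOfOrsQuery and evaluate are
hypothetical names, and this is not an iterator; it only shows the union-then-intersect
result that the iterators would need to produce.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class AndOfOrsQuery {

    // index:   column family ("attribute_value") -> personIDs indexed under it
    // clauses: one inner list per attribute; the terms inside a list are ORed,
    //          and the lists themselves are ANDed together.
    static SortedSet<String> evaluate(Map<String, SortedSet<String>> index,
                                      List<List<String>> clauses) {
        SortedSet<String> result = null;
        for (List<String> orTerms : clauses) {
            // OR: union of the personIDs for every allowed value of the attribute.
            SortedSet<String> union = new TreeSet<>();
            for (String term : orTerms) {
                union.addAll(index.getOrDefault(term,
                        Collections.<String>emptySortedSet()));
            }
            // AND: keep only the personIDs that also matched earlier clauses.
            if (result == null) {
                result = union;
            } else {
                result.retainAll(union);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }
}
```

The personIDs that come back would then be used to fetch the matching "D" entries in the
same row to get each personRecord.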

The existing Iterators used in the wikisearch example are unable to handle the "or" clauses
in each attribute.
The OrIterator doesn't appear to handle the possibility of more than one row per tablet.

Thanks,
Tejay Cardon

