accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Custom Iterators
Date Thu, 23 Aug 2012 00:03:37 GMT
Err, double (triple) reply:

No, you are incorrect. The wikisearch example can handle any arbitrary 
boolean expression containing NOT, AND, and OR. As always, I'll add the 
same caveat Bill did: it *should* be able to handle them :).

I know that cleaning up and reworking the Wikisearch code is in the 
works. I'm just not sure about the timeframe.

As for examples, I'd point you to the write-up Eric did after 
benchmarking the wikisearch example: 
http://accumulo.apache.org/example/wikisearch.html

He has some example queries that give the basic idea of what's 
supported (minus the NOTs).

On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy 
> reply to William that the comment on OrIterator.TermSource.compareTo 
> indicates that implementations with more than one row per tablet need 
> to compare row key first (and that is not being done in this code). It 
> may be that it’s not an issue and I’m simply misunderstanding 
> something. As for the wikisearch example, as I understood it, it could 
> only handle searches for “anded” terms. If that’s not the case, then 
> an example of an or search would be helpful. In any case, I’d love a 
> deeper dive on the wikisearch somewhere. I get the source code and a 
> high level explanation of what’s happening, but I’d love a tutorial or 
> something that walks through the classes and explains how each one 
> contributes to the functionality. Don’t consider that a request (that 
> would be a lot more to ask than I’m willing to ask), but I would 
> certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> *From:*Josh Elser [mailto:josh.elser@gmail.com]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* user@accumulo.apache.org
> *Subject:* EXTERNAL: Re: Custom Iterators
>
> What makes you say that the OrIterator cannot handle more than one row 
> per tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g. 
> regardless of row distribution in a tablet). Any issues in the code 
> that prevent it from doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple 
> attributes (and I believe indexes document metadata in addition to the 
> tokenized document). Is there something unclear that could be better 
> documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
>     All,
>
>     I’m interested in writing a custom iterator, and I’ve been looking
>     for documentation on how to do so. Thus far, I’ve not been able to
>     find anything beyond the java docs in SortedKeyValueIterator and a
>     few other sub-classes. A few of the examples use Iterators, but
>     provide no real info on how to properly implement one. Is there
>     anywhere to find general guidance on the iterator stack?
>
>     (If you’re interested)
>
>     Specifically, for those that are curious, I’m trying to implement
>     something similar to the wikisearch example, but with some key
>     differences. In my case, I’ve got a file with various attributes
>     that are being indexed. So for each file there are 5 attributes, and
>     each attribute has a fixed number of possible values. For example
>     (totally made up):
>
>     personID, gender, hair color, country, race, personRecord
>
>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:personID; Val:blank
>
>     AND
>     Row:binID; ColFam:"D"; ColQ:personID; Val:personRecord
>
>     A typical query would be:
>
>     Give me the personRecord for all people with:
>
>     Gender: male &
>
>     Hair color: blond or brown &
>
>     Country: USA or England or China or Korea &
>
>     Race: white or oriental
>
>     The existing Iterators used in the wikisearch example are unable
>     to handle the “or” clauses in each attribute.
>
>     The OrIterator doesn’t appear to handle the possibility of more than
>     one row per tablet.
>
>     Thanks,
>
>     Tejay Cardon
>
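
To make the query shape in the original message concrete (an AND across attributes, with an OR inside each attribute), here is a minimal sketch in plain Python. It is not Accumulo code and does not show how the wikisearch iterators are implemented. The dictionary stands in for index entries of the form Row:binID; ColFam:Attribute_AttributeValue; ColQ:personID, and the names build_index, query_bin, and the sample data are all made up for illustration.

```python
from collections import defaultdict


def build_index(entries):
    """Group personIDs by (binID, "attribute_value").

    Hypothetical in-memory stand-in for index entries laid out as
    row=binID, column family=attribute_value, column qualifier=personID.
    """
    index = defaultdict(set)
    for bin_id, attr_value, person_id in entries:
        index[(bin_id, attr_value)].add(person_id)
    return index


def query_bin(index, bin_id, clauses):
    """Evaluate an AND of OR-clauses within a single bin (row).

    clauses is a list of (attribute, [allowed values]) pairs.
    """
    result = None
    for attribute, values in clauses:
        # OR: union the personIDs for every allowed value of this attribute.
        matches = set()
        for value in values:
            matches |= index.get((bin_id, f"{attribute}_{value}"), set())
        # AND: keep only the personIDs that matched every clause so far.
        result = matches if result is None else result & matches
        if not result:
            return set()
    return result if result is not None else set()


# Made-up sample data: (binID, attribute_value, personID).
entries = [
    ("bin1", "gender_male", "p1"),
    ("bin1", "hair_blond", "p1"),
    ("bin1", "country_USA", "p1"),
    ("bin1", "gender_male", "p2"),
    ("bin1", "hair_black", "p2"),
    ("bin1", "country_USA", "p2"),
    ("bin1", "gender_female", "p3"),
    ("bin1", "hair_brown", "p3"),
    ("bin1", "country_England", "p3"),
    ("bin2", "gender_male", "p4"),
    ("bin2", "hair_brown", "p4"),
    ("bin2", "country_England", "p4"),
]
index = build_index(entries)

clauses = [
    ("gender", ["male"]),
    ("hair", ["blond", "brown"]),
    ("country", ["USA", "England"]),
]

for bin_id in ("bin1", "bin2"):
    print(bin_id, sorted(query_bin(index, bin_id, clauses)))
```

The intersection is done per bin because, in the layout described above, a person's index entries and the personRecord share the same row. In a real deployment the matching personIDs would then be used to fetch the "D" column family entries from that row.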
