accumulo-user mailing list archives

From Marc Parisi <>
Subject Re: Custom Iterators
Date Wed, 22 Aug 2012 23:32:45 GMT
Here's a quick write up <>

On Wed, Aug 22, 2012 at 8:03 PM, Josh Elser <> wrote:

> Err, double (triple) reply:
> No, you are incorrect. The wikisearch example can handle any arbitrary
> boolean expression containing NOT, AND, and OR. As always, I'll preface it
> the same as Bill did: it *should* be able to handle them :).
> I know that cleaning-up/reworking the Wikisearch code is in the works. I'm
> just not positive about the timeframe.
> As far as examples, I'd push you to the write-up Eric did after
> benchmarking the wikisearch example:
> example/wikisearch.html<>
> He has some example queries that give the basic idea behind what's
> supported (minus the NOTs)
> On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>> Josh,
>> Thanks for getting back to me so quickly. I explained in my lengthy reply
>> to William that the comment on OrIterator.TermSource.compareTo
>> indicates that implementations with more than one row per tablet need to
>> compare row key first (and that is not being done in this code). It may be
>> that it’s not an issue and I’m simply misunderstanding something. As for
>> the wikisearch example, as I understood it, it could only handle searches
>> for “anded” terms. If that’s not the case, then an example of an or search
>> would be helpful. In any case, I’d love a deeper dive on the wikisearch
>> somewhere. I get the source code and a high level explanation of what’s
>> happening, but I’d love a tutorial or something that walks through the
>> classes and explains how each one contributes to the functionality. Don’t
>> consider that a request (that would be a lot more to ask than I’m willing
>> to ask), but I would certainly find it useful if it does exist.
>> Thanks,
>> Tejay
>> *From:*Josh Elser []
>> *Sent:* Wednesday, August 22, 2012 2:53 PM
>> *To:*
>> *Subject:* EXTERNAL: Re: Custom Iterators
>> What makes you say that the OrIterator cannot handle more than one row
>> per tablet? Can you provide details?
>> AFAIK, the OrIterator should work correctly in all cases (e.g. regardless
>> of row distribution in a tablet). Any issues in the code that prevent it
>> from doing so would be a bug that should be fixed.
>> Also, the wikisearch example supports indexing over multiple attributes
>> (and I believe indexes document metadata in addition to the tokenized
>> document). Is there something unclear that could be better documented?
>> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>>     All,
>>     I’m interested in writing a custom iterator, and I’ve been looking
>>     for documentation on how to do so. Thus far, I’ve not been able to
>>     find anything beyond the java docs in SortedKeyValueIterator and a
>>     few other sub-classes. A few of the examples use Iterators, but
>>     provide no real info on how to properly implement one. Is there
>>     anywhere to find general guidance on the iterator stack?
>>     (If you’re interested)
>>     Specifically, for those that are curious, I’m trying to implement
>>     something similar to the wikisearch example, but with some key
>>     differences. In my case, I’ve got a file with various attributes
>>     that are being indexed. So for each file there are 5 attributes, and
>>     each attribute has a fixed number of possible values. For example
>>     (totally made up):
>>     personID, gender, hair color, country, race, personRecord
>>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>>     AND
>>     Row:binID; ColFam:”D”; ColQ:personID; value:personRecord
>>     A typical query would be:
>>     Give me the personRecord for all people with:
>>     Gender: male &
>>     Hair color: blond or brown &
>>     Country: USA or England or china or korea &
>>     Race: white or oriental
>>     The existing Iterators used in the wikisearch example are unable
>>     to handle the “or” clauses in each attribute.
>>     The OrIterator doesn’t appear to handle the possibility of more than
>>     one row per tablet.
>>     Thanks,
>>     Tejay Cardon
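
Editorial sketch of the query described in the original message (OR within an attribute, AND across attributes). It is not the wikisearch code and does not use the Accumulo API: the table is replaced by a hypothetical in-memory map laid out like the schema above (row = binID, column family = Attribute_AttributeValue, column qualifier = personID), and the class name, method names, and sample data are all made up for illustration. Each bin (row) is evaluated independently, which is the property the "more than one row per tablet" question is about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the index table; not part of Accumulo.
class AndOfOrsSketch {
    // row (binID) -> column family (attribute_value) -> sorted personIDs
    private final TreeMap<String, TreeMap<String, TreeSet<String>>> index = new TreeMap<>();

    void put(String row, String colFam, String personId) {
        index.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(colFam, f -> new TreeSet<>())
             .add(personId);
    }

    // Each inner list holds the OR'd terms of one attribute; the outer list is ANDed.
    List<String> query(List<List<String>> clauses) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, TreeMap<String, TreeSet<String>>> row : index.entrySet()) {
            TreeSet<String> candidates = null;
            for (List<String> orTerms : clauses) {
                // OR: union of the personIDs under every listed term in this row.
                TreeSet<String> union = new TreeSet<>();
                for (String term : orTerms) {
                    union.addAll(row.getValue().getOrDefault(term, new TreeSet<>()));
                }
                // AND: intersect with what earlier clauses left over.
                if (candidates == null) {
                    candidates = union;
                } else {
                    candidates.retainAll(union);
                }
                if (candidates.isEmpty()) {
                    break; // nothing in this row can match any more
                }
            }
            if (candidates != null) {
                for (String id : candidates) {
                    hits.add(row.getKey() + "/" + id);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AndOfOrsSketch table = new AndOfOrsSketch();
        table.put("bin1", "gender_male", "p1");
        table.put("bin1", "gender_male", "p2");
        table.put("bin1", "hair_blond", "p1");
        table.put("bin1", "hair_brown", "p3");
        table.put("bin1", "country_USA", "p1");
        table.put("bin1", "country_USA", "p3");
        table.put("bin2", "gender_male", "p4");
        table.put("bin2", "hair_brown", "p4");
        table.put("bin2", "country_china", "p4");

        List<List<String>> clauses = Arrays.asList(
            Arrays.asList("gender_male"),
            Arrays.asList("hair_blond", "hair_brown"),
            Arrays.asList("country_USA", "country_china"));

        System.out.println(table.query(clauses)); // prints [bin1/p1, bin2/p4]
    }
}
```

In a real table, each resulting binID/personID pair would then be used to read the personRecord stored under ColFam "D" in the same row.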
