accumulo-user mailing list archives

From "Cardon, Tejay E" <tejay.e.car...@lmco.com>
Subject RE: EXTERNAL: Re: Custom Iterators
Date Thu, 23 Aug 2012 15:30:51 GMT
Ah, thank you Marc.  I should have pieced that together, but hadn't.  So the OrIterator should
never be added directly to the stack using the scanner.  It is utilized by something like
the BooleanLogicIterator to build a composite iterator, but it is the BooleanLogicIterator
that actually gets added to the stack.

-----Original Message-----
From: Marc P. [mailto:marc.parisi@gmail.com] 
Sent: Thursday, August 23, 2012 9:15 AM
To: user@accumulo.apache.org
Subject: Re: EXTERNAL: Re: Custom Iterators

Thanks for catching that! I did indeed write that down incorrectly. I apologize. I'll fix
that tonight.

Iterators are stacked based on their priority ( when you set them via the scanner, for example
) or the input format's IteratorSetting.
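
To make the priority ordering concrete, here is a minimal, self-contained Java sketch. It does not use the Accumulo API: StageSetting, PriorityStackSketch, and the string-based stages are hypothetical names made up for illustration. It assumes that lower priority numbers are applied first, so they sit closer to the data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an iterator configuration (not an Accumulo class):
// a priority, a name, and a function that wraps a source iterator.
final class StageSetting {
    final int priority;
    final String name;
    final UnaryOperator<Iterator<String>> wrap;

    StageSetting(int priority, String name, UnaryOperator<Iterator<String>> wrap) {
        this.priority = priority;
        this.name = name;
        this.wrap = wrap;
    }
}

final class PriorityStackSketch {
    // Assumption: the lowest priority number wraps the raw data first, and each
    // later stage receives the previous stage as its source.
    static Iterator<String> build(Iterator<String> data, List<StageSetting> settings) {
        List<StageSetting> sorted = new ArrayList<>(settings);
        sorted.sort(Comparator.comparingInt((StageSetting s) -> s.priority));
        Iterator<String> top = data;
        for (StageSetting s : sorted) {
            top = s.wrap.apply(top);
        }
        return top;
    }

    // A reusable stage: it only knows its source, not its place in the stack.
    static Iterator<String> map(Iterator<String> source, UnaryOperator<String> f) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return source.hasNext();
            }

            @Override
            public String next() {
                return f.apply(source.next());
            }
        };
    }
}
```

In this sketch the order in which the settings are listed does not matter; only the priority numbers decide which stage feeds which. With a real scanner, the analogous step would be one IteratorSetting per iterator, each passed to addScanIterator, so that no iterator has to hard-code its neighbours.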

The init method comment is a general suggestion, for example if you are using it within a
scan session.

The OrIterator ( as in the wikisearch example ) is created by the BooleanLogicIterator, and
the sources are added (through the addTerm method). This is, apparently, its expected use.
You will also note that the BooleanLogicIterator ( or any iter that uses the OrIterator )
has an implemented initializer method.
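
As a rough illustration of that pattern, the sketch below models a composite "or" iterator whose sources are handed to it one at a time by an enclosing iterator. It is not the wikisearch OrIterator: UnionSketch, its simplified addTerm signature, and the use of plain sorted strings in place of keys are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of a composite "or" iterator; not the wikisearch class.
final class UnionSketch {
    // One term source: a sorted iterator plus the entry it currently points at.
    private static final class TermSource implements Comparable<TermSource> {
        final Iterator<String> iter;
        String head;

        TermSource(Iterator<String> iter) {
            this.iter = iter;
            this.head = iter.next();
        }

        @Override
        public int compareTo(TermSource other) {
            return head.compareTo(other.head);
        }
    }

    private final PriorityQueue<TermSource> heap = new PriorityQueue<>();

    // The enclosing iterator supplies each source explicitly, which is why a
    // composite like this has no use for a single-source init call.
    void addTerm(Iterator<String> sortedSource) {
        if (sortedSource.hasNext()) {
            heap.add(new TermSource(sortedSource));
        }
    }

    // Returns the union of all sources in sorted order, skipping duplicates.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            TermSource ts = heap.poll();
            if (out.isEmpty() || !out.get(out.size() - 1).equals(ts.head)) {
                out.add(ts.head);
            }
            if (ts.iter.hasNext()) {
                ts.head = ts.iter.next();
                heap.add(ts); // re-insert with its new head
            }
        }
        return out;
    }
}
```

The enclosing iterator (the role the BooleanLogicIterator plays in the thread) would be the one that receives init from the stack and then calls addTerm once per term.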

On Thu, Aug 23, 2012 at 10:59 AM, Cardon, Tejay E <tejay.e.cardon@lmco.com> wrote:
> Marc,
>
> Thanks for the writeup.  It is by far the most comprehensive info I've 
> seen on iterators, and was very helpful to me.  A couple notes/questions:
>
>
>
> You mention that SortedKeyValueIterator implements FileSKVIterator.  
> I've only looked at the 1.4.1 source, but it appears that the opposite is true.
>
>
>
> You also mention that iterators get their source from the init method, 
> but some (like OrIterator) seem to throw exceptions on that method.  
> Where do they get their source data, and what are the API implications 
> of having iterators that reject init (or deep copy for that matter).
>
>
>
> Final thought.  If I want to stack several iterators, what's the best 
> way to go about that?  In other words, I'd like an iterator that I 
> write to be the source to another iterator that I've written, which in 
> turn may feed yet another that I've written.  Preferably, I'd like 
> each to be independently re-useable, so I don't want to build that 
> stacking into the source of any of the iterators themselves.  Is that 
> possible, or would I need some sort of iterator factory that builds 
> the stacks and then acts as an interface to the fully formed stack?
>
>
>
> Thanks,
>
> Tejay
>
> From: Marc Parisi [mailto:marc@accumulo.net]
> Sent: Wednesday, August 22, 2012 5:33 PM
>
>
> To: user@accumulo.apache.org
> Subject: EXTERNAL: Re: Custom Iterators
>
>
>
> Here's a quick write up
>
>
>
>     http://www.accumulo.net/node/1
>
> On Wed, Aug 22, 2012 at 8:03 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
> Err, double (triple) reply:
>
> No, you are incorrect. The wikisearch example can handle any arbitrary 
> boolean expression containing NOT, AND, and OR. As always, I'll 
> preface it the same as Bill did: it *should* be able to handle them :).
>
> I know that cleaning-up/reworking the Wikisearch code is in the works. 
> I'm just not positive about the timeframe.
>
> As far as examples, I'd push you to the write-up Eric did after 
> benchmarking the wikisearch example: 
> http://accumulo.apache.org/example/wikisearch.html
>
> He has some example queries that give the basic idea behind what's 
> supported (minus the NOTs)
>
> On 08/22/2012 05:27 PM, Cardon, Tejay E wrote:
>
>
> Josh,
>
> Thanks for getting back to me so quickly. I explained in my lengthy 
> reply to William that the comment on OrIterator.TermSource.compareTo 
> indicates that implementations with more than one row per tablet need 
> to compare row key first (and that is not being done in this code). It 
> may be that it's not an issue and I'm simply misunderstanding 
> something. As for the wikisearch example, as I understood it, it could 
> only handle searches for "anded" terms. If that's not the case, then 
> an example of an or search would 
> be helpful. In any case, I'd love a deeper dive on the wikisearch 
> somewhere. I get the source code and a high level explanation of 
> what's happening, but I'd love a tutorial or something that walks 
> through the classes and explains how each one contributes to the 
> functionality. Don't consider that a request (that would be a lot more 
> to ask than I'm willing to ask), but I would certainly find it useful if it does exist.
>
> Thanks,
>
> Tejay
>
> *From:*Josh Elser [mailto:josh.elser@gmail.com]
> *Sent:* Wednesday, August 22, 2012 2:53 PM
> *To:* user@accumulo.apache.org
> *Subject:* EXTERNAL: Re: Custom Iterators
>
>
>
> What makes you say that the OrIterator cannot handle more than one row 
> per tablet? Can you provide details?
>
> AFAIK, the OrIterator should work correctly in all cases (e.g. 
> regardless of row distribution in a tablet). Any issues in the code 
> that prevent it from doing so would be a bug that should be fixed.
>
> Also, the wikisearch example supports indexing over multiple 
> attributes (and I believe indexes document metadata in addition to the tokenized document).
> Is there something unclear that could be better documented?
>
> On 8/22/12 4:41 PM, Cardon, Tejay E wrote:
>
>     All,
>
>     I'm interested in writing a custom iterator, and I've been looking
>     for documentation on how to do so. Thus far, I've not been able to
>     find anything beyond the java docs in SortedKeyValueIterator and a
>     few other sub-classes. A few of the examples use Iterators, but
>     provide no real info on how to properly implement one. Is there
>     anywhere to find general guidance on the iterator stack?
>
>     (If you're interested)
>
>     Specifically, for those that are curious, I'm trying to implement
>     something similar to the wikisearch example, but with some key
>     differences. In my case, I've got a file with various attributes
>     that are being indexed. So for each file there are 5 attributes, and
>     each attribute has a fixed number of possible values. For example
>     (totally made up):
>
>     personID, gender, hair color, country, race, personRecord
>
>     Row:binID; ColFam:Attribute_AttributeValue; ColQ:PersonID; Val:blank
>
>     AND
>     Row:binID; ColFam:"D"; ColQ:personID; value:personRecord
>
>     A typical query would be:
>
>     Give me the personRecord for all people with:
>
>     Gender: male &
>
>     Hair color: blond or brown &
>
>     Country: USA or England or china or korea &
>
>     Race: white or oriental
>
>     The existing Iterators used in the wikisearch example are unable
>     to handle the "or" clauses in each attribute.
>
>     The OrIterator doesn't appear to handle the possibility of more
>     than one row per tablet.
>
>     Thanks,
>
>     Tejay Cardon
>
>
