accumulo-user mailing list archives

From "Sukant Hajra" <>
Subject Re: strategies beyond intersecting iterators?
Date Fri, 29 Jun 2012 21:27:34 GMT
Excerpts from William Slacum's message of Thu Jun 28 16:04:32 -0500 2012:
> You're pretty much on the spot regarding two aspects about the current
> IntersectingIterator:
> 1- It's not really extensible (there are hooks for building doc IDs,
> but you still need the same `partition term: docId` key structure)
> 2- Its main strength is that it can do the merges of sorted lists of
> doc IDs based on equality expressions (ie, `author=="bob" and
> day=="20120627"`)
> Fortunately, the logic isn't very complicated for re-creating the
> merging stuff. Personally, I think it's easy enough to separate the
> logic of joining N streams of iterator results from the actual
> scanning. Unfortunately, this would be left up to you to do at the
> moment :)
> You could do range searches by consuming sets of values and sorting
> all of the docIds in that range by throwing them into a TreeSet. That
> would let you emit doc IDs in a globally sorted order for the given
> range of terms.

I understand everything above, I think.  Thanks for the prompt reply.
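
To check my understanding, here is a minimal Java sketch of how I read the two
ideas above: merging sorted doc ID lists for an equality "and", and collecting
the doc IDs for a range of terms into a TreeSet so they come out globally
sorted.  The class and method names (DocIdMerge, intersect, sortedUnion) are my
own inventions, not Accumulo API, and plain in-memory lists stand in for the
per-term iterator streams.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Accumulo: in-memory stand-ins for the
// sorted doc ID streams that the per-term iterators would produce.
public class DocIdMerge {

    // Intersect two ascending lists of doc IDs, as for
    // author=="bob" and day=="20120627". Both inputs must already be sorted.
    static List<String> intersect(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {          // same doc ID in both streams: emit it
                out.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {    // advance whichever stream is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    // Collect the doc IDs of every term in a range into a TreeSet, which
    // de-duplicates them and yields them in one globally sorted order.
    // Everything is held in memory, so a large range means a large set.
    static List<String> sortedUnion(Collection<List<String>> perTermDocIds) {
        TreeSet<String> all = new TreeSet<>();
        for (List<String> ids : perTermDocIds) {
            all.addAll(ids);
        }
        return new ArrayList<>(all);
    }
}
```

So a range over two days would go through sortedUnion first, and the result
could then be passed to intersect along with the doc IDs for author=="bob".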

> This can get problematic if the range ends up being very large because your
> iterator stack may periodically be destroyed and rebuilt.

This particular statement confused me.  When you say TreeSet, you mean a
straightforward in-memory collection from java.util or similar, right?

I ask because I'm not sure which "iterator stack may periodically be destroyed
and rebuilt."  It sounds like we're talking about some garbage collection
specific to Accumulo.  Am I missing something here?

