accumulo-user mailing list archives

From Jeremy Kepner <kep...@ll.mit.edu>
Subject Re: is there any "trick" to save the state of an iterator?
Date Tue, 10 Jan 2017 01:23:51 GMT
This is done in D4M (d4m.mit.edu); you might look there.
Dylan can explain (if necessary).
Regards.  -Jeremy

On Mon, Jan 09, 2017 at 07:30:03PM -0500, Josh Elser wrote:
> Great. Glad I wasn't derailing things :)
> 
> Unfortunately, I don't think this is a very well-documented area of the
> code (it's quite advanced and would just confuse most users).
> 
> I'll have to think about it some more and see if I can come up with
> anything clever. I know there are some others subscribed to this list
> who might be more clever than I am -- I'm sure they'll weigh in if they
> have any suggestions.
> 
> Finally, if you're interested in helping us put together some sort of
> "advanced indexing" docs for the project, I'm sure we could find a few
> people who would be happy to get something published on the Accumulo
> website.
> 
> Massimilian Mattetti wrote:
> > Thank you for your answer Josh, you understood my use case
> > perfectly.
> > 
> > The possible solutions that you propose had occurred to me, too. This
> > confirms that, unfortunately, there is no fancy way to overcome
> > this problem.
> > 
> > Is there any good documentation on different query planning for Accumulo 
> > that could help with my use case?
> > Thanks.
> > 
> > Regards,
> > Max
> > 
> > 
> > 
> > 
> > From: Josh Elser <josh.elser@gmail.com>
> > To: user@accumulo.apache.org
> > Date: 09/01/2017 21:55
> > Subject: Re: is there any "trick" to save the state of an iterator?
> > ------------------------------------------------------------------------
> > 
> > 
> > 
> > Hey Max,
> > 
> > There is no provided mechanism to do this, and this is a problem with
> > supporting "range queries". I'm hoping I'm understanding your use-case
> > correctly; sorry in advance if I'm going off on a tangent.
> > 
> > When performing the standard sort-merge join across some columns to
> > implement intersections and unions, the un-sorted range of values you
> > want to scan over (500k-600k) breaks the ordering of the docIds which
> > you are trying to catch.
> > 
> > The trivial solution is to convert a range into a union of discrete
> > values (500000 || 500001 || 500002 || ..) but you can see how this
> > quickly falls apart. An inverted index could be used to enumerate the
> > values that exist in the range.
> > 
> > Another trivial solution would be to select all records matching the
> > smaller condition, and then post-filter the other condition.
> > 
> > There might be some trickier query planning decisions you could
> > also experiment with (I'd have to give it a lot more thought). In short,
> > I'd recommend against trying to solve the problem via saving state.
> > Architecturally, this is just not something that Accumulo Iterators are
> > designed to support at this time.
> > 
> > - Josh
> > 
> > Massimilian Mattetti wrote:
> >  > Hi all,
> >  >
> >  > I am working with a Document-Partitioned Index table whose index
> >  > sections are accessed using ranges over the indexed properties (e.g.
> >  > property A ∈ [500,000 - 600,000], property B ∈ [0.1 - 0.4], etc.). The
> >  > iterator that handles this table works by: 1st - calculating (doing
> >  > intersection and union on different properties) all the results from the
> >  > index section of a single bin; 2nd - using the ids retrieved from the
> >  > index, going over the data section of the specific bin.
> >  > This iterator has proved to have a significant performance penalty
> >  > whenever the amount of data retrieved from the index is orders of
> >  > magnitude bigger than table.scan.max.memory, i.e. the iterator is
> >  > torn down tens of times for each bin. Since there is no explicit way to
> >  > save the state of an iterator, is there any other mechanism/approach
> >  > that I could use/follow in order to avoid re-calculating the index
> >  > result set after each teardown?
> >  > Thanks.
> >  >
> >  >
> >  > Regards,
> >  > Max
> >  >
> > .
> > 
> > 
> > 
> > 
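
Two points from the thread can be sketched in plain Java: why a range scan over an index ordered by property value returns docIds out of order (which is what defeats a sort-merge join), and what the "select on one condition, then post-filter the other" approach looks like. This is only an illustration under stated assumptions. It uses in-memory maps as stand-ins for the index section and does not use the Accumulo iterator API; the class name, method names, and sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch; in-memory stand-ins, not Accumulo classes.
public class RangeJoinSketch {

    // Scan an index ordered by property value and collect the docIds whose
    // value falls in [lo, hi]. The result follows value order, so the docIds
    // are generally NOT sorted, which is what breaks a sort-merge join.
    static List<String> scanRange(NavigableMap<Integer, List<String>> index, int lo, int hi) {
        List<String> docIds = new ArrayList<>();
        for (List<String> ids : index.subMap(lo, true, hi, true).values()) {
            docIds.addAll(ids);
        }
        return docIds;
    }

    // Post-filter: keep only the candidates whose second property lies in [lo, hi].
    static List<String> postFilter(List<String> candidates, Map<String, Double> propB,
                                   double lo, double hi) {
        List<String> kept = new ArrayList<>();
        for (String id : candidates) {
            Double b = propB.get(id);
            if (b != null && b >= lo && b <= hi) {
                kept.add(id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample data: property A value -> docIds holding that value.
        NavigableMap<Integer, List<String>> indexA = new TreeMap<>();
        indexA.put(500000, List.of("doc7"));
        indexA.put(550000, List.of("doc2", "doc9"));
        indexA.put(700000, List.of("doc1"));

        // Assumed sample data: docId -> property B value.
        Map<String, Double> propB = Map.of("doc2", 0.3, "doc7", 0.9, "doc9", 0.15);

        List<String> candidates = scanRange(indexA, 500000, 600000);
        System.out.println(candidates);                              // docIds in value order
        System.out.println(postFilter(candidates, propB, 0.1, 0.4)); // after the second condition
    }
}
```

With this sample data the range scan yields doc7 before doc2, so the candidate list is not in docId order. The post-filter step does not depend on ordering, which is why it sidesteps the problem, at the cost of one property lookup per candidate.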
