accumulo-user mailing list archives

From Billie Rinaldi <bil...@apache.org>
Subject Re: TimeSpan Iterator
Date Tue, 28 Aug 2012 18:04:08 GMT
On Tue, Aug 28, 2012 at 9:51 AM, <Bob.Thorman@l-3com.com> wrote:

> Billie
>
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide if my current use of a secondary
> time index is better then.  Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp.  Then a second lookup
> of the metadata rowid.  Is this more efficient?
>

It probably depends on what percentage of the data you're bringing back, as
compared to the amount you're scanning over (if that's not the whole
table).  I would hypothesize if you're bringing more than N% of the data
back, you might as well just use the TimestampFilter on the main table.  If
you're bringing a smaller percentage back, it could be better to reduce the
amount of the main table you have to scan over by maintaining a secondary
time index.  I'm not sure what N would be.  You should also make sure that
the secondary index is actually reducing the amount of the main table
you're scanning over, e.g. if each rowid had a full range of timestamps,
you could be pulling a list of all rowids back from the index table and not
reducing the scan over the main table.
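
To make the trade-off concrete, here is a rough in-memory sketch (plain JDK collections, not Accumulo API). It assumes one timestamp per rowid, uses TreeMaps as stand-ins for the main table and the secondary time index, and counts how many entries each strategy reads. The class, field, and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model only: sorted maps stand in for the two tables.
// Assumes each rowid is written once, with a single timestamp.
class TimeIndexSketch {
    // "Main table": rowid -> timestamp (values omitted for brevity).
    final TreeMap<String, Long> mainTable = new TreeMap<>();
    // "Index table": timestamp -> rowids written at that time.
    final TreeMap<Long, List<String>> timeIndex = new TreeMap<>();
    // Entries read by the most recent query.
    long entriesRead = 0;

    void put(String rowId, long timestamp) {
        mainTable.put(rowId, timestamp);
        timeIndex.computeIfAbsent(timestamp, t -> new ArrayList<>()).add(rowId);
    }

    // Strategy 1: scan the whole main table and test every timestamp,
    // which is roughly what a timestamp filter does.
    List<String> filterScan(long start, long end) {
        entriesRead = 0;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : mainTable.entrySet()) {
            entriesRead++;
            if (e.getValue() >= start && e.getValue() <= end) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Strategy 2: one range scan over the index, then one lookup
    // in the main table per rowid the index returned.
    List<String> indexLookup(long start, long end) {
        entriesRead = 0;
        List<String> hits = new ArrayList<>();
        for (List<String> rowIds : timeIndex.subMap(start, true, end, true).values()) {
            for (String rowId : rowIds) {
                entriesRead++;                 // index entry
                if (mainTable.containsKey(rowId)) {
                    entriesRead++;             // main-table lookup
                    hits.add(rowId);
                }
            }
        }
        return hits;
    }
}
```

In this model the filter scan always reads every main-table entry, while the index path reads about two entries per hit, so the index stops paying off once the query matches a large share of the table. It ignores the cost of each real lookup, so it says nothing about where N actually falls.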

Also, the TimestampFilter is not optimized.  Filters evaluate each
key/value pair to see if it is accepted (in this case, if it is in a
timestamp range).  If there are a lot of timestamps for each cell (keys
that are identical except for timestamp), it would be better to use seeking
instead.  That would involve writing a new iterator.  If there aren't many
timestamps for each cell, seeking won't help and the TimestampFilter will
be fine.
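
As an illustration of the difference, here is a small in-memory sketch (plain JDK collections, not an Accumulo iterator). It assumes keys sort by cell and then by timestamp, newest first, and compares how many keys a filter-style pass and a seek-style pass examine. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// In-memory model only: a sorted set stands in for the sorted key space.
class SeekVsFilterSketch {
    // A key is (cell, timestamp), ordered by cell, then newest timestamp first.
    static final class Key implements Comparable<Key> {
        final String cell;
        final long ts;
        Key(String cell, long ts) { this.cell = cell; this.ts = ts; }
        @Override
        public int compareTo(Key o) {
            int c = cell.compareTo(o.cell);
            return c != 0 ? c : Long.compare(o.ts, ts); // descending timestamp
        }
    }

    final TreeSet<Key> keys = new TreeSet<>();
    // Keys examined by the most recent query.
    long examined = 0;

    // Filter style: look at every key and test its timestamp.
    List<Key> filter(long start, long end) {
        examined = 0;
        List<Key> out = new ArrayList<>();
        for (Key k : keys) {
            examined++;
            if (k.ts >= start && k.ts <= end) out.add(k);
        }
        return out;
    }

    // Seek style: for each cell, jump to the newest version with ts <= end,
    // read until ts < start, then jump to the next cell.
    List<Key> seek(long start, long end) {
        examined = 0;
        List<Key> out = new ArrayList<>();
        Key k = keys.isEmpty() ? null : keys.first();
        while (k != null) {
            Key cur = keys.ceiling(new Key(k.cell, end));
            while (cur != null) {
                examined++;
                if (!cur.cell.equals(k.cell) || cur.ts < start) break;
                out.add(cur);
                cur = keys.higher(cur);
            }
            // Skip the remaining (older) versions of this cell.
            k = keys.higher(new Key(k.cell, Long.MIN_VALUE));
        }
        return out;
    }
}
```

With many versions per cell, the seek-style pass touches only the versions inside the range plus one extra per cell, while the filter-style pass touches all of them. In a real iterator each seek has its own cost, which this model does not capture, so with few versions per cell the plain filter is likely to be just as good.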

Billie



> *From:* Billie Rinaldi [mailto:billie@apache.org]
> *Sent:* Tuesday, August 28, 2012 11:46
>
> *To:* user@accumulo.apache.org; john.armstrong@ccri.com
> *Subject:* Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <jrja@ccri.com> wrote:
>
> On 08/28/2012 09:26 AM, Bob.Thorman@l-3com.com wrote:
>
> Does anyone know of a TimeSpan Iterator that will fetch rows based on
> the accumulo timestamp?
>
> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
> classes.  I don't know if 1.4 has any in the core libraries.  It's not very
> hard though.
>
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4.  It uses a range of timestamps.  Users should be aware that this is
> not an efficient operation, though.
>
> Billie
>
