accumulo-user mailing list archives

From Billie Rinaldi <>
Subject Re: TimeSpan Iterator
Date Tue, 28 Aug 2012 18:04:08 GMT
On Tue, Aug 28, 2012 at 9:51 AM, <> wrote:

> Billie
>
> Your comment “Users should be aware that this is not an efficient
> operation, though.” may help me decide if my current use of a secondary
> time index is better then.  Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp.  Then a second lookup
> of the metadata rowid.  Is this more efficient?

It probably depends on what percentage of the data you're bringing back, as
compared to the amount you're scanning over (if that's not the whole
table).  I would hypothesize that if you're bringing more than N% of the data
back, you might as well just use the TimestampFilter on the main table.  If
you're bringing a smaller percentage back, it could be better to reduce the
amount of the main table you have to scan over by maintaining a secondary
time index.  I'm not sure what N would be.  You should also make sure that
the secondary index is actually reducing the amount of the main table
you're scanning over, e.g. if each rowid had a full range of timestamps,
you could be pulling a list of all rowids back from the index table and not
reducing the scan over the main table.
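
As a rough illustration of that last point, here is a sketch in plain
java.util (not the Accumulo API; TimeIndexModel and its method names are
made up for this example).  It models the main table as rowid -> timestamps,
builds a timestamp -> rowids index, and reports what fraction of the main
table's rows a time-range query would still have to look up:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model; real tables would be read with Scanners.
class TimeIndexModel {
    // Build a timestamp -> rowids index from a main table of rowid -> timestamps.
    static TreeMap<Long, Set<String>> buildIndex(Map<String, List<Long>> main) {
        TreeMap<Long, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : main.entrySet()) {
            for (long ts : e.getValue()) {
                index.computeIfAbsent(ts, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return index;
    }

    // Range scan over the index: the rowids that then need a second lookup
    // in the main table.  Both bounds are inclusive.
    static Set<String> rowsToFetch(TreeMap<Long, Set<String>> index, long start, long end) {
        Set<String> rows = new TreeSet<>();
        for (Set<String> ids : index.subMap(start, true, end, true).values()) {
            rows.addAll(ids);
        }
        return rows;
    }

    // Fraction of main-table rows the index still sends you to.
    static double fractionOfMainTable(Map<String, List<Long>> main, long start, long end) {
        return (double) rowsToFetch(buildIndex(main), start, end).size() / main.size();
    }
}
```

If every rowid carries timestamps across the whole range, the fraction comes
out at 1.0 and the index saves nothing; if each rowid has timestamps
clustered in a narrow window, the fraction is small and the second lookup
touches only a small part of the main table.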

Also, the TimestampFilter is not optimized.  Filters evaluate each
key/value pair to see if it is accepted (in this case, if it is in a
timestamp range).  If there are a lot of timestamps for each cell (keys
that are identical except for timestamp), it would be better to use seeking
instead.  That would involve writing a new iterator.  If there aren't many
timestamps for each cell, seeking won't help and the TimestampFilter will
be fine.
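
To make the filter-versus-seek difference concrete, here is a small model
using only java.util.  It is not a real Accumulo iterator, and
TimestampSeekModel and its methods are invented for this sketch.  Keys are
{cell, timestamp} pairs sorted by cell and then by timestamp descending,
which is the order Accumulo keeps versions of the same cell in.  The counter
tracks how many keys each approach has to test; it ignores the cost of the
seeks themselves:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a sorted table; not the Accumulo iterator API.
class TimestampSeekModel {
    static long examined; // keys whose timestamp was tested by the last scan

    // Sort by cell ascending, then timestamp descending (newest first).
    static TreeSet<long[]> newTable() {
        Comparator<long[]> order = (a, b) -> a[0] != b[0]
                ? Long.compare(a[0], b[0])
                : Long.compare(b[1], a[1]);
        return new TreeSet<>(order);
    }

    // Filter approach: test every key against the inclusive range [start, end].
    static List<long[]> filterScan(TreeSet<long[]> table, long start, long end) {
        examined = 0;
        List<long[]> out = new ArrayList<>();
        for (long[] k : table) {
            examined++;
            if (k[1] >= start && k[1] <= end) out.add(k);
        }
        return out;
    }

    // Seeking approach: in each cell, jump to the newest key with
    // timestamp <= end, read until the timestamp falls below start,
    // then jump past the remaining older versions to the next cell.
    static List<long[]> seekScan(TreeSet<long[]> table, long start, long end) {
        examined = 0;
        List<long[]> out = new ArrayList<>();
        long[] k = table.isEmpty() ? null : table.first();
        while (k != null) {
            long cell = k[0];
            k = table.ceiling(new long[] {cell, end});
            while (k != null && k[0] == cell) {
                examined++;
                if (k[1] < start) break;
                out.add(k);
                k = table.higher(k);
            }
            // {cell, Long.MIN_VALUE} sorts last within the cell, so the next
            // higher key is the first key of the following cell (or null).
            k = table.higher(new long[] {cell, Long.MIN_VALUE});
        }
        return out;
    }
}
```

With many versions per cell the seeking scan tests only a handful of keys
per cell, while the filter tests all of them.  With one version per cell
both approaches test about the same number of keys and the seeks are pure
overhead, which is why the TimestampFilter is fine in that case.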


>
> *From:* Billie Rinaldi []
> *Sent:* Tuesday, August 28, 2012 11:46
> *To:*;
> *Subject:* Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <> wrote:
> On 08/28/2012 09:26 AM, wrote:
> Does anyone know of a TimeSpan Iterator that will fetch rows based on
> the accumulo timestamp?
>
> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
> classes.  I don't know if 1.4 has any in the core libraries.  It's not very
> hard though.
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4.  It uses a range of timestamps.  Users should be aware that this is
> not an efficient operation, though.
> Billie
