accumulo-user mailing list archives

From Josh Elser
Subject Re: Scanning In Timestamp Order
Date Wed, 02 Sep 2015 21:13:42 GMT

Short answer: no.

In RDBMS parlance, Accumulo has a single index. That index is the "row"
portion of the Key class. This is why putting the timestamp in the row id
is the "standard practice" you mention. Any attempt to fetch data based on
another component of the key (ignoring locality groups/column family
subtleties) is an exhaustive scan of your dataset.
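
To make the difference concrete, here is a toy sketch in plain Java. It
does not use the Accumulo API: a TreeMap sorted by row stands in for the
table, and the class and method names are made up for illustration. A
row range can seek directly into the sorted data, while a timestamp
condition has to look at every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a TreeMap sorted by row id stands in for a table whose
// single index is the row. None of these names come from the Accumulo API.
final class RowIndexSketch {

    // Hypothetical sample data: row id -> timestamp of the stored entry.
    static TreeMap<String, Long> sampleTable() {
        TreeMap<String, Long> table = new TreeMap<>();
        table.put("row_a", 300L);
        table.put("row_b", 100L);
        table.put("row_c", 500L);
        table.put("row_d", 200L);
        return table;
    }

    // Row-range lookup: the sorted map seeks straight to startRow and
    // stops at endRow, so only the matching slice is visited.
    static List<String> scanRows(TreeMap<String, Long> table, String startRow, String endRow) {
        return new ArrayList<>(table.subMap(startRow, true, endRow, true).keySet());
    }

    // Timestamp lookup: nothing is sorted by time, so every entry in the
    // table has to be checked against the condition.
    static List<String> scanNewerThan(TreeMap<String, Long> table, long minTimestamp) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Long> entry : table.entrySet()) {
            if (entry.getValue() >= minTimestamp) {
                rows.add(entry.getKey());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> table = sampleTable();
        System.out.println(scanRows(table, "row_b", "row_c"));
        System.out.println(scanNewerThan(table, 300L));
    }
}
```

Both methods return the right answer; the point is that the cost of the
second one grows with the size of the whole table rather than with the
size of the result.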

If you are going to support this application for any length of time,
it is a good idea to take the penalty once and rewrite your old data
into the new format, so that all of your queries are fast from then on.
If you have so much data that you want to avoid running a large
MapReduce job, you likely also do not want to make your users wait for
a read of all of that data on every query :)
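
As a rough sketch of what such a one-time rewrite could look like,
again in plain Java rather than the Accumulo API: the row layout below
(a zero-padded Long.MAX_VALUE minus the timestamp, an underscore, then
the original row) is only one possible choice and is an assumption on my
part, as are all of the names. It assumes non-negative timestamps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: shows a one-time rewrite into a time-ordered row layout.
// The layout and all names are hypothetical, not part of the Accumulo API.
final class TimeOrderedRewriteSketch {

    // Build the new row id. Subtracting from Long.MAX_VALUE and padding to
    // 19 digits makes newer timestamps sort first lexicographically.
    static String timeOrderedRow(String originalRow, long timestamp) {
        return String.format(Locale.ROOT, "%019d_%s", Long.MAX_VALUE - timestamp, originalRow);
    }

    // The one-time pass: read every old entry (row -> timestamp) once and
    // write it under its new row id. The value kept here is the original row.
    static TreeMap<String, String> rewrite(Map<String, Long> oldTable) {
        TreeMap<String, String> newTable = new TreeMap<>();
        for (Map.Entry<String, Long> entry : oldTable.entrySet()) {
            newTable.put(timeOrderedRow(entry.getKey(), entry.getValue()), entry.getKey());
        }
        return newTable;
    }

    // After the rewrite, the n most recent entries are simply the first n
    // entries in sort order, so no full scan is needed.
    static List<String> mostRecent(TreeMap<String, String> newTable, int n) {
        List<String> rows = new ArrayList<>();
        for (String originalRow : newTable.values()) {
            if (rows.size() == n) {
                break;
            }
            rows.add(originalRow);
        }
        return rows;
    }
}
```

The rewrite touches every entry once, which is the penalty described
above; every later "most recent" query then reads only the front of the
table.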

Does that make sense?

- Josh

Parise, Jonathan wrote:
> Hi,
> I was wondering if there is a way to scan a table based on the
> timestamps. For example, is there a way to set a range based on the
> timestamp portion of the key?
> I know that standard practice is to add a timestamp as part of the row
> id, but in this particular case I probably cannot use that technique.
> The reason I can’t use it is that I need to find the most recent data in
> a preexisting Accumulo instance. Not all of the information was stored
> with timestamps appended to the row id. I can’t go back and change
> the data, I just have to work with what is there.
> So, given a large amount of preexisting data without time information in
> the row id, column family or column qualifier, how would you scan for
> the most recent data?
> Specifically, is there any way to scan/sort by the timestamp portion of
> the key? I did not see any way to make a Range with times.
> I also really do not want to run a job over all the data to make a new
> copy of the table that is sorted. I have a lot of data here and such a
> replication would take a very long time.
> Thanks,
> Jon
