accumulo-user mailing list archives

From "Parise, Jonathan" <Jonathan.Par...@gd-ms.com>
Subject Scanning In Timestamp Order
Date Wed, 02 Sep 2015 21:00:01 GMT
Hi,

I was wondering if there is a way to scan a table based on the timestamps. For example, is
there a way to set a range based on the timestamp portion of the key?

I know that standard practice is to add a timestamp as part of the row id, but in this particular
case I probably cannot use that technique. The reason I can't use it is that I need to find
the most recent data in a preexisting Accumulo instance. Not all of the information was stored
with timestamps appended to the row id. I can't go back and change the data; I just have
to work with what is there.

So, given a large amount of preexisting data without time information in the row id, column
family or column qualifier, how would you scan for the most recent data?

Specifically, is there any way to scan/sort by the timestamp portion of the key? I did not
see any way to make a Range with times.

I also really do not want to run a job over all the data to make a new copy of the table that
is sorted. I have a lot of data here and such a replication would take a very long time.


Thanks,

Jon
