hbase-user mailing list archives

From "Steinmaurer Thomas" <Thomas.Steinmau...@scch.at>
Subject RE: Performance characteristics of scans using timestamp as the filter
Date Mon, 10 Oct 2011 09:02:28 GMT

Others have stated that one shouldn't try to use timestamps, although I
haven't figured out why. If the reason is reliability, meaning rows are
omitted even though they should be included in a timerange-based scan,
then that would be a good argument. ;-)

One thing to note is that, AFAIK, the timestamp changes when you update
a row even if the cell values didn't change.
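The point above can be illustrated with a toy model in plain Python (this is
a sketch of HBase-style cell versioning, not the real HBase client API):
every put records a new (timestamp, value) version, so rewriting an
identical value still moves the latest timestamp forward.

```python
import itertools

# Toy model of HBase-style versioned cells: each (row, column) keeps a
# list of (timestamp, value) versions, newest first. Illustrative only.
class VersionedStore:
    def __init__(self):
        self._cells = {}                   # (row, col) -> [(ts, value), ...]
        self._clock = itertools.count(1)   # stand-in for wall-clock time

    def put(self, row, col, value):
        ts = next(self._clock)             # a new timestamp on every put
        self._cells.setdefault((row, col), []).insert(0, (ts, value))
        return ts

    def latest(self, row, col):
        return self._cells[(row, col)][0]  # (ts, value) of newest version

store = VersionedStore()
store.put("row1", "cf:qual", "same-value")
store.put("row1", "cf:qual", "same-value")  # identical value, new timestamp
ts, value = store.latest("row1", "cf:qual")
print(ts, value)  # the timestamp advanced even though the value did not
```

So a timerange-based incremental scan would pick up rows that were merely
rewritten, not just rows whose values actually changed.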


-----Original Message-----
From: Stuti Awasthi [mailto:stutiawasthi@hcl.com] 
Sent: Montag, 10. Oktober 2011 10:07
To: user@hbase.apache.org
Subject: RE: Performance characteristics of scans using timestamp as the filter

Hi Saurabh,

AFAIK you can also scan on the basis of a timestamp range. This will
give you the data updated in that timestamp range. You do not need to
keep the timestamp in your row key.
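In the Java client this corresponds to Scan.setTimeRange(min, max). The
catch, which the rest of the thread circles around, is that timestamps are
not part of the row-key sort order, so such a scan is a filter over the
data it reads rather than a seek. A hedged Python sketch (a toy model, not
the HBase API) of that behavior:

```python
# Rows sorted by key, each carrying an internal timestamp. Timestamps
# are NOT part of the key order, so a timerange scan must visit every
# row and filter, even when few rows match.
rows = sorted(
    ("key%04d" % i, {"ts": i % 24, "value": i}) for i in range(1000)
)

def scan_with_time_range(rows, min_ts, max_ts):
    """Model of Scan.setTimeRange(): a filter applied while reading."""
    visited = 0
    out = []
    for key, cell in rows:                   # every row is read...
        visited += 1
        if min_ts <= cell["ts"] < max_ts:    # ...but only some returned
            out.append((key, cell))
    return out, visited

result, visited = scan_with_time_range(rows, 6, 8)
print(len(result), visited)  # few matches, yet all 1000 rows visited
```

(In practice HBase can skip some store files using file-level timestamp
metadata, but within the files it reads, rows are still examined, so the
work grows with total data size rather than with the size of the result.)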

-----Original Message-----
From: saurabh.r.s@gmail.com [mailto:saurabh.r.s@gmail.com] On Behalf Of
Sam Seigal
Sent: Monday, October 10, 2011 1:20 PM
To: user@hbase.apache.org
Subject: Re: Performance characteristics of scans using timestamp as the filter

Is it possible to do incremental processing more efficiently without
putting the timestamp in the leading part of the row key, i.e. process
data that came in within the last hour, 2 hours, etc.? I can't seem to
find a good answer to this question myself.

On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <
Thomas.Steinmaurer@scch.at> wrote:

> Leif,
>
> we are pretty much in the same boat with a custom timestamp at the end
> of a three-part rowkey, so basically we end up with reading all data
> when processing daily batches. Besides performance aspects, have you
> seen that using internal timestamps for scans etc. works reliably?
> Or did you come up with another solution to your problem?
>
> Thanks,
> Thomas
> -----Original Message-----
> From: Leif Wickland [mailto:leifwickland@gmail.com]
> Sent: Freitag, 09. September 2011 20:33
> To: user@hbase.apache.org
> Subject: Performance characteristics of scans using timestamp as the 
> filter
> (Apologies if this has been answered before.  I couldn't find anything
> in the archives quite along these lines.)
>
> I have a process which writes to HBase as new data arrives.  I'd like
> to run a map-reduce periodically, say daily, that takes the new items
> as input.  A naive approach would use a scan which grabs all of the
> rows that have a timestamp in a specified interval as the input to a
> MapReduce.  I tested a scenario like that with 10s of GB of data and
> it seemed to perform OK.  Should I expect that approach to continue to
> perform reasonably well when I have TBs of data?
>
> From what I understand of the HBase architecture, I don't see a reason
> that the scan approach would continue to perform well as the data
> grows.  It seems like I may have to keep a log of modified keys and
> use that as the map-reduce input, instead.
>
> Thanks,
> Leif Wickland
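The "log of modified keys" idea Leif mentions can be sketched as a
changelog keyed by a coarse time bucket plus the rowkey, written alongside
each data put; the periodic job then prefix-scans a single bucket instead
of filtering all data. This is a hedged illustration in plain Python; the
`"%010d"` bucket encoding and the helper names are my own choices, not
from the thread.

```python
from bisect import bisect_left

# Sketch of a "log of modified keys": alongside each data put, append
# an entry keyed by <time-bucket>|<rowkey> to a sorted changelog. The
# batch job then reads one bucket's prefix instead of scanning all data.
changelog = []  # kept sorted, like rowkeys in an HBase table

def record_change(bucket_hour, rowkey):
    entry = "%010d|%s" % (bucket_hour, rowkey)
    changelog.insert(bisect_left(changelog, entry), entry)

def modified_keys(bucket_hour):
    """Prefix scan over one hourly bucket of the changelog."""
    lo = "%010d|" % bucket_hour
    hi = "%010d|" % (bucket_hour + 1)
    return changelog[bisect_left(changelog, lo):bisect_left(changelog, hi)]

record_change(100, "user-42")
record_change(101, "user-7")
record_change(100, "user-13")
print(modified_keys(100))  # only keys modified during hour 100
```

The cost of the batch job is then proportional to the number of changes in
the window, not to the total size of the table, at the price of one extra
write per put.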



