hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase scan time range, inconsistency
Date Wed, 25 Feb 2015 03:01:19 GMT
What's the TTL setting for your table?

Which HBase release are you using?

Was there a compaction in between the scans?


> On Feb 24, 2015, at 2:32 PM, Stephen Durfey <sjdurfey@gmail.com> wrote:
> I have some code that accepts a time range and looks for data written to an HBase table
> during that range. If anything has been written to a row during that range, the row key
> is saved off, and sometime later in the pipeline those row keys are used to extract the entire
> row. I’m testing against a fixed time range, at some point in the past. This is being done
> as part of a Map/Reduce job (using Apache Crunch). I have some job counters set up to keep
> track of the number of rows extracted. Since the time range is fixed, I would expect the scan
> to return the same number of rows with data in the provided time range. However, I am seeing
> this number vary from scan to scan (bouncing between increasing and decreasing).
> I’ve eliminated the possibility that data is being pulled in from outside the time
> range. I did this by scanning for one column qualifier (using only that qualifier to decide
> whether a row had data in the time range), getting the timestamp on the cell for each returned
> row, and comparing it against the begin and end times for the scan. I didn’t find any
> cell whose timestamp fell outside the range. I’ve observed some row keys show up in the 1st
> scan, then drop out in the 2nd scan, only to show back up again in the 3rd scan (all with the
> exact same Scan object). These numbers have varied wildly, from being off by 2-3 between
> subsequent scans to a 40-row increase followed by a drop of 70 rows.
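
The timestamp check described above can be sketched as follows. This is a minimal illustration, not the code from the original job: the class and method names are hypothetical. The one fact it relies on is that HBase time ranges, as set with Scan.setTimeRange(minStamp, maxStamp), are half-open, so minStamp is inclusive and maxStamp is exclusive. A check that treats the end time as inclusive would not match what the scan itself applies.

```java
// Hypothetical helper; the name is an assumption made for illustration.
final class TimeRangeCheck {
    /**
     * Returns true when a cell timestamp falls inside the half-open
     * range [minStamp, maxStamp), the convention HBase time ranges use.
     */
    static boolean inRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }
}
```

A cell stamped exactly at the end time therefore counts as outside the range.
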
> I’m looking for ideas to help track down what could be causing this to happen.
> The code itself is pretty simple: it creates a Scan object, scans the table, and then in the
> map phase extracts the row key, and at the end dumps the row keys to a directory in HDFS.
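
One way to narrow this down, given that each run already dumps its row keys to HDFS, is to diff the key sets of two runs and inspect the specific rows that appear or disappear. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and it assumes the row keys have been read back as strings (HBase row keys are byte arrays, so a real version would need a suitable encoding).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; names and the String key type are assumptions.
final class RowKeyDiff {
    /** Returns the keys present in run a but missing from run b, sorted. */
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a); // copy so the inputs stay untouched
        diff.removeAll(b);
        return diff;
    }
}
```

Calling onlyIn(run1, run2) gives the rows that dropped out between the two scans, and onlyIn(run2, run1) gives the rows that newly appeared. Those rows can then be fetched individually to look at their cell timestamps and versions.
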
