hbase-user mailing list archives

From Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject Re: A possible bug in the scanner.
Date Wed, 13 Apr 2011 17:47:13 GMT
   Thanks, this will resolve the particular case we ran into. But what if the files are huge,
span a wide range of timestamps, and only some of the records in each file are valid? And for
the other application that we have, scans with filters that return a sparse result set, the
solution may not help.

   Further, it won't solve the underlying problem. When a scanner is busy, but doesn't have
any rows to return "yet", neither the client nor the region server should mistake it for an
unresponsive scanner.


On 4/13/11 8:43 AM, "Himanshu Vashishtha" <hvashish@cs.ualberta.ca> wrote:

Did you try setting the scanner's time range? It takes min and max timestamps, and
when the scanner is instantiated at the RS, time-based filtering is done to
include only the relevant store files. Have a look at StoreFile.shouldSeek(Scan,
SortedSet<byte[]>). I think it should improve the response time.
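
A minimal sketch of the time-range pruning idea described above, in plain Java with no HBase dependency. FileInfo and selectFiles are hypothetical names used only for illustration; this is not HBase's actual implementation. It assumes each store file records the minimum and maximum timestamp of its cells, and that the scan range is [minStamp, maxStamp), as it would be when set on the client with Scan.setTimeRange(minStamp, maxStamp).

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangePruning {
    // Hypothetical stand-in for the timestamp metadata of one store file.
    static final class FileInfo {
        final String name;
        final long minTs; // oldest cell timestamp in the file (inclusive)
        final long maxTs; // newest cell timestamp in the file (inclusive)

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Keep only files whose [minTs, maxTs] overlaps the scan range
    // [minStamp, maxStamp). Files entirely outside it are never opened.
    static List<String> selectFiles(List<FileInfo> files, long minStamp, long maxStamp) {
        List<String> selected = new ArrayList<>();
        for (FileInfo f : files) {
            boolean overlaps = f.maxTs >= minStamp && f.minTs < maxStamp;
            if (overlaps) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long now = 1_000_000L; // illustrative clock value
        long ttl = 100_000L;   // illustrative TTL

        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("old-large", 10_000L, 500_000L)); // fully expired
        files.add(new FileInfo("mixed", 850_000L, 950_000L));    // partly expired
        files.add(new FileInfo("fresh", 950_000L, 990_000L));    // all live

        // Scan only cells younger than the TTL.
        System.out.println(selectFiles(files, now - ttl, now + 1));
    }
}
```

With these illustrative values the fully expired file is skipped, but the "mixed" file is still selected and has to be read in full, since only part of it falls outside the range.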


On Wed, Apr 13, 2011 at 8:44 AM, Vidhyashankar Venkataraman <
vidhyash@yahoo-inc.com> wrote:

> Hi
>   We had enabled scanner caching, but I don't think it is the same issue,
> because scanner.next in this case is blocking: the scanner is busy in the
> region server but hasn't returned anything, since it hasn't yet found a row
> to return (all rows have expired but are still there since they haven't
> been compacted yet).
> Vidhya
> On 4/13/11 1:44 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
> Have you read the following thread?
> "ScannerTimeoutException when a scan enables caching, no exception when it
> doesn't"
> Did you enable caching? If not, it is a different issue.
> On Wed, Apr 13, 2011 at 12:40 AM, Vidhyashankar Venkataraman <
> vidhyash@yahoo-inc.com> wrote:
> > (This could be a known issue. Please let me know if it is).
> >
> > We had a set of uncompacted store files in a region. One of the column
> > families had a store file of 5 GB. The other column families were pretty
> > small (a few megabytes at most).
> >
> > It turned out that all these files had rows whose TTL had expired. Now
> > when this region was scanned (which should yield an empty result set), we
> > got scanner timeouts and UnknownScannerExceptions.
> >
> > And when we tried scanning the region without the large column family, the
> > scanner returned safely with no result.
> >
> > So, I major compacted it and the scan started working correctly.
> >
> > So it looks like timeouts happen if the scanner does not return any output
> > for a specified time.
> > Which isn't exactly the correct thing to do, because it could be the case
> > that the scanner was indeed busy but it just so happened that there were no
> > rows yet to return to the client.
> >
> > We can try increasing the scanner timeout, but this doesn't resolve the
> > underlying problem. Is this a known issue?
> >
> > Thank you
> > Vidhya
> >
