hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-13090) Progress heartbeats for long running scanners
Date Thu, 12 Mar 2015 00:09:40 GMT

     [ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Lawlor updated HBASE-13090:
    Attachment: HBASE-13090-v1.patch

Attached is a rough work in progress patch. The patch does provide a working implementation
of heartbeats but I believe it could be refined so I am looking to get some feedback.

The implementation points that I wanted to highlight for discussion are below:
* We wanted to move all time tracking into RegionScanner and StoreScanner and leave RSRpcServices
unscathed. I started off with that intention, but it gradually became clear that it may be better
to simply have a timeLimit field in the call to nextRaw from RSRpcServices. The reasoning is
outlined below:
** While it is certainly possible to add a reset() or newSession() method to the RegionScanner
interface that would allow us to reset time tracking, the issue becomes how we communicate
that time limit down from the RegionScanner into the StoreScanner (the scanner that is looping
through the cells for a particular column family).
** The StoreScanners are stored in a KeyValueHeap in the RegionScanner, so it would be possible
to loop through them all and call a similar reset/newSession method on each of them, but that
seems dirty and wasteful. It seems more appropriate to communicate the timeLimit down to only
the relevant StoreScanner via a timeLimit field in the InternalScanner#next(List<Cell>
results, ..., timeLimit) call.
** Since the RegionScanner also implements the InternalScanner interface, that same next method
would need to be implemented in RegionScannerImpl. Because of this, I think it makes the most
sense to simply have a nextRaw(List<Cell>, ..., timeLimit) method to specify the timeLimit
from RSRpcServices rather than a reset/newSession call.
* To avoid polluting the returned Result array with state information about heartbeats, a
new heartbeat flag has been added to the ScanResponse. Since only the ScannerCallable ever
sees the ScanResponse returned from the server, I have exposed the method
ScannerCallable#isHeartbeatMessage() to allow the ClientScanner to check whether the most
recent server response was a heartbeat/keep-alive.
* The method postHeapNext(List<Cell>) was added to RegionScannerImpl to allow me to
insert delays in between fetches of column family cells for testing. It didn't feel clean,
so I was wondering if anyone had any ideas about alternative approaches to emulate long running
scans on the server side.
* Since heartbeat messages have the potential to create partial results (in the event that
the timeout occurs in the middle of a row), we only allow heartbeat messages if the client
has specified that heartbeats are supported AND partial results are also supported.
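
To make the points above concrete, here is a minimal sketch of a per-call time limit being
passed down to a store-level scanner, with the heartbeat reported as a separate flag on the
response rather than inside the results. Everything in it is an assumption for illustration:
the class names, the fake clock, and the method signatures are hypothetical stand-ins and are
not the HBase classes or the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: stand-ins, not the real HBase classes or signatures.
final class HeartbeatSketch {

    /** Fake clock so the example is deterministic; reading a cell "costs" a fixed time. */
    static final class FakeClock {
        long nowMs = 0;
    }

    /** Plays the role of a StoreScanner: walks the cells of one column family. */
    static final class StoreScannerSketch {
        final List<String> cells;
        final FakeClock clock;
        final long costPerCellMs;
        int pos = 0;

        StoreScannerSketch(List<String> cells, FakeClock clock, long costPerCellMs) {
            this.cells = cells;
            this.clock = clock;
            this.costPerCellMs = costPerCellMs;
        }

        /**
         * Appends cells to results until the cells run out or the deadline passes.
         * Returns true if it stopped because the deadline was reached.
         */
        boolean next(List<String> results, long deadlineMs) {
            while (pos < cells.size()) {
                if (clock.nowMs >= deadlineMs) {
                    return true; // time limit reached, possibly mid-row
                }
                clock.nowMs += costPerCellMs; // simulate the cost of fetching one cell
                results.add(cells.get(pos++));
            }
            return false;
        }
    }

    /** Plays the role of the ScanResponse: results plus a separate heartbeat flag. */
    static final class ScanResponseSketch {
        final List<String> results;
        final boolean heartbeat;

        ScanResponseSketch(List<String> results, boolean heartbeat) {
            this.results = results;
            this.heartbeat = heartbeat;
        }
    }

    /** Plays the role of the RPC layer: computes a deadline per call and passes it down. */
    static ScanResponseSketch scan(StoreScannerSketch scanner, FakeClock clock, long budgetMs,
                                   boolean clientHandlesHeartbeats,
                                   boolean clientHandlesPartials) {
        // A heartbeat may cut a row short, so both capabilities are required.
        boolean allowHeartbeats = clientHandlesHeartbeats && clientHandlesPartials;
        long deadlineMs = allowHeartbeats ? clock.nowMs + budgetMs : Long.MAX_VALUE;
        List<String> results = new ArrayList<>();
        boolean timedOut = scanner.next(results, deadlineMs);
        return new ScanResponseSketch(results, timedOut);
    }

    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        StoreScannerSketch scanner =
            new StoreScannerSketch(Arrays.asList("c1", "c2", "c3", "c4", "c5"), clock, 10);
        ScanResponseSketch first = scan(scanner, clock, 25, true, true);
        System.out.println(first.results + " heartbeat=" + first.heartbeat);
        ScanResponseSketch second = scan(scanner, clock, 25, true, true);
        System.out.println(second.results + " heartbeat=" + second.heartbeat);
    }
}
```

Because the deadline arrives as an argument on each call, no reset/newSession step is needed
between RPCs, and a client that has not opted in to both heartbeats and partial results
simply gets an unbounded deadline.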

Ideas for improvement:
* As earlier discussion indicated, the tracking of limits in RSRpcServices is somewhat messy.
When a new limit needs to be added, the RegionScanner and InternalScanner interfaces must
both be changed. The limit logic could be simplified by defining something along the lines of
a ScannerLimit object. The object would have a field per limit and an associated Builder
that would allow us to specify only the limits we care about (if a limit is not set,
then it doesn't get enforced). Then, in the future, if a new limit were needed, it would only
require adding a new field to ScannerLimit and adding the appropriate enforcement logic
(no changes to interfaces necessary). What do you guys think? I thought this would clean things
up a bit but wanted to see if there are any objections first.
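
As a rough illustration of the ScannerLimit idea, the sketch below shows one possible shape:
a field per limit, a Builder that sets only the limits of interest, and a single check that
ignores unset limits. The specific limits (cell count, byte size, deadline), the UNSET
sentinel, and all method names are assumptions made for this example and are not taken from
the patch.

```java
// Hypothetical sketch of the proposed ScannerLimit idea; not code from the patch.
final class ScannerLimitSketch {
    private static final long UNSET = -1L; // sentinel meaning "do not enforce this limit"

    private final long maxCells;
    private final long maxBytes;
    private final long deadlineMs;

    private ScannerLimitSketch(Builder b) {
        this.maxCells = b.maxCells;
        this.maxBytes = b.maxBytes;
        this.deadlineMs = b.deadlineMs;
    }

    static Builder builder() {
        return new Builder();
    }

    /** Returns true if any limit that was actually set has been reached. */
    boolean reached(long cellsSoFar, long bytesSoFar, long nowMs) {
        return (maxCells != UNSET && cellsSoFar >= maxCells)
            || (maxBytes != UNSET && bytesSoFar >= maxBytes)
            || (deadlineMs != UNSET && nowMs >= deadlineMs);
    }

    /** Every limit starts out unset; callers opt in to the ones they care about. */
    static final class Builder {
        private long maxCells = UNSET;
        private long maxBytes = UNSET;
        private long deadlineMs = UNSET;

        Builder maxCells(long value) { this.maxCells = value; return this; }
        Builder maxBytes(long value) { this.maxBytes = value; return this; }
        Builder deadlineMs(long value) { this.deadlineMs = value; return this; }

        ScannerLimitSketch build() {
            return new ScannerLimitSketch(this);
        }
    }

    public static void main(String[] args) {
        // Only the time limit is set, so cell and byte counts are never enforced.
        ScannerLimitSketch timeOnly = ScannerLimitSketch.builder().deadlineMs(1000).build();
        System.out.println(timeOnly.reached(1_000_000, 1_000_000, 999));
        System.out.println(timeOnly.reached(0, 0, 1000));
    }
}
```

With this shape, a scanner method would accept one limit object instead of one parameter per
limit, so adding a new limit would touch only this class and the enforcement check.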

Of course, the finer implementation points can be seen in the patch itself, and any feedback
would be appreciated. I will also post the patch to Review Board.


> Progress heartbeats for long running scanners
> ---------------------------------------------
>                 Key: HBASE-13090
>                 URL: https://issues.apache.org/jira/browse/HBASE-13090
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>         Attachments: HBASE-13090-v1.patch
> It can be necessary to set very long timeouts for clients that issue scans over large
> regions when all data in the region might be filtered out depending on scan criteria. This
> is a usability concern because it can be hard to identify what worst case timeout to use until
> scans are occasionally/intermittently failing in production, depending on variable scan criteria.
> It would be better if the client-server scan protocol can send back periodic progress heartbeats
> to clients as long as server scanners are alive and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071). 

This message was sent by Atlassian JIRA
