hbase-issues mailing list archives

From "Jonathan Lawlor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
Date Thu, 12 Mar 2015 01:03:03 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357903#comment-14357903 ]

Jonathan Lawlor commented on HBASE-13090:

Thanks for the comments [~stack]

bq. If only timeout, then maybe premature for ScanLimit unless anything in current Scan structure
that might sit better in ScanLimit?
I was thinking that we could combine the batch limit, size limit, and now the time limit into
a single ScannerLimit object. With this patch, the InternalScanner and RegionScanner interfaces now
have a long cascading chain of overloads that looks like this:
NextState next(List<Cell> result) throws IOException;
NextState next(List<Cell> result, int batchLimit) throws IOException;
NextState next(List<Cell> result, int batchLimit, long sizeLimit) throws IOException;
NextState next(List<Cell> result, int batchLimit, long sizeLimit, long timeLimit) throws IOException;

As more limits are added, it gets uglier and uglier. The idea with ScannerLimit would be to
change it to this:

NextState next(List<Cell> result) throws IOException;
NextState next(List<Cell> result, ScannerLimit limit) throws IOException;

Here the ScannerLimit object can specify any combination of limits (it may contain only
a time limit, or a time limit, batch limit and size limit together).
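To make the proposal concrete, here is a minimal Java sketch of what a two-method interface taking a single limit object could look like. Everything in it is hypothetical and simplified for illustration: the ScannerLimit fields and with* methods, the stand-in Cell and NextState types, InternalScannerSketch and the toy ListScanner are not the patch's or HBase's actual API, and using -1 to mean "limit not set" is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for an HBase Cell; only a size is tracked (hypothetical).
final class Cell {
  final String value;
  Cell(String value) { this.value = value; }
  long size() { return value.length(); }
}

// Simplified stand-in for NextState: why a next(...) call returned (hypothetical values).
enum NextState {
  NO_MORE_VALUES, BATCH_LIMIT_REACHED, SIZE_LIMIT_REACHED, TIME_LIMIT_REACHED
}

// Hypothetical ScannerLimit: every limit is optional; -1 means "not set".
final class ScannerLimit {
  static final ScannerLimit NONE = new ScannerLimit(-1, -1L, -1L);

  final int batchLimit;      // max cells returned per call
  final long sizeLimit;      // max accumulated cell size per call
  final long timeLimitNanos; // max time spent in one call

  private ScannerLimit(int batchLimit, long sizeLimit, long timeLimitNanos) {
    this.batchLimit = batchLimit;
    this.sizeLimit = sizeLimit;
    this.timeLimitNanos = timeLimitNanos;
  }

  // Each "with" method returns a copy with one more limit set.
  ScannerLimit withBatch(int batch) { return new ScannerLimit(batch, sizeLimit, timeLimitNanos); }
  ScannerLimit withSize(long size) { return new ScannerLimit(batchLimit, size, timeLimitNanos); }
  ScannerLimit withTime(long nanos) { return new ScannerLimit(batchLimit, sizeLimit, nanos); }
}

// The two-method shape from the proposal, in place of one overload per limit.
interface InternalScannerSketch {
  default NextState next(List<Cell> result) { return next(result, ScannerLimit.NONE); }
  NextState next(List<Cell> result, ScannerLimit limit);
}

// Toy scanner over an in-memory list; shows a single limit object being consulted.
final class ListScanner implements InternalScannerSketch {
  private final Iterator<Cell> cells;
  ListScanner(List<Cell> cells) { this.cells = cells.iterator(); }

  @Override
  public NextState next(List<Cell> result, ScannerLimit limit) {
    long start = System.nanoTime();
    int count = 0;
    long size = 0;
    while (cells.hasNext()) {
      Cell c = cells.next();
      result.add(c);
      count++;
      size += c.size();
      if (limit.batchLimit >= 0 && count >= limit.batchLimit) return NextState.BATCH_LIMIT_REACHED;
      if (limit.sizeLimit >= 0 && size >= limit.sizeLimit) return NextState.SIZE_LIMIT_REACHED;
      if (limit.timeLimitNanos >= 0 && System.nanoTime() - start >= limit.timeLimitNanos) {
        return NextState.TIME_LIMIT_REACHED;
      }
    }
    return NextState.NO_MORE_VALUES;
  }
}
```

With this shape, a caller that only cares about time passes ScannerLimit.NONE.withTime(...), and adding another kind of limit later means adding a field rather than another next(...) overload.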

bq. What would be the downsides if default was to allow return of partials to clients?
Right now, partial result support is on by default, but when a scan is specified as a small
scan we disable partial results on the server side. This means that for small scans we would
not allow heartbeat messages either, since they could potentially create partial results.
Heartbeats would be supported for all other scans.
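The rule for when heartbeats apply can be written down as a small predicate. This is a sketch under assumptions: ScanSettings and HeartbeatPolicy are hypothetical types, not HBase's Scan class, and the partialResultSupport flag simply models the default-on behavior described in the comment.

```java
// Hypothetical, simplified scan settings; not the real HBase Scan class.
final class ScanSettings {
  final boolean small;
  final boolean partialResultSupport; // assumed on by default

  ScanSettings(boolean small) { this(small, true); }

  ScanSettings(boolean small, boolean partialResultSupport) {
    this.small = small;
    this.partialResultSupport = partialResultSupport;
  }
}

final class HeartbeatPolicy {
  private HeartbeatPolicy() {}

  // Server side, partial results are switched off for small scans.
  static boolean partialsEnabledServerSide(ScanSettings scan) {
    return scan.partialResultSupport && !scan.small;
  }

  // A heartbeat can cut a row short and so create a partial result;
  // heartbeats are therefore only allowed where partial results are.
  static boolean heartbeatsAllowed(ScanSettings scan) {
    return partialsEnabledServerSide(scan);
  }
}
```

Tying heartbeatsAllowed to the partial-results check, rather than testing the small flag directly, keeps the two behaviors from drifting apart if the partial-results rule changes.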

bq. since you can't specify your own Scanner implementation serverside (you can't right?)
As far as I can tell, there is no clean way to specify your own StoreScanner implementation,
but on further investigation it looks like I can specify my own KeyValueHeap implementation
inside the RegionScanners. That would let me take the ugly postHeapNext method out; I'm going
to investigate further and see whether it can be removed.

bq. When do I call isHeartbeatMessage? At what point in the processing?
Currently it is used inside ClientScanner.java after the Result array comes back from the
server. Checking it there tells us whether the most recent response from the server (the one
that returned the Result array) was a heartbeat message.
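A minimal sketch of the client-side check, assuming a simplified response that carries the returned rows, a heartbeat flag and a more-results flag. ScanResponseSketch, ScanRpc and HeartbeatAwareClient are hypothetical names, and the loop is not the actual ClientScanner logic; it only illustrates why the flag matters: a heartbeat response may come back with few or no Results, and the client should keep scanning instead of treating it as the end of the scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scan RPC response: rows plus a heartbeat flag.
final class ScanResponseSketch {
  final List<String> results;
  final boolean heartbeat;
  final boolean moreResults;

  ScanResponseSketch(List<String> results, boolean heartbeat, boolean moreResults) {
    this.results = results;
    this.heartbeat = heartbeat;
    this.moreResults = moreResults;
  }

  boolean isHeartbeatMessage() { return heartbeat; }
}

// Hypothetical stand-in for issuing one scan RPC to the server.
interface ScanRpc {
  ScanResponseSketch call();
}

final class HeartbeatAwareClient {
  private HeartbeatAwareClient() {}

  // Drains the scan, checking the heartbeat flag after each response comes back.
  static List<String> scanAll(ScanRpc rpc) {
    List<String> rows = new ArrayList<>();
    while (true) {
      ScanResponseSketch response = rpc.call();
      rows.addAll(response.results);
      if (response.isHeartbeatMessage()) {
        // The server returned early but is still making progress;
        // an empty or short result here does not mean the scan is done.
        continue;
      }
      if (!response.moreResults) {
        return rows;
      }
    }
  }
}
```

Usage: feeding scanAll a scripted sequence of responses (an empty heartbeat, then a heartbeat carrying one row, then two normal responses) collects all three rows rather than stopping at the first empty response.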

> Progress heartbeats for long running scanners
> ---------------------------------------------
>                 Key: HBASE-13090
>                 URL: https://issues.apache.org/jira/browse/HBASE-13090
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Assignee: Jonathan Lawlor
>         Attachments: HBASE-13090-v1.patch
> It can be necessary to set very long timeouts for clients that issue scans over large
> regions when all data in the region might be filtered out depending on scan criteria. This
> is a usability concern because it can be hard to identify what worst case timeout to use until
> scans are occasionally/intermittently failing in production, depending on variable scan criteria.
> It would be better if the client-server scan protocol can send back periodic progress heartbeats
> to clients as long as server scanners are alive and making progress.
> This is related but orthogonal to streaming scan (HBASE-13071). 

This message was sent by Atlassian JIRA
