hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
Date Tue, 10 Nov 2015 18:26:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999067#comment-14999067 ]

stack commented on HBASE-12790:

bq. Glad to see you're working on that over at Cloudera.  Hopefully you're testing with Phoenix.

You did not read it. It is not "Cloudera" work. It is Apache HBase work; see the listed JIRAs.
It is a summary of the state of the scheduling art in Apache HBase as of a while ago.

bq. I don't think having an extra optional attribute on an operation adds "a bunch of new
complexity". That's fine if we disagree.

Andrew's considered response, 'On complexity', plainly left no mark, and you cannot have reviewed
the attached patch and comments. Only a superficial engagement with this issue and all that
is involved could result in characterizing what is going on here as just "having an
extra optional attribute" (or in calling the cited, pertinent blog post 'Cloudera' work).

bq. Andrew Purtell made the point that if you're round-robining on reads you should be consistent
and do it on writes too - I think this is a fair point. Our immediate need is on the read
side - I'll share our data when the analysis is complete... Our requirement is simple: the
latency of point lookups and small-ish scans shouldn't be impacted by other workloads on the
cluster. Whatever implementation you come up with is fine by us.

Your requirement changes every time you comment and you do not know what you are asking for.

Let me try and write something up and situate it relative to work already done.

> Support fairness across parallelized scans
> ------------------------------------------
>                 Key: HBASE-12790
>                 URL: https://issues.apache.org/jira/browse/HBASE-12790
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>              Labels: Phoenix
>         Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, HBASE-12790_1.patch,
HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, HBASE-12790_trunk_1.patch, PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
> Some HBase clients parallelize the execution of a scan to reduce latency in getting back
results. This can lead to starvation on a loaded cluster with interleaved scans, since the
RPC queue is ordered and processed on a FIFO basis. For example, suppose two clients,
A and B, submit largish scans at the same time. Say each scan is broken down by the client into
100 scans (equal-depth chunks along the row key), and the 100 scans
of client A are queued first, followed immediately by the 100 scans of client B. In this case,
client B will be starved of any results until the scans for client A complete.
> One solution to this is to use the attached AbstractRoundRobinQueue instead of the standard
FIFO queue. The queue to be used could be (maybe it already is) configurable through a new
config parameter. Using this queue would require the client to use the same identifier for
all of the 100 parallel scans that represent a single logical scan from the client's point
of view. With this information, the round-robin queue would pick tasks off the queue
in round-robin order (instead of strictly FIFO order), preventing starvation across interleaved
parallelized scans.
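The round-robin idea in the description can be illustrated with a small sketch. This is a minimal, hypothetical example assuming every task carries a client-supplied identifier shared by all chunks of one logical scan; it is not the attached AbstractRoundRobinQueue.java and not an HBase API, and the names RoundRobinQueueSketch, offer, poll and firstServed are made up for illustration. It replays the A/B example above: 100 chunk scans from A are queued before 100 from B, and the sketch reports where B's first scan gets served under FIFO versus round robin.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Supplier;

// Hypothetical sketch of round-robin dispatch across logical scans.
// Not the attached AbstractRoundRobinQueue and not an HBase API.
public class RoundRobinQueueSketch<T> {
    // Pending tasks, one FIFO sub-queue per logical-scan identifier.
    private final Map<String, Queue<T>> groups = new HashMap<>();
    // Identifiers that currently have pending tasks, in serving order.
    private final Queue<String> turnOrder = new ArrayDeque<>();

    // Enqueue a task under the identifier shared by all chunks of one logical scan.
    public synchronized void offer(String scanId, T task) {
        Queue<T> pending = groups.get(scanId);
        if (pending == null) {
            pending = new ArrayDeque<>();
            groups.put(scanId, pending);
            turnOrder.add(scanId); // a new identifier joins the back of the rotation
        }
        pending.add(task);
    }

    // Take one task from the identifier whose turn it is; null if nothing is pending.
    public synchronized T poll() {
        String scanId = turnOrder.poll();
        if (scanId == null) {
            return null;
        }
        Queue<T> pending = groups.get(scanId);
        T task = pending.poll();
        if (pending.isEmpty()) {
            groups.remove(scanId); // drained: leave the rotation
        } else {
            turnOrder.add(scanId); // still has work: go to the back of the line
        }
        return task;
    }

    // 1-based position at which the first task whose name starts with prefix is served, or -1.
    public static int firstServed(Supplier<String> next, String prefix) {
        int position = 0;
        for (String task = next.get(); task != null; task = next.get()) {
            position++;
            if (task.startsWith(prefix)) {
                return position;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Client A's 100 chunk scans are queued first, then client B's 100.
        Queue<String> fifo = new ArrayDeque<>();
        RoundRobinQueueSketch<String> roundRobin = new RoundRobinQueueSketch<>();
        for (String client : new String[] {"A", "B"}) {
            for (int i = 0; i < 100; i++) {
                String task = client + i;
                fifo.add(task);
                roundRobin.offer(client, task);
            }
        }
        System.out.println("FIFO: first B scan served at position " + firstServed(fifo::poll, "B"));
        System.out.println("Round robin: first B scan served at position " + firstServed(roundRobin::poll, "B"));
    }
}
```

Running main reports position 101 for FIFO and position 2 for round robin. The sketch only shows the ordering policy; a real RPC call queue would also need blocking takes, capacity limits and priority handling. Fairness here is per identifier, so a client that tagged every chunk with a fresh identifier would get no benefit, which is why the description asks for one identifier per logical scan.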

This message was sent by Atlassian JIRA
