hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
Date Tue, 03 Nov 2015 18:11:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987759#comment-14987759 ]

ramkrishna.s.vasudevan commented on HBASE-12790:

Thanks for all the comments on the RB. I had an offline discussion with Andy, James and Anoop,
and would like to summarize it here.
We will extend the groupId concept to all client requests. That includes scans, gets,
MutateRequest, MultiRequest, Bulkloadrequest, etc.
To do this, we expose the groupId API at the Operation level. This allows every Put, Delete,
Increment, Append, Get and Scan to carry a grouping ID.
At the RPC layer, each scan or get maps one-to-one to an RPC request, so the groupId set on an
individual Scan or Get can be used directly for the round robin.
A MultiRequest, however, can contain any number of actions (Puts, Deletes, Gets, etc.), all
mapped to a single MultiRequest. Because groupId is exposed at the Operation level, different
actions can have different groupIds set, but at the RPC layer we take the first groupId as the
ID for the entire MultiRequest. I had a concern with this part: users are allowed to set
different groupIds, but internally we use only one of them, and this is completely hidden from
the user, which I thought might be confusing. Overall, though, the groupId is not a parameter
that directly affects the user's results; it only influences how the server handles the
request.
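
To make the MultiRequest behavior concrete, here is a minimal sketch of the "first groupId wins"
rule. The classes GroupedOperation and MultiRequestSketch are hypothetical stand-ins, not the
actual HBase client or protobuf classes, and the sketch assumes that "first groupId" means the
groupId of the first action in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for an Operation (Put, Delete, Get, ...) carrying a group ID.
// This is NOT the real HBase Operation class.
class GroupedOperation {
    private final String name;
    private final String groupId; // null when the caller sets no group

    GroupedOperation(String name, String groupId) {
        this.name = name;
        this.groupId = groupId;
    }

    String getName() { return name; }
    String getGroupId() { return groupId; }
}

// Hypothetical stand-in for a MultiRequest that batches several operations.
class MultiRequestSketch {
    private final List<GroupedOperation> actions = new ArrayList<>();

    MultiRequestSketch add(GroupedOperation op) {
        actions.add(op);
        return this;
    }

    // Assumed rule: the group ID of the first action represents the whole batch.
    // Group IDs set on later actions are silently ignored.
    Optional<String> effectiveGroupId() {
        if (actions.isEmpty()) {
            return Optional.empty();
        }
        return Optional.ofNullable(actions.get(0).getGroupId());
    }
}
```

With this rule, a batch built from a Put tagged "query-A" followed by a Delete tagged "query-B"
would be scheduled entirely under "query-A", which is the hidden behavior the concern above is
about.
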
I can update the patch based on the above feedback and discussion. Any further questions and
feedback are welcome!

> Support fairness across parallelized scans
> ------------------------------------------
>                 Key: HBASE-12790
>                 URL: https://issues.apache.org/jira/browse/HBASE-12790
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>              Labels: Phoenix
>         Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, HBASE-12790_1.patch,
>                      HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, HBASE-12790_trunk_1.patch,
>                      PHOENIX_4.5.3-HBase-0.98-2317-SNAPSHOT.zip
> Some HBase clients parallelize the execution of a scan to reduce latency in getting back
> results. This can lead to starvation on a loaded cluster with interleaved scans, since the
> RPC queue is ordered and processed on a FIFO basis. For example, suppose two clients, A and B,
> submit largish scans at the same time. Say each scan is broken down into 100 scans by the
> client (in equal-depth chunks along the row key), and the 100 scans of client A are queued
> first, followed immediately by the 100 scans of client B. In this case, client B will be
> starved of any results until the scans for client A complete.
> One solution is to use the attached AbstractRoundRobinQueue instead of the standard FIFO
> queue. The queue to be used could be (maybe it already is) configurable through a new config
> parameter. Using this queue would require the client to use the same identifier for all of
> the 100 parallel scans that represent a single logical scan from the client's point of view.
> With this information, the round robin queue would pick tasks off the queue in round robin
> fashion (instead of strictly FIFO) to prevent starvation across interleaved parallelized
> scans.
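
The round robin idea in the description can be illustrated with a small sketch. This is not
the attached AbstractRoundRobinQueue; RoundRobinGroupQueue is a hypothetical, single-threaded
class that assumes each task arrives with a String group ID and that groups are served in the
order they first appear. A real RPC queue would also need thread safety and blocking behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: serves one task per group in turn instead of strict FIFO.
// Not thread-safe; a real call queue would need synchronization.
class RoundRobinGroupQueue<T> {
    // Pending tasks for each group, in arrival order within the group.
    private final Map<String, Deque<T>> tasksByGroup = new HashMap<>();
    // Groups that currently have pending tasks, in the order they will be served.
    private final Deque<String> rotation = new ArrayDeque<>();

    void offer(String groupId, T task) {
        Deque<T> queue = tasksByGroup.get(groupId);
        if (queue == null) {
            // First pending task for this group: the group joins the end of the rotation.
            queue = new ArrayDeque<>();
            tasksByGroup.put(groupId, queue);
            rotation.addLast(groupId);
        }
        queue.addLast(task);
    }

    // Returns the next task in round robin order, or null when the queue is empty.
    T poll() {
        String groupId = rotation.pollFirst();
        if (groupId == null) {
            return null;
        }
        Deque<T> queue = tasksByGroup.get(groupId);
        T task = queue.pollFirst();
        if (queue.isEmpty()) {
            tasksByGroup.remove(groupId); // group is drained; drop it from the rotation
        } else {
            rotation.addLast(groupId);    // group still has work; send it to the back
        }
        return task;
    }
}
```

If client A's chunks are all queued before client B's, this queue alternates between the two
groups rather than draining A first, so B starts receiving results after A's first chunk
instead of after A's last.
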

This message was sent by Atlassian JIRA
