hadoop-common-issues mailing list archives

From "Wang, Xinglong (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-17237) Improve NameNode RPC throughput with ReadWriteRpcCallQueue
Date Tue, 01 Sep 2020 11:59:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188385#comment-17188385
] 

Wang, Xinglong commented on HADOOP-17237:
-----------------------------------------

[~daryn] [~chrilisf] [~kihwal]

Could you please comment on this idea? Do you think reordering the RPCs in the queue could
corrupt the NameNode metadata, return wrong RPC results, or cause jobs to fail?

 

> Improve NameNode RPC throughput with ReadWriteRpcCallQueue 
> -----------------------------------------------------------
>
>                 Key: HADOOP-17237
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17237
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: rpc-server
>            Reporter: Wang, Xinglong
>            Priority: Major
>
> *Current*
>  In our production cluster, the typical read-to-write ratio is 10:1, and it sometimes
reaches 30:1.
>  NameNode uses ReentrantReadWriteLock under the hood of FSNamesystemLock. The read lock
is a shared lock, while the write lock is an exclusive lock.
> Read RPCs and write RPCs arrive at the NameNode in random order, so reads and writes are
interleaved, and only a small fraction of reads can actually share the read lock.
> Currently we have the default call queue and FairCallQueue, and we can refreshCallQueue on
the fly. This leaves room to design a new call queue.
> *Idea*
>  If we reorder the RPC calls in the call queue so that read RPCs are grouped together and
write RPCs are grouped together, we gain some control over letting a batch of read RPCs reach
the handlers together and possibly share the same read lock. This reduces fragmentation of
read locks.
>  This only improves the chance of sharing the read lock within a batch of read RPCs,
because some NameNode-internal write lock acquisitions do not go through the call queue.
> ReentrantReadWriteLock keeps an internal queue of the threads waiting for the lock. Here
is an example.
>  R: stands for read rpc
>  W: stands for write rpc
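>  e.g.
>  RRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRWRRRRW
>  In this case, we need 16 lock timeslices: each run of consecutive reads shares one
timeslice and each write takes its own (8 read runs + 8 writes).
> Optimized:
>  RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRWWWWWWWW
>  In this case, we only need 9 lock timeslices (1 read run + 8 writes).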
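
The timeslice counts in the example above can be checked with a few lines of Java. This is
only a sketch of the idealized model used in the example (consecutive reads share one read
lock hold, every write holds the lock alone); the class name LockSliceCounter and its methods
are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop. Idealized model: a run of
// consecutive 'R' calls shares one timeslice, every 'W' call takes its own.
final class LockSliceCounter {

    static int countSlices(String calls) {
        int slices = 0;
        char prev = 0; // no previous call yet
        for (int i = 0; i < calls.length(); i++) {
            char c = calls.charAt(i);
            if (c != 'R' && c != 'W') {
                throw new IllegalArgumentException("unexpected call type: " + c);
            }
            // A write always starts a new timeslice; a read starts one only
            // when it does not directly follow another read.
            if (c == 'W' || prev != 'R') {
                slices++;
            }
            prev = c;
        }
        return slices;
    }

    // Small helper so the sketch does not depend on String.repeat (Java 11+).
    static String repeat(String s, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mixed = repeat("RRRRW", 8);                 // interleaved traffic
        String grouped = repeat("R", 32) + repeat("W", 8); // reads grouped first
        System.out.println(countSlices(mixed));   // prints 16
        System.out.println(countSlices(grouped)); // prints 9
    }
}
```

The model ignores lock acquisitions that bypass the call queue, so real savings would be
smaller than the 16 to 9 reduction suggests.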
> *Correctness*
>  Since the execution order of any 2 concurrent or queued RPCs in the NameNode is not
guaranteed, we can split the RPCs in the call queue into a read group and a write group, and
then dequeue from these 2 queues by a designed strategy. For example: dequeue 100 read RPCs,
then 5 write RPCs, then reads again, then writes again.
>  Since FairCallQueue also reorders RPC calls in the call queue, I think the same reasoning
guarantees the correctness of RPC results here.
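
The dequeue strategy described above could look roughly like the following. This is a minimal
sketch under stated assumptions, not the proposed patch: the class ReadWriteBatchQueue, its
methods, and the isWrite flag are hypothetical, and a real Hadoop call queue would have to be
a BlockingQueue that classifies each call as read or write itself.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not a Hadoop class: two FIFO queues drained in
// alternating batches (up to readBatch reads, then up to writeBatch writes).
final class ReadWriteBatchQueue<E> {
    private final Queue<E> reads = new ArrayDeque<>();
    private final Queue<E> writes = new ArrayDeque<>();
    private final int readBatch;
    private final int writeBatch;
    private boolean servingReads = true; // which group the current batch serves
    private int servedInBatch = 0;       // calls handed out in the current batch

    ReadWriteBatchQueue(int readBatch, int writeBatch) {
        if (readBatch <= 0 || writeBatch <= 0) {
            throw new IllegalArgumentException("batch sizes must be positive");
        }
        this.readBatch = readBatch;
        this.writeBatch = writeBatch;
    }

    // The caller decides whether a call is a write (assumption of this sketch).
    synchronized void put(E call, boolean isWrite) {
        (isWrite ? writes : reads).add(call);
    }

    // Returns the next call, or null if both groups are empty.
    synchronized E poll() {
        if (reads.isEmpty() && writes.isEmpty()) {
            return null;
        }
        Queue<E> current = servingReads ? reads : writes;
        int limit = servingReads ? readBatch : writeBatch;
        if (servedInBatch >= limit || current.isEmpty()) {
            // Batch finished or group drained: switch groups if the other
            // group has work, otherwise start a new batch on the same group.
            Queue<E> other = servingReads ? writes : reads;
            if (!other.isEmpty()) {
                servingReads = !servingReads;
                current = other;
            }
            servedInBatch = 0;
        }
        servedInBatch++;
        return current.poll();
    }
}
```

With readBatch = 100 and writeBatch = 5 this follows the "100 reads, then 5 writes" pattern.
Order is preserved within each group, and only the interleaving between reads and writes
changes.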
> *Performance*
>  In a test environment, we see a 15% - 20% NameNode RPC throughput improvement compared
with the default call queue.
>  Test traffic is 30 read : 3 write : 1 list, generated with NNLoadGeneratorMR.
> This performance figure is not a surprise, because some write RPCs are not managed by the
call queue, so we cannot reorder them by reordering calls in the call queue.
>  We could still achieve a full read/write reordering by redesigning ReentrantReadWriteLock.
That would be a further step after this one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

