hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiang Li (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-19917) Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient
Date Sun, 04 Feb 2018 13:40:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351784#comment-16351784
] 

Xiang Li edited comment on HBASE-19917 at 2/4/18 1:39 PM:
----------------------------------------------------------

Thanks for your comment [~yuzhihong@gmail.com]!
{{filterServers()}} is only called in {{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet actually, built
by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, size of servers
is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because TreeSet#contains() is logn and
we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is O(m + n), as
the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O( n ) for time complexity (if I get it correctly)
as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * O(1).
I think #1 is good enough, although it is worse than #2 which is linear. What is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be used to replace
TreeSet throughout the calling chain.




was (Author: water):
Thanks for your comment [~yuzhihong@gmail.com]!
{{filterServers()}} is only called in {{RSGroupBasedLoadBalancer#filterServers()}}, as follow:
{code}
return filterServers(RSGroupInfo.getServers(), onlineServers);
{code}
{{RSGroupInfo#getServers()}} returns servers, a SortedSet. It is a TreeSet actually, built
by its constructor.

Given a TreeSet, there are 2 ways: (Let's say when calling {{filterServers()}}, size of servers
is n and size of onlineServers is m)
# Keep using TreeSet. Time complexity is O(m * logn). Because TreeSet#contains() is logn and
we loop for m.
# Turn TreeSet into HashSet, to pursue O(1) for contains(). Time complexity is O(m + n), as
the following 2 steps are included:
## Construct a HashSet from a TreeSet. It is O(n) for time complexity (if I get it correctly)
as it needs to iterate the TreeSet
## Calculate the union of severs and onlineServers. The time complexity is m * O(1).
I think #1 is good enough, although it is worse than #2 which is linear. What is your opinion?

Regarding
bq. If possible, we should change those to using HashSet.
In RSGroupInfo, servers as well as tables is TreeSet. According to the comments, 
{code}
// Keep servers in a sorted set so has an expected ordering when displayed.
private final SortedSet<Address> servers;
// Keep tables sorted too.
private final SortedSet<TableName> tables;
{code}
TreeSet is only used for display purpose. I am checking if HashSet could be used to replace
TreeSet throughout the calling chain.



> Improve RSGroupBasedLoadBalancer#filterServers() to be more efficient
> ---------------------------------------------------------------------
>
>                 Key: HBASE-19917
>                 URL: https://issues.apache.org/jira/browse/HBASE-19917
>             Project: HBase
>          Issue Type: Improvement
>          Components: rsgroup
>            Reporter: Xiang Li
>            Assignee: Xiang Li
>            Priority: Minor
>         Attachments: HBASE-19917.master.000.patch
>
>
> {code:title=hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupBasedLoadBalancer.java|borderStyle=solid}
> private List<ServerName> filterServers(Collection<Address> servers,
>     Collection<ServerName> onlineServers) {
>   ArrayList<ServerName> finalList = new ArrayList<ServerName>();
>   for (Address server : servers) {
>     for(ServerName curr: onlineServers) {
>       if(curr.getAddress().equals(server)) {
>         finalList.add(curr);
>       }
>     }
>   }
>   return finalList;
> }
> {code}
> filterServers is to return the union of servers and onlineServers. The current implementation
has time complexity as O(m * n) (2 loops), could be in O(m + n) if HashSet is used. The trade-off
is space complexity is increased.
> Another point which could be improved: filterServers() is only called in filterOfflineServers().
filterOfflineServers calls filterServers(Set, List). The current filterServers(Collection,
Collection) seems could be improved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message