hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fenghua Hu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java
Date Tue, 23 Aug 2016 16:41:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433142#comment-15433142
] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:41 PM:
------------------------------------------------------------

I would like to explain the solution here.

Currently, TreeMap is used to track all the ShortCircuitReplica entries in the order of being
inserted. These entries could be removed from TreeMap in two cases. The first is when the
entry is accessed again, it will be removed from TreeMap. Please note that the entry could
be anyone in the TreeMap. The other is when the entry is evicted due to treemap size limitation,
in this case, only the eldest entry will be removed.

Removal is a costly operation for the first case, because looking up ShortCircuitReplica is
needed, in TreeMap, it's O(log n) operation. To improve it, we design a new data structure
LruList, which entirely eliminates costly look-up operation. 
            +----------------+                    +-----------------+                    +-----------------+
      ---->|   Replica 1  |-----Next----> |    Replica  2  |-----Next----> |    Replica
 3  |
     <-----|   Replica 1  |<-----Prev---- |    Replica  2  |<-----Prev---- |    Replica
 3  |
            +----------------+                    +-----------------+                    +-----------------+


We introduced two references in ShortCircuitReplica objects. Reference Next points to the
elder ShortCircuitReplica and Prev points to the younger one. All the replica is doubly linked
in order of insertion time. The youngest is always at the head of the linked list, and the
eldest is always at the tail.  Removing the entries between the head and the tail doesn't
need any lookup, because the replica knows its position in linked list by next and prev, thus
remove is simple: change it's precedessor's and its succssor's next and prev. The order of
operation is always O(1). 

For insertion, the youngest entry is always be added to the head, thus the operation is also
O(1).

Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) operation for insertion/lookup/removal.

Here comes a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each process。
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions/comments are very welcomed.


was (Author: fenghua_hu):
I would like to explain the solution here.

Currently, TreeMap is used to track all the ShortCircuitReplica entries in the order of being
inserted. These entries could be removed from TreeMap in two cases. The first is when the
entry is accessed again, it will be removed from TreeMap. Please note that the entry could
be anyone in the TreeMap. The other is when the entry is evicted due to treemap size limitation,
in this case, only the eldest entry will be removed.

Removal is a costly operation for the first case, because looking up ShortCircuitReplica is
needed, in TreeMap, it's O(log n) operation. To improve it, we design a new data structure
LruList, which entirely eliminates costly look-up operation. 
            +----------------+                    +-----------------+                    +-----------------+
             |                    |                     |                      |         
           |                      |
      ---->|   Replica 1  |-----Next----> |    Replica  2  |-----Next----> |    Replica
 3  |
             |                    |                     |                      |         
           |                      |
     <-----|                    |<-----Prev---- |                      |<-----Prev----
|                      |
            +----------------+                     +-----------------+                   
+-----------------+


We introduced two references in ShortCircuitReplica objects. Reference Next points to the
elder ShortCircuitReplica and Prev points to the younger one. All the replica is doubly linked
in order of insertion time. The youngest is always at the head of the linked list, and the
eldest is always at the tail.  Removing the entries between the head and the tail doesn't
need any lookup, because the replica knows its position in linked list by next and prev, thus
remove is simple: change it's precedessor's and its succssor's next and prev. The order of
operation is always O(1). 

For insertion, the youngest entry is always be added to the head, thus the operation is also
O(1).

Existing classes, including LinkedHashMap, LinkedMap, can't provide O(1) operation for insertion/lookup/removal.

Here comes a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each process。
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions/comments are very welcomed.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
>                 Key: HDFS-10690
>                 URL: https://issues.apache.org/jira/browse/HDFS-10690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Fenghua Hu
>            Assignee: Fenghua Hu
>         Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using traditional
HDD. But when using high-performance SSD/PCIe Flash, the cost inserting/removing an entry
 becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a very low cost
operation. On the other hand, list is not lookup-friendly. To address this issue, we introduce
two references into ShortCircuitReplica object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We only need
to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash as storage
media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and patch will
be posted soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message