hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fenghua Hu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java
Date Mon, 15 Aug 2016 03:26:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420579#comment-15420579
] 

Fenghua Hu edited comment on HDFS-10690 at 8/15/16 3:25 AM:
------------------------------------------------------------

[~xyao], I have a question about the patch:

According to https://commons.apache.org/proper/commons-collections/jacoco/org.apache.commons.collections4.map/LinkedMap.java.html

    public K get(final int index) {
        return getEntry(index).getKey();
    }

    public V remove(final int index) {
        return remove(get(index));
    }

note that the parameter is index.
In the patch,
...
+      ShortCircuitReplica replica = (ShortCircuitReplica)evictableMmapped.get
+          (eldestKey);
...
+    ShortCircuitReplica removed = (ShortCircuitReplica)map.remove
+        (evictableTimeNs);

Here eldestKey and evictableTimeNs actually are the key, not the index, thus they should respectively
been changed to
evictableMmapped.getValue(map.indexOf(eldestKey));
and
map.remove(map.indexOf(evictableTimeNs);

But indexOf() is a O( n ) operation and LinkedMap's performance for remove won't compete with
TreeMap().

Correct me if i am wrong. Thanks.



was (Author: fenghua_hu):
[~xyao], I have a question about the patch:

According to https://commons.apache.org/proper/commons-collections/jacoco/org.apache.commons.collections4.map/LinkedMap.java.html

    public K get(final int index) {
        return getEntry(index).getKey();
    }

    public V remove(final int index) {
        return remove(get(index));
    }

note that the parameter is index.
In the patch,
...
+      ShortCircuitReplica replica = (ShortCircuitReplica)evictableMmapped.get
+          (eldestKey);
...
+    ShortCircuitReplica removed = (ShortCircuitReplica)map.remove
+        (evictableTimeNs);

Here eldestKey and evictableTimeNs actually are the key, not the index, thus they should respectively
been changed to
evictableMmapped.getValue(map.indexOf(eldestKey));
and
map.remove(map.indexOf(evictableTimeNs);

But indexOf() is a O(n) operation and LinkedMap's performance for remove won't compete with
TreeMap().

Correct me if i am wrong. Thanks.


> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---------------------------------------------------------------
>
>                 Key: HDFS-10690
>                 URL: https://issues.apache.org/jira/browse/HDFS-10690
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Fenghua Hu
>            Assignee: Fenghua Hu
>         Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new TreeMap<>();
> TreeMap employs Red-Black tree for sorting. This isn't an issue when using traditional
HDD. But when using high-performance SSD/PCIe Flash, the cost inserting/removing an entry
 becomes considerable.
> To mitigate it, we designed a new list-based for replica tracking.
> The list is a double-linked FIFO. FIFO is time-based, thus insertion is a very low cost
operation. On the other hand, list is not lookup-friendly. To address this issue, we introduce
two references into ShortCircuitReplica object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We only need
to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash as storage
media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and patch will
be posted soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message