hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-12475) Replace guava Cache with ConcurrentHashMap for caching Connection in ipc Client
Date Thu, 15 Oct 2015 03:34:05 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Walter Su updated HADOOP-12475:
-------------------------------
    Attachment: HADOOP-12475.02.patch

1. Connection leak means an opened connection which forgets to close. It doesn't happen here.
Connection is opened at {{setupIOstreams}}. And it will be closed even it's not in the cache.

2. It's true a race condition can cause creating multiple connections for the same remoteId.
I replace it with {{putIfAbsent}}. Thanks [~Apache9], [~sjlee0]

3. There is another race condition about removing. It's caused by HADOOP-11772.
Before HADOOP-11772, Client used {{HashTable}} for caching. The removing logic is
{code}
1175	      synchronized (connections) {		
1176	        if (connections.get(remoteId) == this) {		
1177	          connections.remove(remoteId);		
1178	        }		
1179	      }
{code}
It bothers me a while why a thread-safe HashTable need {{synchronized}}. Then I know this
connection is closed, should be removed. But other thread could have already known this closedConnection,
and replace it with a new connection.

I use conditional remove {{ConcurrentHashMap.remove(key,value)}} to address this.

I think before HADOOP-11772,  {{synchronized (connections)}} (which locks the whole table
and HashTable also locks whole table) is the reason why the invokers choke up.

Uploaded 02 patch.

> Replace guava Cache with ConcurrentHashMap for caching Connection in ipc Client
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-12475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12475
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: conf, io, ipc
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HADOOP-12475.01.patch, HADOOP-12475.02.patch
>
>
> quote [~daryn] from HADOOP-11772:
> {quote}
> CacheBuilder is obscenely expensive for concurrent map, and it requires generating unnecessary
garbage even just to look up a key. Replace it with ConcurrentHashMap.
> I identified this issue that impaired my own perf testing under load. The slowdown isn't
just the sync. It's the expensive of Connection's ctor stalling other connections. The expensive
of ConnectionId#equals causes delays. Synch'ing on connections causes unfair contention unlike
a sync'ed method. Concurrency simply hides this.
> {quote}
> BTW, guava Cache is heavyweight. Per local test, ConcurrentHashMap has better overal
performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message