hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11029) FileSystem#Statistics uses volatile variables that must be updated on write or read calls.
Date Tue, 02 Sep 2014 18:02:21 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118439#comment-14118439
] 

Colin Patrick McCabe commented on HADOOP-11029:
-----------------------------------------------

The original title here was "LocalFS Statistics performs thread local call per byte written."
 This title is misleading in a few ways.  First of all, this is a generic Filesystem.java
thing, not a LocalFS thing.  Secondly, it only "performs a thread local call per byte written"
if you write a byte at a time.  If you write a byte at a time, you will have other high overheads,
such as system call overhead (for local FS).  Finally, I don't think ThreadLocal is the source
of that locked instruction you pointed out.

That lock prefix is almost certainly coming from the fact that the fields inside Statistics
are volatile, not from {{ThreadLocal#get}}.  The entire point of {{ThreadLocal}} is that it
is local to a thread, so no cross-thread synchronization should be required to access it (in
a good implementation).  But we do require the volatile variables so that the updated statistics
can be visible to other threads calling {{Statistics#get}}

The thread-local statistics code was added to alleviate a performance problem where we were
taking a lock, adding to a counter, and then releasing the lock after every write or read
operation.  It was a substantial improvement on that naive implementation.  See HDFS-5276
for details.

I'm not sure if Java offers a way to avoid those volatile variables.  As I mentioned, the
reason the Statistics fields are volatile is so that other threads can read their values when
we're calling {{Statistics#get}}.  Is there a way to do the memory barrier only on read? 
If so, I'm not aware of it.  Maybe we could cast to volatile?  Or is there a memory barrier
instruction in Unsafe?  That would be the only way to avoid the volatile.

Can you put some numbers behind this?  If this is only an issue when you are writing a byte
at a time, I'd be inclined to close this, since, as I mentioned, byte-at-a-time is a well-known
anti-pattern with Java streams.

> FileSystem#Statistics uses volatile variables that must be updated on write or read calls.
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11029
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Gopal V
>         Attachments: local-fs-locking.png
>
>
> This code is there in the hot-path of IFile writer via RawLocalFileSystem.
> !local-fs-locking.png!
> From a preliminary glance, the lock prefix calls are coming from a threadlocal.get()
within FileSystem.Statistics 
> {code}
>    /**
>      * Get or create the thread-local data associated with the current thread.
>      */
>     private StatisticsData getThreadData() {
>       StatisticsData data = threadData.get();
>       if (data == null) {
>         data = new StatisticsData(
>             new WeakReference<Thread>(Thread.currentThread()));
>         threadData.set(data);
>         synchronized(this) {
>           if (allData == null) {
>             allData = new LinkedList<StatisticsData>();
>           }
>           allData.add(data);
>         }
>       }
>       return data;
>     }
>     /**
>      * Increment the bytes read in the statistics
>      * @param newBytes the additional bytes read
>      */
>     public void incrementBytesRead(long newBytes) {
>       getThreadData().bytesRead += newBytes;
>     }
> {code}
> This is incredibly inefficient when used from FSDataOutputStream
> {code}
>     public void write(int b) throws IOException {
>       out.write(b);
>       position++;
>       if (statistics != null) {
>         statistics.incrementBytesWritten(1);
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message