hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log
Date Thu, 04 Oct 2007 03:18:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532313 ]

Raghu Angadi commented on HADOOP-1942:
--------------------------------------

FSEditLog.java :

- In logSync(): mytxid should be set to min(id.txid, txid). Otherwise, when id.txid is MAX_VALUE,
the thread could stay in logSync() for a longer time (i.e. it will always sync). This can happen
when completeFile() returns false, which is quite often.
-- Another option is not to reset id.txid, but to provide logSyncTillNow(), which calls logSync()
with id.txid set to the current txid, if such a call is required.

- synchronized (editstream) is not required inside logEdit(). Looks like it existed before
but can be removed.

- There are two calls to System.currentTimeMillis() inside editLog(). editLog() is an in-memory
operation, so I don't think we need to measure it. editLog() is just like any other
processing now.

I haven't looked at the Stats etc yet.
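
To illustrate the first point, here is a minimal sketch of the min(id.txid, txid) clamp. It is not the code from the patch: the class SyncSketch, the clamp flag, resetThreadTxid(), and the physicalSyncs counter are all hypothetical names introduced for illustration, and the disk sync is only simulated by a counter.

```java
// Hypothetical stand-in for the edit log; NOT the real FSEditLog.
class SyncSketch {
    private long txid = 0;      // last transaction id handed out
    private long synctxid = 0;  // highest transaction id assumed to be on disk
    int physicalSyncs = 0;      // counts simulated disk syncs
    private final boolean clamp;

    // Per-thread id of the last transaction this thread logged.
    // Long.MAX_VALUE models the "reset" state mentioned in the comment.
    private final ThreadLocal<Long> myTxid =
        ThreadLocal.withInitial(() -> Long.MAX_VALUE);

    SyncSketch(boolean clamp) {
        this.clamp = clamp;
    }

    // Log one in-memory transaction and remember its id for this thread.
    synchronized void logEdit() {
        txid++;
        myTxid.set(txid);
    }

    // Hypothetical helper: put the per-thread id back into the reset state.
    void resetThreadTxid() {
        myTxid.set(Long.MAX_VALUE);
    }

    synchronized void logSync() {
        long id = myTxid.get();
        // With the clamp, a reset id can never exceed the newest transaction.
        long mytxid = clamp ? Math.min(id, txid) : id;
        if (mytxid <= synctxid) {
            return; // everything this thread cares about is already synced
        }
        physicalSyncs++;  // simulated flush to disk
        synctxid = txid;  // all transactions logged so far are now durable
    }
}
```

In this sketch, a thread that calls logEdit(), logSync(), resetThreadTxid(), and logSync() again performs one simulated sync with the clamp and two without it, because an unclamped MAX_VALUE is always greater than synctxid.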


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch, transactionLogSync3.patch, transactionLogSync4.patch
>
>
> For some typical workloads, the throughput of the namenode is bottlenecked by the rate
at which transactions are logged into the edits log. In the current code, a batching
scheme means that not every transaction has to incur a sync of the edits log to disk.
However, the existing batching scheme can be improved.
> One option is to keep two buffers associated with the edits file. Threads write to the primary
buffer while holding the FSNamesystem lock. Then the thread releases the FSNamesystem lock,
acquires a new lock called the syncLock, swaps buffers, and flushes the old buffer to the
persistent store. Since the buffers are swapped, new transactions continue to get logged into
the new buffer. (Of course, the new transactions cannot complete before this new buffer is
synced.)
> This approach does a better job of batching syncs to disk, thus improving performance.
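
The two-buffer scheme described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the attached patch: DoubleBufferSketch and its fields are hypothetical names, the object's own monitor stands in for the FSNamesystem lock, and a plain OutputStream stands in for the persistent store.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration of the two-buffer idea; NOT the real FSEditLog.
class DoubleBufferSketch {
    private ByteArrayOutputStream current = new ByteArrayOutputStream(); // receives new edits
    private ByteArrayOutputStream ready = new ByteArrayOutputStream();   // spare, empty between syncs
    private final Object syncLock = new Object(); // serializes flushes to the store
    private final OutputStream store;             // stand-in for the persistent store

    DoubleBufferSketch(OutputStream store) {
        this.store = store;
    }

    // In-memory append. The monitor on "this" stands in for the FSNamesystem lock.
    synchronized void logEdit(byte[] record) {
        current.write(record, 0, record.length);
    }

    // Swap buffers under a short lock, then flush the old buffer while
    // other threads keep appending to the new one.
    void logSync() {
        synchronized (syncLock) {
            ByteArrayOutputStream toFlush;
            synchronized (this) {
                toFlush = current;
                current = ready; // empty: it was reset at the end of the previous sync
                ready = toFlush;
            }
            try {
                toFlush.writeTo(store); // slow I/O happens outside the append lock
                store.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            toFlush.reset(); // make the buffer reusable for the next swap
        }
    }
}
```

Only the swap itself holds the append lock, so in this sketch a slow flush delays other syncing threads but not threads that are logging new edits. Every edit appended before a logSync() call reaches the store in that call, which is where the batching comes from.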

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

