hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1942) Increase the concurrency of transaction logging to edits log
Date Sat, 29 Sep 2007 16:56:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531257 ]

Raghu Angadi commented on HADOOP-1942:
--------------------------------------


Dhruba, this works around the locking issue with a 1 millisecond sleep, which is probably OK,
though IMO it would be better to avoid it.

I think there is another issue with how we use lastModificationTime and lastSyncTime. Assume
each sync takes 1 millisecond (it might be more) and there is a steady load of more than 1000
edits per second (quite common). Then lastSyncTime is _always_ equal to or behind lastModTime,
which implies that every IPC thread will run a sync (plus the newly added sleep time).

This essentially brings us back to the same situation: the number of edits possible per second
is not much larger than the number of syncs possible per second. I might be mistaken here, but
the benchmark stats should show whether this is the case.
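
To put rough numbers on this, here is a back-of-the-envelope model. It is only a sketch: the
class and method names are hypothetical, the 1 millisecond sync cost is the assumption made
above, and the batch size of 20 is an arbitrary illustrative value rather than a measurement.

```java
// Back-of-the-envelope model; all numbers are illustrative assumptions, not measurements.
final class SyncThroughputModel {

    /**
     * Upper bound on edits per second when syncs run one at a time,
     * each sync takes syncMillis, and each sync covers editsPerSync edits.
     */
    static long maxEditsPerSecond(double syncMillis, long editsPerSync) {
        long syncsPerSecond = (long) (1000.0 / syncMillis);
        return syncsPerSecond * editsPerSync;
    }

    public static void main(String[] args) {
        // Every IPC thread runs its own sync: one edit per sync.
        System.out.println("one edit per sync:  " + maxEditsPerSecond(1.0, 1));
        // Hypothetical batching: each sync covers 20 edits.
        System.out.println("20 edits per sync:  " + maxEditsPerSecond(1.0, 20));
    }
}
```

With one edit per sync the bound is just the sync rate, so any gain has to come from each sync
covering more edits.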

I basically like the idea of using two buffers to increase sync efficiency. I think it will
give a big improvement on NNBench. I think the locking looks complicated because we have 3
read/write locks. I think it can be done with one simple synchronized lock that is not affected
by the 'lastModTime' issue above:

{code}
synchronized void logEdit(...) throws InterruptedException {
    writeEdit(currentBuffer);
    processErrorStreams(); // etc.
}

void logSync() throws InterruptedException {
    long myGen = 0;
    long localSyncGen = 0;

    synchronized (this) {
        myGen = currentGen;

        // wait while another thread's sync is running and our edits are not yet synced.
        while (myGen > syncGen && isSyncRunning) {
            wait(100);
        }

        if (myGen <= syncGen) {
            return; // another thread has already synced our edits.
        }

        // now this thread is expected to run the sync.
        localSyncGen = currentGen;
        isSyncRunning = true;
        swapBuffers();
        currentGen++;
    }

    // sync the old buffer.
    // also sync could be skipped if there is no data in the old buffer.

    synchronized (this) {
        isSyncRunning = false;
        processErrorStreams(); // etc.
        syncGen = localSyncGen;
        notifyAll(); // the waiters above wait on 'this', so notify on 'this'.
    }
}
{code}
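
For anyone who wants to experiment with the scheme, below is a minimal self-contained rendering
of the sketch above. It is an illustration under stated assumptions, not the patch: the class
DoubleBufferedLog and its accessors are hypothetical, an in-memory list stands in for the disk,
currentGen is assumed to start at 1 and syncGen at 0, and error-stream handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the edit log; not the actual Hadoop class.
final class DoubleBufferedLog {
    private List<String> currentBuffer = new ArrayList<>(); // receives new edits
    private List<String> syncBuffer = new ArrayList<>();    // being flushed
    private final List<String> durable = new ArrayList<>(); // stands in for the disk
    private long currentGen = 1;  // generation of the buffer being written to
    private long syncGen = 0;     // highest generation known to be synced
    private boolean isSyncRunning = false;
    private int syncCount = 0;    // number of non-empty flushes performed

    synchronized void logEdit(String edit) {
        currentBuffer.add(edit);
    }

    void logSync() {
        long localSyncGen;
        synchronized (this) {
            long myGen = currentGen;
            try {
                // Wait while someone else is syncing and our edits are not yet covered.
                while (myGen > syncGen && isSyncRunning) {
                    wait(100);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for sync", e);
            }
            if (myGen <= syncGen) {
                return; // another thread already synced our edits
            }
            // This thread runs the sync: swap buffers so writers can continue.
            localSyncGen = currentGen;
            isSyncRunning = true;
            List<String> tmp = syncBuffer;
            syncBuffer = currentBuffer;
            currentBuffer = tmp;
            currentGen++;
        }

        // Slow part, outside the lock. isSyncRunning keeps this single-threaded.
        if (!syncBuffer.isEmpty()) {
            durable.addAll(syncBuffer);
            syncBuffer.clear();
            syncCount++;
        }

        synchronized (this) {
            isSyncRunning = false;
            syncGen = localSyncGen;
            notifyAll();
        }
    }

    synchronized List<String> durableEdits() {
        return new ArrayList<>(durable);
    }

    synchronized int syncCount() {
        return syncCount;
    }

    public static void main(String[] args) {
        DoubleBufferedLog log = new DoubleBufferedLog();
        log.logEdit("mkdir /a");
        log.logEdit("mkdir /b");
        log.logSync(); // one flush covers both edits
        System.out.println(log.durableEdits() + " in " + log.syncCount() + " sync(s)");
    }
}
```

Note that logSync() never looks at timestamps; whether a thread has to sync is decided purely
by comparing generation numbers, which is why the 'lastModTime' issue does not arise here.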

Regarding processErrorStreams(): this handles an error condition that usually never happens.
It could be something like this:

{code}
synchronized void processErrorStreams() throws InterruptedException {
    // do not touch the streams while a sync is using them.
    while (isSyncRunning) {
        wait();
    }
    // remove the error streams.
}
{code}


> Increase the concurrency of transaction logging to edits log
> ------------------------------------------------------------
>
>                 Key: HADOOP-1942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1942
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: transactionLogSync.patch, transactionLogSync2.patch
>
>
> For some typical workloads, the throughput of the namenode is bottlenecked by the rate
of transactions that are being logged into the edits log. In the current code, a batching
scheme means that not every transaction has to incur a sync of the edits log to disk.
However, the existing batching scheme can be improved.
> One option is to keep two buffers associated with the edits file. Threads write to the primary
buffer while holding the FSNamesystem lock. Then the thread releases the FSNamesystem lock,
acquires a new lock called the syncLock, swaps buffers, and flushes the old buffer to the
persistent store. Since the buffers are swapped, new transactions continue to get logged into
the new buffer. (Of course, the new transactions cannot complete before this new buffer is
synced.)
> This approach does a better job of batching syncs to disk, thus improving performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

