hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel
Date Mon, 15 Jun 2015 18:47:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586471#comment-14586471 ]

Colin Patrick McCabe commented on HDFS-8578:
--------------------------------------------

Hi [~vinayrpet], [~raju.bairishetti], [~amareshwari].

I think it's a great idea to do the upgrade of each storage directory in parallel.  Although
these upgrades are usually quick, sometimes they aren't.  For example, if there is a slow
disk, we don't want it to hold up the whole process.  Another reason is that when upgrades
are slow, it's almost always because we are I/O-bound, so it just makes sense to process all
the directories (i.e. hard drives) in parallel.
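
Just to illustrate the shape of it, here is a rough sketch of the fan-out I have in mind
(not the actual patch; it assumes the surrounding {{datanode}}/{{nsInfo}}/{{startOpt}}
parameters are in scope and final, and mirrors the {{Future<IOException>}} pattern quoted
further down):
{code}
// Sketch only: submit one upgrade task per storage directory and hand any
// IOException back through a Future, instead of looping sequentially.
ExecutorService executor = Executors.newFixedThreadPool(getNumStorageDirs());
List<Future<IOException>> futures = new ArrayList<Future<IOException>>();
for (int idx = 0; idx < getNumStorageDirs(); idx++) {
  final StorageDirectory sd = getStorageDir(idx);
  futures.add(executor.submit(new Callable<IOException>() {
    @Override
    public IOException call() {
      try {
        doTransition(datanode, sd, nsInfo, startOpt);
        return null;
      } catch (IOException ioe) {
        return ioe;  // hand the failure back to the waiting thread
      }
    }
  }));
}
// ... then wait on every future and rethrow the first non-null IOException.
{code}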

There are a few cases where we will need to change certain log messages to include the storage
directory path, to avoid confusion when doing things in parallel.  Keep in mind that log messages
from different directories will be interleaved, so we won't be able to rely on ordering to tell
us which storage directory a message pertains to.
{code}
  private StorageDirectory loadStorageDirectory(DataNode datanode,
      NamespaceInfo nsInfo, File dataDir, StartupOption startOpt) throws IOException {
...
        LOG.info("Formatting ...");
{code}

The "Formatting..." log message must include the directory being formatted.

{code}
  private void linkAllBlocks(DataNode datanode, File fromDir, File toDir)
      throws IOException {
...
    LOG.info( hardLink.linkStats.report() );
  }
{code}
Here is another case where the existing LOG is not enough to tell us which storage directory
is being processed.
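For example (sketch only, using the {{fromDir}}/{{toDir}} parameters that are already
available):
{code}
LOG.info("Hard-linked blocks from " + fromDir + " to " + toDir + ": "
    + hardLink.linkStats.report());
{code}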

{code}
      try {
        IOException ioe = ioExceptionFuture.get();
...
      } catch (InterruptedException e) {
        LOG.error("InterruptedException while analyzing" + " blockpool "
            + nsInfo.getBlockPoolID());
      }
{code}

If the thread gets an {{InterruptedException}} while waiting for a {{Future}}, you are simply
logging a message and giving up on waiting for that {{Future}}.  That's not right.  I think
this would be easier to get right by using Guava's {{Uninterruptibles#getUninterruptibly}}.
You should also handle {{CancellationException}}.
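Roughly something like this (a sketch, assuming {{ioExceptionFuture}} is a
{{Future<IOException>}}; {{Uninterruptibles}} is
{{com.google.common.util.concurrent.Uninterruptibles}}):
{code}
try {
  // Blocks until the task finishes, retrying through interrupts and
  // restoring the thread's interrupt status afterwards.
  IOException ioe = Uninterruptibles.getUninterruptibly(ioExceptionFuture);
  if (ioe != null) {
    throw ioe;
  }
} catch (ExecutionException e) {
  throw new IOException("Upgrade failed for block pool "
      + nsInfo.getBlockPoolID(), e.getCause());
} catch (CancellationException e) {
  throw new IOException("Upgrade was cancelled for block pool "
      + nsInfo.getBlockPoolID(), e);
}
{code}
That way an interrupt can never make us silently give up on waiting for a directory's upgrade.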

Thanks, guys.

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch
>
>
> Right now, during upgrades the datanode processes all the storage dirs sequentially. Assuming
> it takes ~20 mins to process a single storage dir, a datanode with ~10 disks will take around
> 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime() 
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save lots of time during major upgrades if the datanode processed all storage
> dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?



