hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added
Date Fri, 13 Oct 2017 15:45:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16203720#comment-16203720

Jason Lowe commented on YARN-7244:

Thanks for updating the patch!  The test failure appears to be related, and it would be good
to cleanup the checkstyle issues.

In TestShuffleHandler the code to build absLogDir is replicated a lot of times and is always
the same value.  This should be a final static computed once in TestShuffleHandler and referenced
in the places that need it rather than copy-n-paste of the code.  This also removes the need
for the getAbsLogDir method.  When computing absLogDir the code should not assume the target
directory is the right place but rather look at the environment for direction where to go.
 GenericTestUtils#getTestDir would help here.

Rather than needing a TestShuffleHandler1 class that overrides getAuxilaryLocalPathHandler,
I think we can just use ShuffleHandler directly and call setAuxiliaryLocalPathHandler after
creating it.  That's what the nodemanager is going to do anyway during normal execution, and
the unit test would be closer to the real-world scenario.

Do we really need to provide both a TestAuxiliaryLocalPathHandler and TestAuxiliaryLocalPathHandler1?
 The original test code used a LocalDirAllocator directly, and it seems like we could use
one test handler class that passes through to the same local dir allocator that was originally
setup in the unit test.

AuxiliaryService#getAuxiliaryLocalPathHandler and AuxiliaryService#setAuxiliaryLocalPathHandler
need javadocs since this is a public class that we expect app framework providers to use.

AuxServices should take the path handler in its constructor since it's not legal to call any
other methods before it has been set.  This allows the handler field to be marked final and
removes the need for get/set methods.

In BaseContainerManagerTest it's more appropriate to start the node health checker service
rather than the dirs handler, since the health checker is responsible for initing and starting
the dirs handler.  Otherwise it looks weird that we started a service without calling init

TestAuxServices should use Mockito to build the AuxiliaryLocalPathHandler mock object.

AuxService#getServices returns a collection of AuxiliaryService objects, so it would be cleaner
to iterate on AuxiliaryService objects directly rather than iterate on Service objects and
cast them in the loop.

I'm not a fan of testContainerAuxPathHandler since it uses long sleeps to essentially test
that the aux path handler is hooked up to the nodemanager's dir handler.  It would be much
better if we could eliminate the need for sleeps and instead set a mocked/controllable dirs
handler in the containermanager then change the behavior of the mocked object and verify that
change is reflected in the aux path handler is well.  Or even simpler, AuxiliaryLocalPathHandlerImpl
could take a dir handler as a constructor argument and change to a static class so we can
instantiate it and test it directly rather than needing to bring up the entire container manager
for this test.

> ShuffleHandler is not aware of disks that are added
> ---------------------------------------------------
>                 Key: YARN-7244
>                 URL: https://issues.apache.org/jira/browse/YARN-7244
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: YARN-7244.001.patch, YARN-7244.002.patch, YARN-7244.003.patch, YARN-7244.004.patch,
> The ShuffleHandler permanently remembers the list of "good" disks on NM startup. If disks
later are added to the node then map tasks will start using them but the ShuffleHandler will
not be aware of them. The end result is that the data cannot be shuffled from the node leading
to fetch failures and re-runs of the map tasks.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message