hadoop-hdfs-issues mailing list archives

From "Zsolt Venczel (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-13121) NPE when request file descriptors when SC read
Date Sat, 02 Jun 2018 20:52:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16499170#comment-16499170
] 

Zsolt Venczel edited comment on HDFS-13121 at 6/2/18 8:51 PM:
--------------------------------------------------------------

To make sure the test is meaningful I did the following:

1) Built the entire Hadoop project with the native profile activated, since the added test only runs when the native library is loaded:
{code:java}
mvn clean install -Pnative -DskipTests
{code}
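(An optional sanity check, not part of the original steps: Hadoop's {{checknative}} command reports whether libhadoop was actually picked up by the build.)
{code:java}
hadoop checknative -a
{code}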
2) Applied the test-only patch: [^test-only.patch]

3) Ran the test in the hadoop-hdfs-project/hadoop-hdfs module:
{code:java}
mvn test -Dtest=TestShortCircuitCache
{code}
The test failed with:
{code:java}
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 11.004 s <<< FAILURE! - in org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[ERROR] testRequestFileDescriptorsWhenULimit(org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache)  Time elapsed: 0.397 s  <<< FAILURE!
java.lang.AssertionError: Should not throw NPE when the native library is unable to create new files!
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testRequestFileDescriptorsWhenULimit(TestShortCircuitCache.java:907)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR] TestShortCircuitCache.testRequestFileDescriptorsWhenULimit:907 Should not throw NPE when the native library is unable to create new files!
[INFO] 
[ERROR] Tests run: 13, Failures: 1, Errors: 0, Skipped: 0
[INFO] 

{code}
4) I reset my branch to HEAD and applied [^HDFS-13121.03.patch].
Running the test again results in:
{code:java}
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.268 s - in org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}
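For readers who do not want to open the patch, here is a minimal sketch of the assertion pattern implied by the failing run in step 3. It is not the actual test from [^test-only.patch]; {{factory}} stands in for a suitably wired BlockReaderFactory whose short-circuit fd request runs while the file-descriptor limit is exhausted.
{code:java}
try {
  // 'factory' is a hypothetical BlockReaderFactory whose DomainSocket leaves
  // both FileInputStream slots null after recvFileInputStreams().
  factory.createShortCircuitReplicaInfo();
} catch (NullPointerException npe) {
  // Without the fix, this is where the run above blows up.
  Assert.fail("Should not throw NPE when the native library is unable to create new files!");
} catch (Throwable t) {
  // Any other outcome (e.g. an IOException) is acceptable for this check.
}
{code}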


was (Author: zvenczel):
To make sure the test is meaningful I did the following:

1) Built the entire Hadoop project with the native profile activated, since the added test only runs when the native library is loaded:
{code:java}
mvn clean install -Pnative -DskipTests
{code}
2) Applied the test-only patch: [^test-only.patch]

3) Ran the test in the hadoop-hdfs-project/hadoop-hdfs module:
{code:java}
mvn test -Dtest=TestShortCircuitCache
{code}
The test failed with:
{code:java}
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[ERROR] Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 9.842 s <<< FAILURE! - in org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[ERROR] testRequestFileDescriptorsWhenULimit(org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache)  Time elapsed: 0.355 s  <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:129)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:620)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:553)
	at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testRequestFileDescriptorsWhenULimit(TestShortCircuitCache.java:903)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestShortCircuitCache.testRequestFileDescriptorsWhenULimit:903 » NullPointer
[INFO] 
[ERROR] Tests run: 13, Failures: 0, Errors: 1, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16.245 s
[INFO] Finished at: 2018-06-02T21:56:14+02:00
[INFO] Final Memory: 44M/711M
{code}
4) I reset my branch to HEAD and applied [^HDFS-13121.02.patch].
Running the test again results in:
{code:java}
[INFO] 
[INFO] --- maven-surefire-plugin:2.21.0:test (default-test) @ hadoop-hdfs ---
[INFO] 
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.268 s - in org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
{code}

> NPE when request file descriptors when SC read
> ----------------------------------------------
>
>                 Key: HDFS-13121
>                 URL: https://issues.apache.org/jira/browse/HDFS-13121
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Gang Xie
>            Assignee: Zsolt Venczel
>            Priority: Minor
>         Attachments: HDFS-13121.01.patch, HDFS-13121.02.patch, HDFS-13121.03.patch, test-only.patch
>
>
> Recently we hit an issue where the DFSClient throws an NPE. The case is that the application process exceeds its limit on the maximum number of open files. In that situation libhadoop never throws an exception but returns null for the requested fds, yet requestFileDescriptors uses the returned fds directly without any check, which leads to the NPE.
>  
> We need to add a null check here.
>  
> {code:java}
> private ShortCircuitReplicaInfo requestFileDescriptors(DomainPeer peer,
>     Slot slot) throws IOException {
>   ShortCircuitCache cache = clientContext.getShortCircuitCache();
>   final DataOutputStream out =
>       new DataOutputStream(new BufferedOutputStream(peer.getOutputStream()));
>   SlotId slotId = slot == null ? null : slot.getSlotId();
>   new Sender(out).requestShortCircuitFds(block, token, slotId, 1,
>       failureInjector.getSupportsReceiptVerification());
>   DataInputStream in = new DataInputStream(peer.getInputStream());
>   BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(
>       PBHelperClient.vintPrefixed(in));
>   DomainSocket sock = peer.getDomainSocket();
>   failureInjector.injectRequestFileDescriptorsFailure();
>   switch (resp.getStatus()) {
>   case SUCCESS:
>     byte buf[] = new byte[1];
>     FileInputStream[] fis = new FileInputStream[2];
>     // highlighted in the original report: fis entries can be left null here
>     sock.recvFileInputStreams(fis, buf, 0, buf.length);
>     ShortCircuitReplica replica = null;
>     try {
>       ExtendedBlockId key =
>           new ExtendedBlockId(block.getBlockId(), block.getBlockPoolId());
>       if (buf[0] == USE_RECEIPT_VERIFICATION.getNumber()) {
>         LOG.trace("Sending receipt verification byte for slot {}", slot);
>         sock.getOutputStream().write(0);
>       }
>       // highlighted in the original report: the NPE is thrown from this constructor
>       replica = new ShortCircuitReplica(key, fis[0], fis[1], cache,
>           Time.monotonicNow(), slot);
> {code}
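For illustration, a minimal sketch of the kind of null check the description asks for, placed right after {{recvFileInputStreams}}. This is only an assumed shape, not necessarily what the attached patches do, and the exception message is made up:
{code:java}
FileInputStream[] fis = new FileInputStream[2];
sock.recvFileInputStreams(fis, buf, 0, buf.length);
// Hypothetical guard: raise a descriptive IOException instead of letting
// null streams reach the ShortCircuitReplica constructor and cause an NPE.
if (fis[0] == null || fis[1] == null) {
  throw new IOException("Failed to receive file descriptors over " + sock
      + "; the DataNode may have hit its open file limit.");
}
{code}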



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


