drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5157) Multiple unit tests fail with Parquet async Snappy error
Date Fri, 23 Dec 2016 22:29:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773753#comment-15773753 ]

Paul Rogers edited comment on DRILL-5157 at 12/23/16 10:29 PM:
---------------------------------------------------------------

Back to the original issue, the following line in {{AsyncPageReader}} is the scene of the
crime:

{code}
        int size = Snappy.uncompress(input, output);
{code}

When the test is run individually, the line is hit once and works fine. When called as part
of the suite, the line is hit once and fails.
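A plausible mechanism for this once-per-suite behavior (my sketch, not verified against Drill): the stack trace in the issue description fails inside {{Snappy.<clinit>}}, and the JVM caches a static-initializer failure, so if an earlier test poisons the class, every later use fails no matter how correct it is. The names below ({{InitFailureDemo}}, {{NativeLoader}}) are stand-ins, not Drill code:

```java
// Sketch: once a class's static initializer throws, the JVM marks the
// class erroneous and every later use fails too -- which would explain
// a failure that appears only when run after other tests.
public class InitFailureDemo {

    static class NativeLoader {
        static {
            // Stand-in for a native-library load that fails on first use.
            if (true) throw new RuntimeException("simulated load failure");
        }
        static int uncompress() { return 0; }
    }

    // Returns which error each of two successive uses produced.
    static String run() {
        StringBuilder sb = new StringBuilder();
        try {
            NativeLoader.uncompress();       // first use: initializer runs, throws
        } catch (ExceptionInInitializerError e) {
            sb.append(e.getClass().getSimpleName());
        }
        try {
            NativeLoader.uncompress();       // second use: class already erroneous
        } catch (NoClassDefFoundError e) {
            sb.append(",").append(e.getClass().getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```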

Similar issue found by others: https://github.com/ptaoussanis/carmine/issues/5

Also: http://stackoverflow.com/questions/30039976/unsatisfiedlinkerror-no-snappyjava-in-java-library-path-when-running-spark-mlli

Turns out we are running a vintage version of Snappy:

{code}
    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.0.5-M3</version>
    </dependency>
{code}

From Maven central:
{code}
1.0.5-M3	(Sep, 2012)
{code}

Upgrading to 1.0.5, as suggested in the StackOverflow post above, did not help.
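Before digging into poms, one quick check (a sketch; {{WhichJar}} is mine, not Drill code) is to ask the JVM where a loaded class actually came from. Passing {{org.xerial.snappy.Snappy.class}} here would show which of the competing jars won:

```java
import java.security.CodeSource;

// Sketch: report the jar or directory a class was loaded from.
public class WhichJar {
    static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Bootstrap classes (java.lang.String etc.) report no code source.
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(originOf(String.class));   // bootstrap class
        System.out.println(originOf(WhichJar.class)); // classes dir or jar
    }
}
```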

Looking at the class path more carefully, the problem begins to emerge:

{code}
org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar
org/xerial/snappy/snappy-java/1.0.5/snappy-java-1.0.5.jar
{code}
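The duplicate can also be confirmed from inside the JVM: {{ClassLoader.getResources}} returns every classpath entry that provides a given class file. A sketch (it scans for this demo class itself so it runs anywhere; for Drill one would pass {{org/xerial/snappy/Snappy.class}}):

```java
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Sketch: list every classpath entry that provides a given class file.
// More than one URL for the same class means the JVM picks whichever
// jar happens to come first on the classpath.
public class DuplicateClassScan {
    static List<URL> copiesOf(String resourcePath) throws Exception {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return Collections.list(cl.getResources(resourcePath));
    }

    public static void main(String[] args) throws Exception {
        for (URL url : copiesOf("DuplicateClassScan.class")) {
            System.out.println(url);
        }
    }
}
```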

So, even though I upgraded the Snappy used by java-exec to 1.0.5 (as suggested by the post),
other subprojects pull in their own Snappy versions. In fact, there are many more references:

{code}
drill/contrib/storage-hive/hive-exec-shade/target/classes/META-INF/maven/com.twitter/parquet-hadoop/pom.xml:
      <version>1.1.1.6</version>
drill/contrib/storage-hive/hive-exec-shade/target/classes/META-INF/maven/com.twitter/parquet-hadoop/pom.xml:
    (Version unspecified, inherited from parent)
drill/contrib/storage-hive/hive-exec-shade/target/classes/META-INF/maven/org.apache.avro/avro/pom.xml:
    (Version unspecified, inherited from parent)
drill/contrib/storage-hive/hive-exec-shade/target/classes/META-INF/maven/org.apache.hive/hive-exec/pom.xml
    <groupId>org.iq80.snappy</groupId> (Java implementation)
drill/exec/java-exec/pom.xml:
      <version>1.0.5-M3</version>
{code}

So, we are in Maven dependency hell here.
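The usual escape from this kind of dependency hell (a sketch of the standard Maven approach, untested against Drill's build, and it would not reach classes relocated inside the hive-exec shaded jar) is to pin a single snappy-java version in the root pom's {{dependencyManagement}} so every subproject inherits it:

```xml
<!-- Root pom sketch: force one snappy-java across all modules.
     The version shown is illustrative, not a vetted choice. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.xerial.snappy</groupId>
      <artifactId>snappy-java</artifactId>
      <version>1.0.5</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running {{mvn dependency:tree -Dincludes=org.xerial.snappy}} afterwards shows which modules still drag in a different version.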



> Multiple unit tests fail with Parquet async Snappy error
> --------------------------------------------------------
>
>                 Key: DRILL-5157
>                 URL: https://issues.apache.org/jira/browse/DRILL-5157
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Parth Chandra
>
> Run the {{TestDrillbitResilience.doMemoryLeaksWhenCancelled}} unit test. It fails with the following stack trace and the memory leak trace shown second.
> Strangely, this error appears only if the test is run as part of the overall suite. The error does not appear if the test is run individually in the debugger. This suggests that the problem described here is a side-effect of a problem created by an earlier test.
> Stack trace that seems to show that the code was trying to find a Snappy native library:
> {code}
> java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
> 	at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
> 	at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.decompress(AsyncPageReader.java:169)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.getDecompressedPageData(AsyncPageReader.java:96)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.nextInternal(AsyncPageReader.java:219)
> 	at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:280)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readPage(ColumnReader.java:250)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:178)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:130)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFieldsSerial(ParquetRecordReader.java:485)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:479)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:562)
> 	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:178)
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> …
> {code}
> Resulting memory leak if the test is allowed to complete:
> {code}
> java.lang.AssertionError: Query state is incorrect (expected: CANCELED, actual: FAILED)
> AND/OR
> Exception thrown: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Allocator[op:3:0:15:ParquetRowGroupScan] closed with outstanding buffers allocated (2).
> Allocator(op:3:0:15:ParquetRowGroupScan) 1000000/393216/3162112/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 2
>     ledger[3407] allocator: op:3:0:15:ParquetRowGroupScan), isOwning: true, size: 131072, references: 1, life: 73148192887288..0, allocatorManager: [3027, life: 73148192235794..0] holds 1 buffers.
>         DrillBuf[4949], udle: [3028 0..131072]
>     ledger[3471] allocator: op:3:0:15:ParquetRowGroupScan), isOwning: true, size: 262144, references: 1, life: 73148451288840..0, allocatorManager: [3091, life: 73148451257480..0] holds 1 buffers.
>         DrillBuf[5017], udle: [3092 0..262144]
>   reservations: 0
> Fragment 3:0
> [Error Id: 8502074b-f488-4a14-bf7d-a2a4480392cd on 172.30.1.67:31016]
> 	at org.apache.drill.exec.server.TestDrillbitResilience.assertStateCompleted(TestDrillbitResilience.java:861)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelledWithoutException(TestDrillbitResilience.java:876)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.doMemoryLeaksWhenCancelled(TestDrillbitResilience.java:680)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.memoryLeaksWhenCancelled(TestDrillbitResilience.java:647)
> 	...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
