drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DRILL-5157) Multiple unit tests fail with Parquet async Snappy error
Date Fri, 23 Dec 2016 21:44:58 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15773588#comment-15773588 ]

Paul Rogers edited comment on DRILL-5157 at 12/23/16 9:44 PM:
--------------------------------------------------------------

Problem also occurs in {{TestConvertFunctions#testConvertFromConvertToInt}} when run on a
Mac.

Fails when running this query (with correct parameters):

{code}
CREATE TABLE %s.%s as
          SELECT convert_to(r_regionkey, 'INT') as ct
          FROM cp.`tpch/region.parquet`
{code}

When this query fails, we encounter an {{IllegalStateException}} in a WorkerBee thread
(see DRILL-5156).

The error *does not* occur if the test is run individually, only when run as part of the suite.
This suggests that the problem is more subtle: a prior query leaves behind some state
that later shows up as the Snappy problem.

Disabled all tests except {{testConvertFromConvertToInt}} by inserting a (temporary) {{@Ignore}}
for each test. The test still fails. The only two queries run before the one cited above are:

{code}
ALTER SESSION SET `exec.errors.verbose` = true
{code}

(Called in {{BaseTestQuery.setupDefaultTestCluster()}} in {{@Before}} block.) And:

{code}
alter session set `planner.slice_target` = 1
{code}

Called in the test itself.

Also, the test does the following:

{code}
    final OptionValue srOption = setupScalarReplacementOption(bits[0], ScalarReplacementOption.OFF);
{code}

Commenting out the second and third items above (the {{planner.slice_target}} setting and the
scalar replacement option) does not, however, resolve the issue.

The only difference between the command lines for the two cases (individual test first, whole
class second) is the instruction to JUnit:

{code}
17,18c17,18
< -test
< org.apache.drill.exec.physical.impl.TestConvertFunctions:testConvertFromConvertToInt
---
> -classNames
> org.apache.drill.exec.physical.impl.TestConvertFunctions
{code}

FWIW: the test is run under Java 8, not the Java 7 that is most often used, in case that makes
a difference.

Digging deeper: the {{FragmentContext}} that triggers the error is created for the CTAS operation.
The CTAS fails due to the Snappy issue. Later, the {{FragmentContext.close()}} method attempts
to close each {{OperatorContextImpl}}. The one that corresponds to {{org.apache.drill.exec.store.parquet.ParquetRowGroupScan}}
fails when closing its allocator. 
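
To make the close sequence above concrete, here is a minimal sketch of a parent context that closes each child operator in turn and reports the first failure. This is an assumption-laden illustration, not Drill's {{FragmentContext}}: {{ToyFragment}}, {{CloseAllDemo}} and the error text are hypothetical names, and whether Drill keeps closing the remaining operators after one fails is assumed here rather than taken from the source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not Drill code: shows one child failing during close
// while the other children are still given a chance to close.
class CloseAllDemo {
  public static void main(String[] args) {
    ToyFragment fragment = new ToyFragment();
    fragment.add(() -> { });  // an operator that closes cleanly
    fragment.add(() -> {      // stands in for the scan whose allocator fails on close
      throw new IllegalStateException("closed with outstanding buffers");
    });
    fragment.add(() -> { });  // still closed, even though the previous one failed
    try {
      fragment.close();
    } catch (Exception e) {
      System.out.println("close failed: " + e.getMessage());
    }
  }
}

// Hypothetical stand-in for a context that owns several operator contexts.
class ToyFragment implements AutoCloseable {
  private final List<AutoCloseable> operators = new ArrayList<>();
  private int closedCleanly = 0;

  void add(AutoCloseable op) {
    operators.add(op);
  }

  int closedCleanly() {
    return closedCleanly;
  }

  @Override
  public void close() throws Exception {
    Exception first = null;
    for (AutoCloseable op : operators) {
      try {
        op.close();
        closedCleanly++;
      } catch (Exception e) {
        // Remember the first failure; attach later ones as suppressed.
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

The point of the sketch is only that the exception surfaces at close time, after the query itself has already failed, which matches the order of events described above.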

The particular failure is in {{BaseAllocator}}, line 483:

{code}
        // are there outstanding buffers?
        final int allocatedCount = childLedgers.size();
        if (allocatedCount > 0) {
          throw new IllegalStateException(
              String.format("Allocator[%s] closed with outstanding buffers allocated (%d).\n%s",
                  name, allocatedCount, toString()));
        }
{code}

So, this fails because the Parquet row group scan ({{ParquetRowGroupScan}}) leaks memory, which we already knew.
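
For readers unfamiliar with this kind of check, the following is a rough, self-contained sketch of the same idea: an allocator that tracks outstanding buffers and refuses to close while any remain. It is not Drill's {{BaseAllocator}}; {{ToyAllocator}}, {{ToyBuffer}} and {{LeakCheckDemo}} are hypothetical names, and the real allocator tracks ledgers rather than a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo, not Drill code: one buffer is released, one is leaked,
// and close() reports the leak.
class LeakCheckDemo {
  public static void main(String[] args) {
    ToyAllocator allocator = new ToyAllocator("op:demo");
    ToyBuffer released = allocator.buffer(131072);
    ToyBuffer leaked = allocator.buffer(262144); // never released: the simulated leak
    released.release();
    try {
      allocator.close();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}

// Hypothetical buffer that knows which allocator owns it.
class ToyBuffer {
  private final ToyAllocator owner;
  final int size;

  ToyBuffer(ToyAllocator owner, int size) {
    this.owner = owner;
    this.size = size;
  }

  void release() {
    owner.forget(this);
  }
}

// Hypothetical allocator that records every buffer it hands out.
class ToyAllocator implements AutoCloseable {
  private final String name;
  private final List<ToyBuffer> outstanding = new ArrayList<>();

  ToyAllocator(String name) {
    this.name = name;
  }

  ToyBuffer buffer(int size) {
    ToyBuffer b = new ToyBuffer(this, size);
    outstanding.add(b);
    return b;
  }

  void forget(ToyBuffer b) {
    outstanding.remove(b);
  }

  @Override
  public void close() {
    // Same shape as the check quoted above: any unreleased buffer is an error.
    final int allocatedCount = outstanding.size();
    if (allocatedCount > 0) {
      throw new IllegalStateException(
          String.format("Allocator[%s] closed with outstanding buffers allocated (%d).",
              name, allocatedCount));
    }
  }
}
```

In this toy version the leaked buffer is simply never released; in the failing test, the assumption is that the buffers held by the scan were not released on the error path after the Snappy failure.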


> Multiple unit tests fail with Parquet async Snappy error
> --------------------------------------------------------
>
>                 Key: DRILL-5157
>                 URL: https://issues.apache.org/jira/browse/DRILL-5157
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Assignee: Parth Chandra
>
> Run the {{TestDrillbitResilience.doMemoryLeaksWhenCancelled}} unit test. It fails with the following stack trace and the memory leak trace shown second.
> Strangely, this error appears only if the test is run as part of the overall suite. The error does not appear if the test is run individually in the debugger. This suggests that the problem described here is a side-effect of a problem created by an earlier test.
> Stack trace that seems to show that the code was trying to find a Snappy native library:
> {code}
> java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
> 	at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
> 	at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$DecompressionHelper.decompress(AsyncPageReader.java:402)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.decompress(AsyncPageReader.java:169)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.getDecompressedPageData(AsyncPageReader.java:96)
> 	at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.nextInternal(AsyncPageReader.java:219)
> 	at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:280)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readPage(ColumnReader.java:250)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:178)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:130)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFieldsSerial(ParquetRecordReader.java:485)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:479)
> 	at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:562)
> 	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:178)
> 	at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
> 	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
> …
> {code}
> Resulting memory leak if the test is allowed to complete:
> {code}
> java.lang.AssertionError: Query state is incorrect (expected: CANCELED, actual: FAILED) AND/OR
> Exception thrown: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Allocator[op:3:0:15:ParquetRowGroupScan] closed with outstanding buffers allocated (2).
> Allocator(op:3:0:15:ParquetRowGroupScan) 1000000/393216/3162112/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 2
>     ledger[3407] allocator: op:3:0:15:ParquetRowGroupScan), isOwning: true, size: 131072, references: 1, life: 73148192887288..0, allocatorManager: [3027, life: 73148192235794..0] holds 1 buffers.
>         DrillBuf[4949], udle: [3028 0..131072]
>     ledger[3471] allocator: op:3:0:15:ParquetRowGroupScan), isOwning: true, size: 262144, references: 1, life: 73148451288840..0, allocatorManager: [3091, life: 73148451257480..0] holds 1 buffers.
>         DrillBuf[5017], udle: [3092 0..262144]
>   reservations: 0
> Fragment 3:0
> [Error Id: 8502074b-f488-4a14-bf7d-a2a4480392cd on 172.30.1.67:31016]
> 	at org.apache.drill.exec.server.TestDrillbitResilience.assertStateCompleted(TestDrillbitResilience.java:861)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.assertCancelledWithoutException(TestDrillbitResilience.java:876)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.doMemoryLeaksWhenCancelled(TestDrillbitResilience.java:680)
> 	at org.apache.drill.exec.server.TestDrillbitResilience.memoryLeaksWhenCancelled(TestDrillbitResilience.java:647)
> 	...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
