Date: Thu, 13 Apr 2017 14:38:42 +0000 (UTC)
From: "Zelaine Fong (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Assigned] (DRILL-5435) Using Limit causes Memory Leaked Error since 1.10

[ https://issues.apache.org/jira/browse/DRILL-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong reassigned DRILL-5435:
-----------------------------------

    Assignee: Parth Chandra

> Using Limit causes Memory Leaked Error since 1.10
> -------------------------------------------------
>
>                 Key: DRILL-5435
>                 URL: https://issues.apache.org/jira/browse/DRILL-5435
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>            Reporter: F Méthot
>            Assignee: Parth Chandra
>
> Here are the details I can provide:
> We migrated our production system from Drill 1.9 to 1.10 just 5 days ago (220-node cluster).
> Our logs show some 900+ queries ran without problem in the first 4 days (similar queries, none of which used the `limit` clause).
> Yesterday we started running simple ad hoc `select * ... limit 10` queries (like we often do; that was our first use of `limit` with 1.10),
> and we got the `Memory was leaked` exception below.
> Also, once we get the error, most subsequent user queries fail with a Channel Closed Exception. We need to restart Drill to bring it back to normal.
> A day later, I ran a similar `select * limit 10` query and the same thing happened; we had to restart Drill.
> The exception referred to a file (1_0_0.parquet).
> I moved that file to a smaller test cluster (12 nodes) and got the error on the first attempt, but I am no longer able to reproduce the issue on that file. Between the 12-node and 220-node clusters, a different Column name and Row Group Start were listed in the error.
> The parquet file was generated by Drill 1.10.
> I tried the same file with a local drill-embedded 1.9 and 1.10 and had no issue.
> Here is the error (manually typed); if you think of anything obvious, let us know.
>
> AsyncPageReader - User Error Occurred: Exception occurred while reading from disk (can not read class o.a.parquet.format.PageHeader: java.io.IOException: input stream is closed.)
> File:..../1_0_0.parquet
> Column: StringColXYZ
> Row Group Start: 115215476
> [Error Id: ....]
>   at o.a.d.common.exceptions.UserException (UserException.java:544)
>   at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:199)
>   at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.access(AsyncPageReader.java:81)
>   at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:483)
>   at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
>   at o.a.d.exec.store.parquet.columnreaders.AsyncPageReader.AsyncPageReaderTask.call(AsyncPageReader.java:392)
>   ...
> Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: java.io.IOException: Input Stream is closed.
>   at o.a.parquet.format.Util.read(Util.java:216)
>   at o.a.parquet.format.Util.readPageHeader(Util.java:65)
>   at o.a.drill.exec.store.parquet.columnreaders.AsyncPageReader(AsyncPageReaderTask:430)
> Caused by: parquet.org.apache.thrift.transport.TTransportException: Input stream is closed
>   at ...read(TIOStreamTransport.java:129)
>   at ....TTransport.readAll(TTransport.java:84)
>   at ....TCompactProtocol.readByte(TCompactProtocol.java:474)
>   at ....TCompactProtocol.readFieldBegin(TCompactProtocol.java:481)
>   at ....InterningProtocol.readFieldBegin(InterningProtocol.java:158)
>   at ....o.a.parquet.format.PageHeader.read(PageHeader.java:828)
>   at ....o.a.parquet.format.Util.read(Util.java:213)
>
> Fragment 0:0
> [Error id: ...]
> o.a.drill.common.exception.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (524288)
> Allocator(op:0:0:4:ParquetRowGroupScan) 1000000/524288/39919616/10000000000
>   at o.a.d.common.exceptions.UserException (UserException.java:544)
>   at o.a.d.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293)
>   at o.a.d.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
>   at o.a.d.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
>   ...
> Caused by: IllegalStateException: Memory was leaked by query. Memory leaked: (524288)
>   at o.a.d.exec.memory.BaseAllocator.close(BaseAllocator.java:502)
>   at o.a.d.exec.ops.OperatorContextImpl(OperatorContextImpl.java:149)
>   at o.a.d.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:422)
>   at o.a.d.exec.ops.FragmentContext.close(FragmentContext.java:411)
>   at o.a.d.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:318)
>   at o.a.d.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
>
> This fixed the problem:
> alter set `store.parquet.reader.pagereader.async`=false;

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
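[Editor's note: for readers hitting the same leak, the workaround quoted above can be applied with Drill's standard option syntax. This is a hedged sketch, not part of the original report; it assumes a Drill 1.10 SQL shell, and the choice between session and system scope is the operator's.]

```sql
-- Disable the asynchronous Parquet page reader for the current session only:
ALTER SESSION SET `store.parquet.reader.pagereader.async` = false;

-- Or cluster-wide, persisting across sessions (requires admin privileges):
ALTER SYSTEM SET `store.parquet.reader.pagereader.async` = false;

-- Verify the effective value via the sys.options table:
SELECT name, val FROM sys.options
WHERE name = 'store.parquet.reader.pagereader.async';
```

Session scope is the safer first step, since it limits the (slower) synchronous reader to the affected workload while the underlying bug is investigated.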