Date: Sun, 7 May 2017 07:00:10 +0000 (UTC)
From: "Paul Rogers (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Comment Edited] (DRILL-5470) CSV reader data corruption on truncated lines

    [ https://issues.apache.org/jira/browse/DRILL-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999712#comment-15999712 ]

Paul Rogers edited comment on DRILL-5470 at 5/7/17 6:59 AM:
------------------------------------------------------------

To illustrate the CSV data corruption, I created a CSV file, test4.csv, of the following form:

{code}
h,u
abc,def
ghi
{code}

Then, I created a simple test using the "cluster fixture" framework:

{code}
  @Test
  public void readerTest() throws Exception {
    FixtureBuilder builder = ClusterFixture.builder()
        .maxParallelization(1);

    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
      TextFormatConfig csvFormat = new TextFormatConfig();
      csvFormat.fieldDelimiter = ',';
      csvFormat.skipFirstLine = false;
      csvFormat.extractHeader = true;
      cluster.defineWorkspace("dfs", "data", "/tmp/data", "csv", csvFormat);
      String sql = "SELECT * FROM `dfs.data`.`csv/test4.csv` LIMIT 10";
      client.queryBuilder().sql(sql).printCsv();
    }
  }
{code}

The results show we've got a problem:

{code}
Exception (no rows returned): org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalArgumentException: length: -3 (expected: >= 0)
{code}

If the last line were:

{code}
efg,
{code}

then the offset vector should look like this:

{code}
[0, 3, 3]
{code}

Very likely we have an offset vector that looks like this instead:

{code}
[0, 3, 0]
{code}

When we compute the second column of the second row, we should compute:

{code}
length = offset[2] - offset[1] = 3 - 3 = 0
{code}

Instead we get:

{code}
length = offset[2] - offset[1] = 0 - 3 = -3
{code}

Somehow, in the user's scenario, the numbers are far larger, and the value has wrapped around to the bogus length shown.

The summary is that a premature EOF appears to cause the "missing" columns to be skipped; they are not filled with a blank value to "bump" the offset vectors to fill in the last row. Instead, they are left at 0, causing havoc downstream in the query.
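To make the offset arithmetic concrete, here is a minimal sketch in plain Java. It is not Drill's actual vector API; {{OffsetVectorDemo}} and {{fieldLength}} are invented names, and the arrays stand in for the second column's offset vector in the two-row file above.

{code}
// Minimal illustration of the offset-vector arithmetic described above.
// Plain Java; names are invented for illustration, not Drill's vector classes.
public class OffsetVectorDemo {

  // A variable-width vector derives field i's length as offsets[i+1] - offsets[i].
  static int fieldLength(int[] offsets, int index) {
    return offsets[index + 1] - offsets[index];
  }

  public static void main(String[] args) {
    // Correct behavior: the missing second column of the last row is written
    // as an empty value, so its ending offset repeats the previous offset.
    int[] bumped = {0, 3, 3};
    System.out.println(fieldLength(bumped, 1));  // 3 - 3 = 0: empty field

    // Buggy behavior on premature EOF: the entry is left at its initial
    // value of 0, so the computed length goes negative.
    int[] skipped = {0, 3, 0};
    System.out.println(fieldLength(skipped, 1)); // 0 - 3 = -3: the error above
  }
}
{code}

With a larger file, the stale zero entries subtract against much bigger offsets, which is consistent with the wildly wrong lengths seen in the user's scenario.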
> CSV reader data corruption on truncated lines
> ---------------------------------------------
>
>                 Key: DRILL-5470
>                 URL: https://issues.apache.org/jira/browse/DRILL-5470
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 1.10.0
>         Environment: - ubuntu 14.04
> - r3.8xl (32 CPU/240GB Mem)
> - openjdk version "1.8.0_111"
> - drill 1.10.0 with 8656c83b00f8ab09fb6817e4e9943b2211772541 cherry-picked
>            Reporter: Nathan Butler
>            Assignee: Paul Rogers
>
> Per the mailing list discussion and Rahul's and Paul's suggestion, I'm filing this JIRA issue. Drill seems to be running out of memory when doing an External Sort. Per Zelaine's suggestion, I enabled sort.external.disable_managed in drill-override.conf and in the sqlline session. This caused the query to run for longer, but it still failed with the same message.
> Per Paul's suggestion, I enabled debug logging for the org.apache.drill.exec.physical.impl.xsort.managed package and re-ran the query.
> Here's the initial DEBUG line for ExternalSortBatch for our query:
> bq. 2017-05-03 12:02:56,095 [26f600f1-17b3-d649-51be-2ca0c9bf7606:frag:2:15] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Config: memory limit = 10737418240, spill file size = 268435456, spill batch size = 8388608, merge limit = 2147483647, merge batch size = 16777216
> And here's the last DEBUG line before the stack trace:
> bq. 2017-05-03 12:37:44,249 [26f600f1-17b3-d649-51be-2ca0c9bf7606:frag:2:4] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 10737418240, buffer memory = 10719535268, merge memory = 10707140978
> And the stack trace:
> {quote}
> 2017-05-03 12:38:02,927 [26f600f1-17b3-d649-51be-2ca0c9bf7606:frag:2:6] INFO o.a.d.e.p.i.x.m.ExternalSortBatch - User Error Occurred: External Sort encountered an error while spilling to disk (Unable to allocate buffer of size 268435456 due to memory limit. Current allocation: 10579849472)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External Sort encountered an error while spilling to disk
> [Error Id: 5d53c677-0cd9-4c01-a664-c02089670a1c ]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1447) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:1376) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.spillFromMemory(ExternalSortBatch.java:1339) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.processBatch(ExternalSortBatch.java:831) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.loadBatch(ExternalSortBatch.java:618) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.load(ExternalSortBatch.java:660) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.innerNext(ExternalSortBatch.java:559) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:137) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226) [drill-java-exec-1.10.0.jar:1.10.0]
> at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_111]
> at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_111]
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) [hadoop-common-2.7.1.jar:na]
> at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.10.0.jar:1.10.0]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 268435456 due to memory limit. Current allocation: 10579849472
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:220) ~[drill-memory-base-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:195) ~[drill-memory-base-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.vector.VarCharVector.reAlloc(VarCharVector.java:425) ~[vector-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.vector.VarCharVector.copyFromSafe(VarCharVector.java:278) ~[vector-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.vector.NullableVarCharVector.copyFromSafe(NullableVarCharVector.java:379) ~[vector-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.test.generated.PriorityQueueCopierGen140.doCopy(PriorityQueueCopierTemplate.java:22) ~[na:na]
> at org.apache.drill.exec.test.generated.PriorityQueueCopierGen140.next(PriorityQueueCopierTemplate.java:76) ~[na:na]
> at org.apache.drill.exec.physical.impl.xsort.managed.CopierHolder$BatchMerger.next(CopierHolder.java:234) ~[drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.physical.impl.xsort.managed.ExternalSortBatch.doMergeAndSpill(ExternalSortBatch.java:1408) [drill-java-exec-1.10.0.jar:1.10.0]
> ... 24 common frames omitted
> {quote}
> I'm in communication with Paul and will send him the full log file.
> Thanks,
> Nathan
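A footnote on the numbers in the quoted report: the failure is plain arithmetic against the sort's configured budget. The sketch below is illustrative Java, not Drill code; {{AllocationMath}} is an invented name, and the constants are copied from the DEBUG and exception lines above.

{code}
// The sort already holds ~9.85 GiB against a 10 GiB budget, so one more
// 256 MiB spill buffer pushes it over the limit.
public class AllocationMath {
  public static void main(String[] args) {
    long memoryLimit       = 10_737_418_240L; // "memory limit" from the Config DEBUG line
    long currentAllocation = 10_579_849_472L; // "Current allocation" from the exception
    long requested         =    268_435_456L; // buffer size the copier tried to allocate

    long wouldBe = currentAllocation + requested;  // 10,848,284,928
    System.out.println(wouldBe > memoryLimit);     // true: the allocation is rejected
    System.out.println(wouldBe - memoryLimit);     // 110,866,688 bytes over (~106 MiB)
  }
}
{code}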