[ https://issues.apache.org/jira/browse/DRILL-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053095#comment-16053095
]
Paul Rogers commented on DRILL-5590:
------------------------------------
Thanks [~khfaraaz] for explaining the issue. Looks like the fix works in the reader, but Drill
could do a better job of handling malformed UTF-8 characters. The classic answer is to replace
the malformed character with a placeholder such as ◻︎. In a big data system such as Drill,
failing the entire query is probably not terribly helpful...
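
As an illustration of the placeholder approach described above, here is a minimal sketch using the standard java.nio.charset API. It is not Drill's text reader code: the class name LenientUtf8 and the method decodeLenient are hypothetical, and U+FFFD (the Unicode replacement character) is an assumed choice of placeholder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Drill: decodes UTF-8 leniently.
final class LenientUtf8 {

    // Decodes raw bytes as UTF-8. Malformed byte sequences are replaced
    // with U+FFFD instead of aborting the whole decode.
    static String decodeLenient(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("\uFFFD");
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }
}
```

With CodingErrorAction.REPORT the same decoder would throw on the first bad byte, which corresponds to the fail-the-query behavior; REPLACE keeps the rest of the field readable.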
> Drill return IndexOutOfBoundsException when a (Text) file > 4096 rows
> ---------------------------------------------------------------------
>
> Key: DRILL-5590
> URL: https://issues.apache.org/jira/browse/DRILL-5590
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.10.0
> Environment: OS: Oracle Linux Enterprise 7, OSX 10.10.1
> JVM: 1.8
> Drill Installation type: Embedded or distributed (2-node cluster)
> Reporter: Victor Garcia
> Assignee: Paul Rogers
> Attachments: xaa_19.txt
>
>
> The storage plugin configuration (named lco) is shown below:
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "file:///",
>   "config": null,
>   "workspaces": {
>     "root": {
>       "location": "/data/source/lco",
>       "writable": false,
>       "defaultInputFormat": "psv"
>     }
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "txt"
>       ],
>       "extractHeader": true,
>       "delimiter": "|"
>     }
>   }
> }
> When querying a CSV file with 3 columns, Drill returns an error if the file has more than 4096 rows (including the header). When I reduce the file to 4095 rows, the query works.
> Query used: (Select count(1) from lco.root.* as lc where lc.rfc like 'CUBA7706%')
> The original file has 35M rows; I reduced the row count until I found the number of rows that produces the error.
> The original source file is in this URL (http://cfdisat.blob.core.windows.net/lco/l_RFC_2017_05_11_2.txt.gz)
> First part of error:
> at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:343) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:88) [drill-java-exec-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) [drill-rpc-1.10.0.jar:1.10.0]
> at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:244) [drill-rpc-1.10.0.jar:1.10.0]
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> 2017-06-15 14:45:03,056 [qtp2036240117-58] ERROR o.a.d.e.server.rest.QueryResources - Query from Web UI Failed
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)