drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victor Garcia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5591) non-ASCII characters in text file result in MalformedInputException
Date Fri, 30 Jun 2017 20:34:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070669#comment-16070669
] 

Victor Garcia commented on DRILL-5591:
--------------------------------------

I resolved the problem temporally using VIM editor:

vim +"set nobomb | set fenc=utf8 | x" filename.txt

Explanation part!

+ : Used by vim to directly enter command when opening a file. Usualy used to open a file
at a specific line: vim +14 file.txt
| : Separator of multiple commands (like ; in bash)
set nobomb : no utf-8 BOM
set fenc=utf8 : Set new encoding to utf-8 doc link
x : Save and close file
filename.txt : path to the file
" : qotes are here because of pipes. (otherwise bash will use them as bash pipe)

The command and the explaniation was taked from:

[https://stackoverflow.com/questions/64860/best-way-to-convert-text-files-between-character-sets]

Thank you Paul.

> non-ASCII characters in text file result in MalformedInputException
> -------------------------------------------------------------------
>
>                 Key: DRILL-5591
>                 URL: https://issues.apache.org/jira/browse/DRILL-5591
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.11.0
>            Reporter: Khurram Faraaz
>
> I am on Drill 1.11.0 commit id: 874bf62
> To repro the issue:
> wget http://cfdisat.blob.core.windows.net/lco/l_RFC_2017_05_11_2.txt.gz
> gunzip l_RFC_2017_05_11_2.txt.gz
> hadoop fs -put l_RFC_2017_05_11_2.txt /tmp
> There are some non-ASCII characters at the beginning and end of the file used in the
test.
> {noformat}
> [root@centos-01 drill_5590]# head l_RFC_2017_05_11_2.txt
> ����0���1��
> ��� ��RFC|SNCF|SUBCONTRATACION
> CUBB910321AC1|0|0
> CUBB9104187K9|0|0
> CUBB910709KD0|0|0
> CUBB910817CE8|0|0
> CUBB9111286YA|0|0
> CUBB920408J69|0|0
> {noformat}
> Failing query
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(1) from `l_RFC_2017_05_11_2.txt` t where
columns[0] like 'CUBA7706%';
> Error: SYSTEM ERROR: MalformedInputException: Input length = 1
> Fragment 0:0
> [Error Id: cdfa704c-0bc8-4791-95ae-d05b4c63beab on centos-01.qa.lab:31010] (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input
length = 1
>         at org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper.decodeUT8(CharSequenceWrapper.java:185)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper.setBuffer(CharSequenceWrapper.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.test.generated.FiltererGen15.doEval(FilterTemplate2.java:50)
~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen15.filterBatchNoSV(FilterTemplate2.java:100)
~[na:na]
>         at org.apache.drill.exec.test.generated.FiltererGen15.filterBatch(FilterTemplate2.java:73)
~[na:na]
>         at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.doWork(FilterRecordBatch.java:81)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:93)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:102)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema(StreamingAggBatch.java:102)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:142)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:225)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_91]
>         at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_91]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
~[hadoop-common-2.7.0-mapr-1607.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         ... 4 common frames omitted
> Caused by: java.nio.charset.MalformedInputException: Input length = 1
>         at java.nio.charset.CoderResult.throwException(CoderResult.java:281) ~[na:1.8.0_91]
>         at org.apache.drill.exec.expr.fn.impl.CharSequenceWrapper.decodeUT8(CharSequenceWrapper.java:183)
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         ... 49 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message