drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
Date Tue, 14 Jun 2016 18:27:27 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330088#comment-15330088 ]

ASF GitHub Bot commented on DRILL-3149:
---------------------------------------

Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/500#discussion_r67028137
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java ---
    @@ -89,30 +88,51 @@ public void testTabFieldDelimiter() throws Exception {
             listOf("2", "b"));
       }
     
    -  @Test @Ignore // It does not look like lineDelimiter is working
    -  public void testTextLineDelimiter() throws Exception {
    +  @Test
    +  public void testSingleTextLineDelimiter() throws Exception {
    +    String tableName = genCSVTable("testSingleTextLineDelimiter",
    +        "a|b|c");
    +
    +    testWithResult(format("select columns from table(%s(type => 'TeXT', lineDelimiter => '|'))", tableName),
    +        listOf("a"),
    +        listOf("b"),
    +        listOf("c"));
    +  }
    +
    +  @Test
    +  // '\n' is treated as standard delimiter
    +  // if user has indicated custom line delimiter but input file contains '\n', split will occur on both
    +  public void testCustomTextLineDelimiterAndNewLine() throws Exception {
         String tableName = genCSVTable("testTextLineDelimiter",
    -        "\"b\"|\"0\"",
    -        "\"b\"|\"1\"",
    -        "\"b\"|\"2\"");
    +        "b|1",
    +        "b|2");
     
         testWithResult(format("select columns from table(%s(type => 'TeXT', lineDelimiter => '|'))", tableName),
    -        listOf("\"b\""),
    -        listOf("\"0\"", "\"b\""),
    -        listOf("\"1\"", "\"b\""),
    -        listOf("\"2\"")
    -      );
    +        listOf("b"),
    +        listOf("1"),
    +        listOf("b"),
    +        listOf("2"));
       }
     
       @Test
       public void testTextLineDelimiterWithCarriageReturn() throws Exception {
         String tableName = genCSVTable("testTextLineDelimiterWithCarriageReturn",
    -        "1,a\r",
    -        "2,b\r");
    +        "1, a\r",
    +        "2, b\r");
         String lineDelimiter = new String(new char[]{92, 114, 92, 110}); // represents \r\n
         testWithResult(format("select columns from table(%s(type=>'TeXT', lineDelimiter => '%s'))", tableName, lineDelimiter),
    -        listOf("1,a"),
    -        listOf("2,b"));
    +        listOf("1, a"),
    +        listOf("2, b"));
    +  }
    +
    +  @Test
    +  public void testMultiByteLineDelimiter() throws Exception {
    +    String tableName = genCSVTable("testMultiByteLineDelimiter",
    +        "1abc2abc3abc");
    --- End diff --
    
    I was expecting a test where the data has part of a multibyte delimiter in it.
    Data -  ab1abc2abc3abc
    delimiter - abc
    Should output - ab1, 2, 3
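
    The expected result for that case can be illustrated with a minimal standalone
    sketch. This is not Drill's TextReader code: the class and method names
    (MultiByteDelimiterSketch, splitRecords) are hypothetical, and it works on an
    in-memory String rather than a byte stream. It only shows that a partial match
    of the delimiter ("ab") must remain part of the record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Drill or of this pull request.
final class MultiByteDelimiterSketch {

  // Splits data on every full occurrence of delimiter.
  // A partial occurrence (a prefix of the delimiter) stays in the record.
  static List<String> splitRecords(String data, String delimiter) {
    if (delimiter.isEmpty()) {
      throw new IllegalArgumentException("delimiter must not be empty");
    }
    List<String> records = new ArrayList<>();
    int start = 0;
    int hit;
    while ((hit = data.indexOf(delimiter, start)) >= 0) {
      records.add(data.substring(start, hit));
      start = hit + delimiter.length();
    }
    // Keep trailing data that is not terminated by a delimiter.
    if (start < data.length()) {
      records.add(data.substring(start));
    }
    return records;
  }

  public static void main(String[] args) {
    // Data starts with "ab", a prefix of the delimiter "abc".
    System.out.println(splitRecords("ab1abc2abc3abc", "abc")); // prints [ab1, 2, 3]
  }
}
```

    A streaming reader has the extra burden of restoring the bytes it consumed
    while testing for the delimiter when the match fails partway, which is the
    path a test with this data would exercise.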



> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
>                 Key: DRILL-3149
>                 URL: https://issues.apache.org/jira/browse/DRILL-3149
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Jim Scott
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>             Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
