drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3149) TextReader should support multibyte line delimiters
Date Tue, 14 Jun 2016 20:01:36 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330275#comment-15330275 ]

ASF GitHub Bot commented on DRILL-3149:
---------------------------------------

GitHub user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/500#discussion_r67042660
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestSelectWithOption.java ---
    @@ -89,30 +88,51 @@ public void testTabFieldDelimiter() throws Exception {
             listOf("2", "b"));
       }
     
    -  @Test @Ignore // It does not look like lineDelimiter is working
    -  public void testTextLineDelimiter() throws Exception {
    +  @Test
    +  public void testSingleTextLineDelimiter() throws Exception {
    +    String tableName = genCSVTable("testSingleTextLineDelimiter",
    +        "a|b|c");
    +
    +    testWithResult(format("select columns from table(%s(type => 'TeXT', lineDelimiter => '|'))", tableName),
    +        listOf("a"),
    +        listOf("b"),
    +        listOf("c"));
    +  }
    +
    +  @Test
    +  // '\n' is treated as standard delimiter
    +  // if user has indicated custom line delimiter but input file contains '\n', split will occur on both
    +  public void testCustomTextLineDelimiterAndNewLine() throws Exception {
         String tableName = genCSVTable("testTextLineDelimiter",
    -        "\"b\"|\"0\"",
    -        "\"b\"|\"1\"",
    -        "\"b\"|\"2\"");
    +        "b|1",
    +        "b|2");
     
         testWithResult(format("select columns from table(%s(type => 'TeXT', lineDelimiter => '|'))", tableName),
    -        listOf("\"b\""),
    -        listOf("\"0\"", "\"b\""),
    -        listOf("\"1\"", "\"b\""),
    -        listOf("\"2\"")
    -      );
    +        listOf("b"),
    +        listOf("1"),
    +        listOf("b"),
    +        listOf("2"));
       }
     
       @Test
       public void testTextLineDelimiterWithCarriageReturn() throws Exception {
         String tableName = genCSVTable("testTextLineDelimiterWithCarriageReturn",
    -        "1,a\r",
    -        "2,b\r");
    +        "1, a\r",
    +        "2, b\r");
         String lineDelimiter = new String(new char[]{92, 114, 92, 110}); // represents \r\n
         testWithResult(format("select columns from table(%s(type=>'TeXT', lineDelimiter => '%s'))", tableName, lineDelimiter),
    -        listOf("1,a"),
    -        listOf("2,b"));
    +        listOf("1, a"),
    +        listOf("2, b"));
    +  }
    +
    +  @Test
    +  public void testMultiByteLineDelimiter() throws Exception {
    +    String tableName = genCSVTable("testMultiByteLineDelimiter",
    +        "1abc2abc3abc");
    --- End diff --
    
    @parthchandra 
    Sorry for the misunderstanding. I added a third commit with a new test, testDataWithPartOfMultiByteLineDelimiter.
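
For readers following the thread, here is a minimal standalone sketch of the behavior that the multi-byte delimiter tests appear to target: records are split only on a full occurrence of the delimiter, and a partial occurrence stays inside the record. This is not Drill's TextReader code. The class name MultiByteDelimiterSketch, the method splitRecords, the delimiter "abc", and the second input string are assumptions made for illustration. Only the input "1abc2abc3abc" comes from the diff, which is cut off before it shows the delimiter or the expected rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache Drill.
public class MultiByteDelimiterSketch {

    // Splits input into records on every full occurrence of the delimiter.
    // A partial match (for example "ab" when the delimiter is "abc")
    // is kept as ordinary record data.
    static List<String> splitRecords(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(delimiter, start)) >= 0) {
            records.add(input.substring(start, hit));
            start = hit + delimiter.length();
        }
        // Keep trailing data that is not terminated by a delimiter.
        if (start < input.length()) {
            records.add(input.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Input taken from testMultiByteLineDelimiter; the delimiter is assumed.
        System.out.println(splitRecords("1abc2abc3abc", "abc")); // prints [1, 2, 3]
        // Assumed input containing only part of the delimiter ("ab").
        System.out.println(splitRecords("1ab2abc3", "abc"));     // prints [1ab2, 3]
    }
}
```

The sketch works on an in-memory String for brevity. A real reader that consumes a byte stream would also have to handle a delimiter that straddles a buffer boundary, which this example does not attempt.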


> TextReader should support multibyte line delimiters
> ---------------------------------------------------
>
>                 Key: DRILL-3149
>                 URL: https://issues.apache.org/jira/browse/DRILL-3149
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Text & CSV
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Jim Scott
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>             Fix For: Future
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record delimiters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
