drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Scott" <...@13ways.com>
Subject Re: Review Request 25949: changes to support configuring text formats (field separator, record delimiter, quote character, header line, strip quotes)
Date Mon, 03 Nov 2014 01:14:43 GMT


> On Nov. 2, 2014, 5:20 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java,
line 138
> > <https://reviews.apache.org/r/25949/diff/1/?file=703157#file703157line138>
> >
> >     we use dashes to describe a multiword.  such as field-separator

Please show me an example of this. Dots or no separator at all were all I found. I also didn't
find anything in any supporting documentation or guidelines, if they exist I am happy to follow.
Here are examples of json objects I referenced. You will see there are no separators for storageformat,
dot separators for config settings and camelCase for configProps. 

{
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "workspaces" : {
    "root" : {
      "location" : "/",
      "writable" : false,
      "storageformat" : null
    }
}
hbase.sys.drill
{
  "type" : "hbase",
  "config" : {
    "hbase.zookeeper.quorum" : "localhost",
    "hbase.zookeeper.property.clientPort" : "2181"
  },
  "enabled" : false
}

hive.sys.drill
{
  "type" : "hive",
  "enabled" : false,
  "configProps" : {
    "hive.metastore.uris" : "",
    "javax.jdo.option.ConnectionURL" : "jdbc:derby:;databaseName=../../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir" : "/tmp/drill_hive_wh",
    "fs.default.name" : "file:///",
    "hive.metastore.sasl.enabled" : "false"
  }
}


> On Nov. 2, 2014, 5:20 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java,
line 241
> > <https://reviews.apache.org/r/25949/diff/1/?file=703157#file703157line241>
> >
> >     this seems wrong.  Shouldn't these be equivalent sets?

This equals method tests this json object:

    "real" : {
      "type" : "text",
      "extensions" : [ "csv" ],
      "delimiter" : ","
    }
    "fake" : {
      "type" : "text",
      "extensions" : [ "csv" ],
      "delimiter" : ","
    }

If everything is the same they are equal. If I have both of these definitions they will equal
one another regardless of the name "real" / "fake".


> On Nov. 2, 2014, 5:20 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/text/DrillTextRecordReader.java,
line 76
> > <https://reviews.apache.org/r/25949/diff/1/?file=703158#file703158line76>
> >
> >     why wouldn't you just use a byte in the previous position.  You should be consistent.

Not sure I follow. Do you mean change the constructor to require bytes? If so I was just following
the pattern put in place.


> On Nov. 2, 2014, 5:20 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/store/text/DrillTextRecordReader.java,
line 116
> > <https://reviews.apache.org/r/25949/diff/1/?file=703158#file703158line116>
> >
> >     Why are you jumping back to string here?  Why drop to char/byte and then come
back?

In this code base we utilize the byte. In the hadoop core library it uses a string. I have
a comment explaining that right above that line of code.


- Jim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25949/#review59545
-----------------------------------------------------------


On Sept. 23, 2014, 2:53 p.m., Jim Scott wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25949/
> -----------------------------------------------------------
> 
> (Updated Sept. 23, 2014, 2:53 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> Updates for handling text formatted files to address: https://issues.apache.org/jira/browse/DRILL-1440
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/pom.xml 81dbeff 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/JsonRecordWriter.java
76c4ace 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text/TextFormatPlugin.java
b64a032 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/text/DrillTextRecordReader.java
7b8761c 
>   exec/java-exec/src/main/java/org/apache/drill/exec/store/text/DrillTextRecordWriter.java
31b1fbe 
>   exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/TextFormatConfigTest.java
PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/25949/diff/
> 
> 
> Testing
> -------
> 
> There is a test class added to the serialization json work of the data formats including
maintaing backward compatibility as well as altering the field separator name.
> 
> Maven tests were run, with these two errors:
>   TestWriter.simpleCsv » org.apache.drill.exec.rpc.RpcException: Failure while running
fragment. null [479d0e71-2bad-4f4c-b3fc-8bffc98f788b]
>   TestWriter>BaseTestQuery.closeClient:125 » java.lang.IllegalStateException: Failure
while trying to close allocator: Child level allocators not closed.
> 
> I could not figure out the cause for these failures, nor could I properly debug why they
were occurring. The code in the patch should have no impact on those tests (although, I could
be wrong).
> 
> 
> Thanks,
> 
> Jim Scott
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message