drill-dev mailing list archives

From parthchandra <...@git.apache.org>
Subject [GitHub] drill pull request #600: DRILL-4373: Drill and Hive have incompatible timest...
Date Tue, 18 Oct 2016 17:36:33 GMT
Github user parthchandra commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
    @@ -739,30 +739,54 @@ public void runTestAndValidate(String selection, String validationSelection,
    -  Test the reading of an int96 field. Impala encodes timestamps as int96 fields
    +    Impala encodes timestamp values as int96 fields. Test the reading of an int96 field with two converters:
    +    the first one converts parquet INT96 into drill VARBINARY and the second one (works
    +    store.parquet.reader.int96_as_timestamp option is enabled) converts parquet INT96 into drill TIMESTAMP.
       public void testImpalaParquetInt96() throws Exception {
         compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
    +    try {
    +      test("alter session set %s = true", ExecConstants.PARQUET_READER_INT96_AS_TIMESTAMP);
    +      compareParquetReadersColumnar("field_impala_ts", "cp.`parquet/int96_impala_1.parquet`");
    --- End diff --
    GitHub seems to have swallowed the previous comments, so I am including @vdiravka's questions here:
    > 1) Is it better to compare the result with baseline columns and values from the file, or is it OK to compare with sqlBaselineQuery and disabled new PARQUET_READER_INT96_AS_TIMESTAMP
    > In the process of investigating this test I found that the primitive data type of the column in the file int96_dict_change.parquet is BINARY, not INT96.
    > 2) I am a little confused by this. Do we need to convert this BINARY to TIMESTAMP as well? The CONVERT_FROM function with the IMPALA_TIMESTAMP argument works properly for this field. I will investigate a little more whether Impala and Hive can store timestamps into parquet
    For 1) I think it is better to compare values from the file as opposed to running with
    For 2) Can you correct the int96 data in the file? AFAIK, the data should be int96 for the test.
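
For context on what the INT96-to-TIMESTAMP converter under test has to do, here is a minimal sketch of decoding an Impala-style INT96 value. It assumes the commonly documented layout: 12 little-endian bytes, with the nanoseconds within the day in the first 8 bytes and the Julian day number in the last 4. The class and method names are hypothetical and are not taken from the Drill code base, and the sketch ignores any time zone adjustment a real reader might apply.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Drill.
class Int96TimestampSketch {
  // Julian day number of 1970-01-01, the Unix epoch.
  private static final long JULIAN_DAY_OF_EPOCH = 2440588L;
  private static final long MILLIS_PER_DAY = 86_400_000L;
  private static final long NANOS_PER_MILLI = 1_000_000L;

  // Converts a 12-byte INT96 value to milliseconds since the Unix epoch,
  // assuming the value is expressed in UTC. Sub-millisecond precision is dropped.
  static long int96ToEpochMillis(byte[] int96) {
    if (int96.length != 12) {
      throw new IllegalArgumentException("INT96 value must be exactly 12 bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong(); // bytes 0..7
    long julianDay = buf.getInt();   // bytes 8..11
    return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLI;
  }

  public static void main(String[] args) {
    // Build an INT96 value for midnight on the epoch day.
    byte[] epoch = ByteBuffer.allocate(12)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putLong(0L)
        .putInt(2440588)
        .array();
    System.out.println(int96ToEpochMillis(epoch)); // prints 0
  }
}
```

A column whose physical type is BINARY rather than INT96, as reported for int96_dict_change.parquet, would not reach a converter keyed on INT96, which is why the test file needs genuine int96 data.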
