hive-dev mailing list archives

From "Prasanth J (JIRA)" <>
Subject [jira] [Updated] (HIVE-8720) Update orc_merge tests to make it consistent across OS'es
Date Tue, 04 Nov 2014 03:45:33 GMT


Prasanth J updated HIVE-8720:
    Attachment: orc_merge5_filedump_opensuse.txt

Attaching the ORC file dumps for the orc_merge5.q test case as run on Mac OS X and OpenSUSE. As
the row index statistics of stripes 1 and 2 show, the order of rows differed between the two runs
(stripe 1 on Mac OS X ended up as stripe 2 on OpenSUSE).

> Update orc_merge tests to make it consistent across OS'es
> ---------------------------------------------------------
>                 Key: HIVE-8720
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>         Attachments: orc_merge5_filedump_macosx.txt, orc_merge5_filedump_opensuse.txt
> orc_merge*.q test cases fail with qfile diffs related to file size on different OSes.
> I have seen failures on OpenSUSE and CentOS. The order in which rows are inserted into an ORC
> table affects the file size because of run-length encoding. Since the order of rows is not
> guaranteed during insertion into a table, we may get different file sizes. We cannot add ORDER BY
> to the insert queries, as that would force insertion through a single reducer, which would disable
> the ORC merge file optimization. Since these test cases check whether the files are merged, it is
> sufficient to know the number of files after merging. Instead of DESCRIBE FORMATTED (which shows
> numFiles and fileSize), we can use "dfs -ls" to get the number of files.
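
To illustrate why row order can change the encoded size, here is a toy sketch in Python. It is not ORC's encoder (ORC's real run-length encodings are more elaborate); the `rle_encode` helper and the sample values are hypothetical and only show that the same set of rows can produce a different number of runs depending on arrival order.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs.

    Simplified stand-in for run-length encoding, not the ORC format.
    """
    return [(value, len(list(group))) for value, group in groupby(values)]


# Same nine rows, two different insertion orders (made-up sample data).
rows_grouped = [1, 1, 1, 2, 2, 2, 3, 3, 3]
rows_interleaved = [1, 2, 3, 1, 2, 3, 1, 2, 3]

grouped_runs = rle_encode(rows_grouped)
interleaved_runs = rle_encode(rows_interleaved)

# Fewer runs means a smaller encoded representation in this toy model.
print("runs when grouped:    ", len(grouped_runs))
print("runs when interleaved:", len(interleaved_runs))
```

In this toy model the grouped order yields 3 runs and the interleaved order yields 9, even though both lists hold the same rows. That is the kind of size variation that makes fileSize an unstable value to assert on, while the file count is unaffected.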

This message was sent by Atlassian JIRA