hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-4199) ORC writer doesn't handle non-UTF8 encoded Text properly
Date Mon, 18 Mar 2013 21:29:17 GMT

     [ https://issues.apache.org/jira/browse/HIVE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-4199:
------------------------------

    Attachment: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch

sxyuan requested code review of "HIVE-4199 [jira] ORC writer doesn't handle non-UTF8 encoded
Text properly".

Reviewers: kevinwilfong

StringTreeWriter currently converts fields stored as Text objects into Strings. This can lose
information (see http://en.wikipedia.org/wiki/Replacement_character#Replacement_character),
and is also unnecessary since the dictionary stores Text objects.
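
The information loss described above can be reproduced with the JDK alone. The following is a minimal sketch, not the code from the patch: the class name LossyRoundTrip and the helper roundTrip are hypothetical, and a plain byte[] stands in for the bytes held by a Text object. It assumes the conversion behaves like decoding the bytes as UTF-8 and encoding them again, which replaces each malformed sequence with U+FFFD.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not part of Hive or of the D9501 patch.
class LossyRoundTrip {

    // Decode the raw bytes as UTF-8 and encode them again, mimicking a
    // bytes -> String -> bytes conversion. Malformed input is replaced
    // with U+FFFD by the String constructor.
    static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Single bytes that are valid ISO-8859-1 but malformed as UTF-8.
        byte[] first = {(byte) 0xE9};
        byte[] second = {(byte) 0xE8};

        byte[] firstAfter = roundTrip(first);
        byte[] secondAfter = roundTrip(second);

        // The original bytes are not preserved.
        System.out.println("unchanged: " + Arrays.equals(first, firstAfter));
        // Two distinct inputs become indistinguishable after the round trip.
        System.out.println("collide:   " + Arrays.equals(firstAfter, secondAfter));
    }
}
```

Because the original bytes cannot be recovered from the String, a value written this way no longer compares equal to the same value read from the source data, which is consistent with the empty join result mentioned in the test plan.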

Instead, we can check whether Text or String is preferred and simply use the preferred class,
converting to String only for the index stats.

TEST PLAN
  Run unit tests, including the new query. Without the fix, the join in the test query produces
no results because of the bug.

REVISION DETAIL
  https://reviews.facebook.net/D9501

AFFECTED FILES
  data/files/nonutf8.txt
  ql/src/test/results/clientpositive/orc_nonutf8.q.out
  ql/src/test/queries/clientpositive/orc_nonutf8.q
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java

To: kevinwilfong, sxyuan
Cc: JIRA

                
> ORC writer doesn't handle non-UTF8 encoded Text properly
> --------------------------------------------------------
>
>                 Key: HIVE-4199
>                 URL: https://issues.apache.org/jira/browse/HIVE-4199
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>            Priority: Minor
>         Attachments: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch
>
>
> StringTreeWriter currently converts fields stored as Text objects into Strings. This
> can lose information (see http://en.wikipedia.org/wiki/Replacement_character#Replacement_character),
> and is also unnecessary since the dictionary stores Text objects.
