hive-dev mailing list archives

From "eye (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-5590) select and get duplicated records with hive when a .deflate file greater than 64MB was loaded to a hive table
Date Fri, 18 Oct 2013 10:12:42 GMT
eye created HIVE-5590:
-------------------------

             Summary: select and get duplicated records with hive when a .deflate file greater
than 64MB was loaded to a hive table
                 Key: HIVE-5590
                 URL: https://issues.apache.org/jira/browse/HIVE-5590
             Project: Hive
          Issue Type: Bug
         Environment: cdh4
            Reporter: eye


We occasionally get compressed files larger than 160MB in .deflate format. They are loaded
into Hive through an external table, say table T_A.
When we run select count(*) from T_A, we get far more records than when we check the file
with "hadoop fs -text /xxxxx | wc -l": 70% more!
Any clue about this?
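The cross-check above counts newline characters in the decompressed file. The same count can be reproduced outside Hadoop with a minimal Python sketch, assuming the file has been copied to local disk and consists of one or more concatenated zlib-wrapped streams (the format Hadoop's DefaultCodec writes for .deflate files). `count_deflate_lines` is a hypothetical helper, not part of Hive or Hadoop.

```python
import zlib


def count_deflate_lines(path, chunk_size=1 << 20):
    """Count newline characters in a .deflate file, like `wc -l`.

    Assumption: the file is one or more concatenated zlib streams,
    as written by Hadoop's DefaultCodec. This is not Hadoop's reader.
    """
    count = 0
    decomp = zlib.decompressobj()  # default wbits expects a zlib header
    with open(path, "rb") as f:
        buf = f.read(chunk_size)
        while buf:
            out = decomp.decompress(buf)
            count += out.count(b"\n")
            if decomp.eof:
                # One stream ended; leftover bytes begin the next stream.
                buf = decomp.unused_data
                decomp = zlib.decompressobj()
                if not buf:
                    buf = f.read(chunk_size)
            else:
                buf = f.read(chunk_size)
        # Emit anything still buffered (e.g. from a truncated stream).
        count += decomp.flush().count(b"\n")
    return count
```

Running it on a local copy of the file (for example one fetched with `hadoop fs -get`) gives a count that can be compared with the `select count(*)` result.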

The large .deflate files were caused by imperfect processing. After we fixed that and got files
smaller than 64MB, the problem above did not come up. But since there is no guarantee that a
larger file will not show up again, is there any way to avoid this issue?

cheers!
eye




--
This message was sent by Atlassian JIRA
(v6.1#6144)
