hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krishna Kumar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2065) RCFile issues
Date Fri, 01 Apr 2011 01:54:06 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014351#comment-13014351
] 

Krishna Kumar commented on HIVE-2065:
-------------------------------------

As I indicated, the reason I made changes for sequence file compatibility is that the original
design was more or less compatible, but the compatibility is now currently broken. If we decide
that the compatibility is not a requirement, I am fine with that - a few documentation changes
will be all that is necessary to indicate that situation. The current layout itself, with
the SEQ prefix, incorrect key length and keyclass/valueclass header etc, will not make much
sense, but we can designate all that 'legacy' ;)


I'd like to move the discussions re potential approaches for better compression to another
thread - any existing bug#? or should I open a new one?.


> RCFile issues
> -------------
>
>                 Key: HIVE-2065
>                 URL: https://issues.apache.org/jira/browse/HIVE-2065
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: HIVE.2065.patch.0.txt, Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per yongqiang
he, the class is not meant to be thread-safe (and it is not). Might as well get rid of the
confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression happens after
we have written the record length.
> {code}
>       int keyLength = key.getSize();
>       if (keyLength < 0) {
>         throw new IOException("negative length keys not allowed: " + key);
>       }
>       out.writeInt(keyLength + valueLength); // total record length
>       out.writeInt(keyLength); // key portion length
>       if (!isCompressed()) {
>         out.writeInt(keyLength);
>         key.write(out); // key
>       } else {
>         keyCompressionBuffer.reset();
>         keyDeflateFilter.resetState();
>         key.write(keyDeflateOut);
>         keyDeflateOut.flush();
>         keyDeflateFilter.finish();
>         int compressedKeyLen = keyCompressionBuffer.getLength();
>         out.writeInt(compressedKeyLen);
>         out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>       }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the next field
to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message