hadoop-common-issues mailing list archives

From "Nicholas Carlini (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-6349) Implement FastLZCodec for fastlz/lzo algorithm
Date Wed, 11 Aug 2010 15:32:28 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Carlini updated HADOOP-6349:
-------------------------------------

    Attachment: hadoop-6349-4.patch

Fixed the buffering issues. There is still work to do here, though. When compress() is called
with an array that is big enough, it should compress directly into that array instead of
compressing into a temporary buffer and then copying into the array given. The alternative would
be to change the compressor to keep its state and resume compression from the middle of a block,
but that seems like more work than it's worth for very little gain. Compressing directly would,
however, mean that the buffer size used by the CompressorStream needs to be around 64k, so that
most of the time bytes can be written directly into it. (The same applies to decompression.)
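A minimal Java sketch of the direct-write idea. It assumes a hypothetical BlockCompressor interface with a worst-case output bound, maxCompressedLength(); the stand-in implementation just copies bytes so the logic can run, and none of this is the patch's code or a real FastLZ API. When the caller's array can hold the worst case, the block is compressed straight into it; otherwise it goes through a temporary buffer that is handed out over several calls.

```java
/**
 * Hypothetical block compressor used only for this sketch: compresses one
 * block from src into dst and returns the number of bytes written.
 */
interface BlockCompressor {
    /** Worst-case output size for a block of inputLength bytes. */
    int maxCompressedLength(int inputLength);

    int compressBlock(byte[] src, int srcOff, int srcLen, byte[] dst, int dstOff);
}

/** Stand-in that "compresses" by copying, just to exercise the buffering logic. */
final class StoredBlockCompressor implements BlockCompressor {
    @Override
    public int maxCompressedLength(int inputLength) {
        return inputLength;
    }

    @Override
    public int compressBlock(byte[] src, int srcOff, int srcLen, byte[] dst, int dstOff) {
        System.arraycopy(src, srcOff, dst, dstOff, srcLen);
        return srcLen;
    }
}

/**
 * Writes straight into the caller's array when it can hold the worst-case
 * output, and falls back to a temporary buffer (copied out piecewise) otherwise.
 */
final class DirectOrStagedCompressor {
    private final BlockCompressor codec;
    private byte[] input;
    private int inputOff;
    private int inputLen;                  // bytes waiting to be compressed
    private byte[] staging = new byte[0];  // temporary buffer for the slow path
    private int stagedOff;
    private int stagedLen;                 // compressed bytes not yet handed out
    int directWrites;
    int stagedWrites;

    DirectOrStagedCompressor(BlockCompressor codec) {
        this.codec = codec;
    }

    void setInput(byte[] b, int off, int len) {
        input = b;
        inputOff = off;
        inputLen = len;
    }

    /** Returns the number of compressed bytes written to out[off..off+len). */
    int compress(byte[] out, int off, int len) {
        if (stagedLen == 0 && inputLen > 0) {
            int bound = codec.maxCompressedLength(inputLen);
            if (len >= bound) {
                // Fast path: caller's array is big enough, no intermediate copy.
                directWrites++;
                int n = codec.compressBlock(input, inputOff, inputLen, out, off);
                inputLen = 0;
                return n;
            }
            // Slow path: compress into the temporary buffer first.
            stagedWrites++;
            if (staging.length < bound) {
                staging = new byte[bound];
            }
            stagedOff = 0;
            stagedLen = codec.compressBlock(input, inputOff, inputLen, staging, 0);
            inputLen = 0;
        }
        // Hand out as much staged output as fits in the caller's array.
        int n = Math.min(len, stagedLen);
        System.arraycopy(staging, stagedOff, out, off, n);
        stagedOff += n;
        stagedLen -= n;
        return n;
    }
}
```

A real FastLZ-based compressor would need its own worst-case bound in place of the stand-in's; with a CompressorStream buffer around 64k, most calls would take the fast path.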

Fixed a bug where the end-of-stream marker could show up in the wrong place.


Fixed another bug in the compressor where calling write(int) on the output stream n times
would add 47*n bytes to the output stream. This happened because each call to write() also
calls compress(), which writes 26 bytes for a header block, 16 bytes for a block header, and
then 1 byte of uncompressed data. Fixed this by adding another case to needsInput(): if the
buffer holds fewer than 2^16 bytes (the default block size), it returns true, so the compressor
does not compress until more input has been buffered.
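A minimal Java sketch of the needsInput() rule described above, under stated assumptions: BlockBufferingCompressor and its methods are hypothetical names, and "compression" is simulated by recording the size of each emitted block. While fewer than blockSize bytes are buffered (and finish() has not been called), needsInput() returns true, so single-byte writes accumulate instead of each producing its own block.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: defer compression until a full block is buffered or finish() is called. */
final class BlockBufferingCompressor {
    static final int DEFAULT_BLOCK_SIZE = 1 << 16; // 2^16 = 65536 bytes

    private final int blockSize;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private boolean finishCalled;
    /** Sizes of emitted blocks; stands in for the real compression call. */
    final List<Integer> emittedBlocks = new ArrayList<>();

    BlockBufferingCompressor(int blockSize) {
        this.blockSize = blockSize;
    }

    void setInput(byte[] b, int off, int len) {
        pending.write(b, off, len);
    }

    void finish() {
        finishCalled = true;
    }

    /** True while fewer than blockSize bytes are buffered and finish() has not been called. */
    boolean needsInput() {
        return !finishCalled && pending.size() < blockSize;
    }

    /** Emits every full block; after finish(), also emits the final partial block. */
    void compressPending() {
        byte[] data = pending.toByteArray();
        pending.reset();
        int off = 0;
        while (data.length - off >= blockSize) {
            emittedBlocks.add(blockSize);
            off += blockSize;
        }
        if (finishCalled && off < data.length) {
            emittedBlocks.add(data.length - off);
            off = data.length;
        }
        pending.write(data, off, data.length - off); // keep the unfinished tail
    }

    /** Mimics a stream's write(int): buffer one byte, compress only once a block is full. */
    void writeByte(int b) {
        setInput(new byte[] {(byte) b}, 0, 1);
        if (!needsInput()) {
            compressPending(); // emits all full blocks in one call
        }
    }
}
```

With a 16-byte block, 40 single-byte writes produce two full blocks and, after finish(), one final 8-byte block, rather than 40 one-byte blocks that each carry their own headers.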

Changed TestCodecPerformance so that it calls write() several times instead of writing all the
input at once. It now calls write() with a random length until it runs out of input.
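A minimal Java sketch of that kind of test driver; RandomChunkWriter and writeInRandomChunks are hypothetical names, not the code in TestCodecPerformance.java. It feeds a byte array to any OutputStream through write(byte[], int, int) calls of random length until the input is exhausted.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Random;

/** Hypothetical test helper: pushes data through write() calls of random length. */
final class RandomChunkWriter {
    /**
     * Writes all of data to out in chunks of 1..maxChunk bytes (the last chunk
     * may be shorter) and returns the number of write() calls made.
     */
    static int writeInRandomChunks(OutputStream out, byte[] data, int maxChunk, Random rnd) {
        if (maxChunk < 1) {
            throw new IllegalArgumentException("maxChunk must be >= 1");
        }
        int off = 0;
        int calls = 0;
        try {
            while (off < data.length) {
                // nextInt(maxChunk) is in [0, maxChunk), so len is in [1, maxChunk].
                int len = Math.min(1 + rnd.nextInt(maxChunk), data.length - off);
                out.write(data, off, len);
                off += len;
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // keeps the helper easy to call from tests
        }
        return calls;
    }
}
```

In a performance test, out would be the codec's compressing output stream wrapped around a sink; a fixed Random seed keeps the chunk sizes reproducible between runs.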

Got rid of the uses of BigInteger ...

Removed the moved code ... no idea why it was ever there.

Expanded abbreviations in comments into full words.

Setting FASTLZ_STRICT_ALIGN to false doesn't even work. Deleted.

Setting FASTLZ_SAFE to false gives no performance improvement; the only difference is a few if
statements (simple checks without any loops). Deleted.

At some point, the header blocks (block ID 1) should be removed. They serve no purpose and are
just left over from the port of 6pack.c. The header could even be made smaller by removing the
ID, now that all blocks will have ID 17. Likewise, the 'options' field only needs one bit, to
distinguish compressed blocks from uncompressible data.
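One way the smaller header could look, as a hypothetical Java sketch: CompactBlockHeader and its 4-byte layout are assumptions for illustration, not a format from the patch, and any other fields the real header carries are omitted. The block ID is dropped entirely (every block would be ID 17), and the one-bit compressed/uncompressible option is folded into the top bit of a 32-bit length word.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical compact block header: no block ID, and the single option bit
 * stored in the top bit of the length. Layout (big-endian): [flag:1 | length:31].
 */
final class CompactBlockHeader {
    static final int SIZE = 4;
    private static final int COMPRESSED_FLAG = 0x80000000;
    private static final int LENGTH_MASK = 0x7FFFFFFF;

    final boolean compressed;
    final int payloadLength;

    CompactBlockHeader(boolean compressed, int payloadLength) {
        if (payloadLength < 0) {
            throw new IllegalArgumentException("payload length must fit in 31 bits");
        }
        this.compressed = compressed;
        this.payloadLength = payloadLength;
    }

    /** Writes the header as one big-endian int (ByteBuffer's default byte order). */
    void writeTo(ByteBuffer buf) {
        buf.putInt(compressed ? (payloadLength | COMPRESSED_FLAG) : payloadLength);
    }

    static CompactBlockHeader readFrom(ByteBuffer buf) {
        int word = buf.getInt();
        return new CompactBlockHeader((word & COMPRESSED_FLAG) != 0, word & LENGTH_MASK);
    }
}
```

With 2^16-byte blocks, a 31-bit length is far more than needed, so a real format could shrink the length field further.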

> Implement FastLZCodec for fastlz/lzo algorithm
> ----------------------------------------------
>
>                 Key: HADOOP-6349
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6349
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: William Kinney
>         Attachments: hadoop-6349-1.patch, hadoop-6349-2.patch, hadoop-6349-3.patch, hadoop-6349-4.patch, HADOOP-6349-TestFastLZCodec.patch, HADOOP-6349.patch, TestCodecPerformance.java, TestCodecPerformance.java, testCodecPerfResults.tsv
>
>
> Per [HADOOP-4874|http://issues.apache.org/jira/browse/HADOOP-4874], FastLZ is a good (speed, license) alternative to LZO.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

