lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (LUCENE-1591) Enable bzip compression in benchmark
Date Mon, 13 Apr 2009 10:29:14 GMT


Uwe Schindler updated LUCENE-1591:

    Attachment: commons-compress-dev20090413.jar

Here is the latest snapshot build of Commons Compress. All tests passed with "mvn install".
About the initial "BZh" bytes: the javadocs still say that they should be read before
opening the stream. But the examples on the website and the BZip2Decompressor code contradict this; the decompressor checks the magic bytes itself:
private void init() throws IOException {
    if (null == in) {
        throw new IOException("No InputStream");
    }
    if (in.available() == 0) {
        throw new IOException("Empty InputStream");
    }
    checkMagicChar('B', "first");
    checkMagicChar('Z', "second");
    checkMagicChar('h', "third");
    // ...
}

So I think the reading of the initial two bytes can be left out. If the header is wrong, this
class should throw an IOException.
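For illustration, the header check that init() performs can be reproduced in plain Java. This is a minimal sketch, not the Commons Compress code: the class BZip2MagicCheck and its methods are hypothetical, and they only mirror the checkMagicChar calls quoted above by reading three bytes and throwing an IOException on a mismatch. It shows why callers should not skip the header: the stream itself has to see it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BZip2MagicCheck {

    // Reads one byte and throws if it is not the expected magic character
    // (hypothetical analogue of checkMagicChar in the quoted init()).
    public static void checkMagicChar(InputStream in, char expected, String position)
            throws IOException {
        int b = in.read(); // -1 at end of stream also fails the comparison
        if (b != expected) {
            throw new IOException("Stream is not in the BZip2 format: "
                    + position + " magic byte is " + b + ", expected '" + expected + "'");
        }
    }

    // Validates the "BZh" signature at the current position of the stream.
    public static void checkHeader(InputStream in) throws IOException {
        if (in == null) {
            throw new IOException("No InputStream");
        }
        checkMagicChar(in, 'B', "first");
        checkMagicChar(in, 'Z', "second");
        checkMagicChar(in, 'h', "third");
    }

    public static void main(String[] args) throws IOException {
        // Header left in place: accepted.
        checkHeader(new ByteArrayInputStream("BZh9".getBytes(StandardCharsets.US_ASCII)));
        System.out.println("header ok");
        // "BZ" already skipped by the caller (old convention): rejected.
        try {
            checkHeader(new ByteArrayInputStream("h9".getBytes(StandardCharsets.US_ASCII)));
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A stream whose first two bytes were already consumed by the caller fails on the first check, which is the IOException behaviour described above.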

Here is some usage (it shows that decompressing a
bzip2 file does not need to skip the header),
and here are the javadocs.

> Enable bzip compression in benchmark
> ------------------------------------
>                 Key: LUCENE-1591
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Shai Erera
>             Fix For: 2.9
>         Attachments: ant-1.7.1.jar, commons-compress-dev20090413.jar, LUCENE-1591.patch,
> bzip compression can aid the benchmark package by not requiring bzip files
> (such as enwiki) to be extracted in order to index them. The plan is to add a
> config parameter bzip.compression=true/false and, in the relevant tasks, either
> decompress the input file or compress the output file using the bzip streams.
> It will add a dependency on ant.jar, which contains two classes similar to
> GZIPOutputStream and GZIPInputStream that compress/decompress files using the
> bzip algorithm.
> bzip is known to compress better than gzip (~20% better compression),
> although it compresses and decompresses somewhat more slowly.
> I will post a patch which adds this parameter and implements it in LineDocMaker,
> EnwikiDocMaker and the WriteLineDoc task. Maybe even add the capability to DocMaker
> or some of its superclasses, so it can be inherited by all subclasses.
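A minimal sketch of how a bzip.compression switch might be wired into a common superclass so that doc makers and write tasks inherit it. Everything here is hypothetical (StreamCodec, CompressionAwareTask and the helpers are not from the patch). Because the JDK has no bzip2 streams, this runnable version plugs in java.util.zip's GZIPInputStream/GZIPOutputStream as a stand-in codec, relying on the point above that the bzip classes are similar to them; a real implementation would wrap the bzip2 stream classes from the added dependency instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionConfigSketch {

    // Hypothetical codec hook; a bzip2 version would wrap the bzip2 stream classes.
    public interface StreamCodec {
        InputStream wrapInput(InputStream in) throws IOException;
        OutputStream wrapOutput(OutputStream out) throws IOException;
    }

    // Stand-in codec using the JDK gzip streams so the sketch runs without extra jars.
    public static final StreamCodec GZIP_STAND_IN = new StreamCodec() {
        @Override
        public InputStream wrapInput(InputStream in) throws IOException {
            return new GZIPInputStream(in);
        }

        @Override
        public OutputStream wrapOutput(OutputStream out) throws IOException {
            return new GZIPOutputStream(out);
        }
    };

    // Hypothetical superclass: subclasses inherit the compression switch.
    public static class CompressionAwareTask {
        private final boolean compress;
        private final StreamCodec codec;

        public CompressionAwareTask(Properties config, StreamCodec codec) {
            // An absent parameter means "no compression".
            this.compress = Boolean.parseBoolean(config.getProperty("bzip.compression", "false"));
            this.codec = codec;
        }

        public boolean isCompressing() {
            return compress;
        }

        // Decompresses on read only when the flag is set; otherwise returns raw unchanged.
        public InputStream openInput(InputStream raw) throws IOException {
            return compress ? codec.wrapInput(raw) : raw;
        }

        // Compresses on write only when the flag is set; otherwise returns raw unchanged.
        public OutputStream openOutput(OutputStream raw) throws IOException {
            return compress ? codec.wrapOutput(raw) : raw;
        }
    }

    // Reads a stream to the end (avoids relying on newer JDK helpers).
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        config.setProperty("bzip.compression", "true");
        CompressionAwareTask task = new CompressionAwareTask(config, GZIP_STAND_IN);

        ByteArrayOutputStream file = new ByteArrayOutputStream();
        OutputStream out = task.openOutput(file);
        out.write("one line per document\n".getBytes(StandardCharsets.UTF_8));
        out.close(); // finishes the compressed stream

        InputStream in = task.openInput(new ByteArrayInputStream(file.toByteArray()));
        System.out.print(new String(readAll(in), StandardCharsets.UTF_8));
    }
}
```

Keeping the codec behind an interface means the choice between the ant.jar classes and Commons Compress would stay in one place rather than in every task.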

