Message-ID: <4656941D.3040605@trank.no>
Date: Fri, 25 May 2007 09:45:33 +0200
From: Espen Amble Kolstad
Organization: T-Rank AS
To: hadoop-user@lucene.apache.org
Subject: Re: LzoCodec not working correctly?
In-Reply-To: <46568458.7010607@trank.no>

Hi,

I changed LzoCompressor.finished() from:

  public synchronized boolean finished() {
    // ...
    return (finished && compressedDirectBuf.remaining() == 0);
  }

to:

  public synchronized boolean finished() {
    // ...
    return (finish && compressedDirectBuf.remaining() == 0);
  }

and it seems to work correctly now. I used CompressionCodecFactory.main to
test this: it failed before the change and works after it. Both compress and
decompress work. Could you verify, Arun? I'll do some more testing.

thanks,
Espen

Espen Amble Kolstad wrote:
> Hi Arun,
>
> Arun C Murthy wrote:
>> Espen,
>>
>> On Thu, May 24, 2007 at 03:49:38PM +0200, Espen Amble Kolstad wrote:
>>> Hi,
>>>
>>> I've been trying to use LzoCodec to write a compressed file:
>>>
>> Could you try this command:
>> $ bin/hadoop jar build/hadoop-0.12.4-dev-test.jar testsequencefile -seed 0 -count 10000 -compressType RECORD blah.seq -codec org.apache.hadoop.io.compress.LzoCodec -check
> This works like it should:
> 07/05/25 08:29:07 INFO io.SequenceFile: count = 10000
> 07/05/25 08:29:07 INFO io.SequenceFile: megabytes = 1
> 07/05/25 08:29:07 INFO io.SequenceFile: factor = 10
> 07/05/25 08:29:07 INFO io.SequenceFile: create = true
> 07/05/25 08:29:07 INFO io.SequenceFile: seed = 0
> 07/05/25 08:29:07 INFO io.SequenceFile: rwonly = false
> 07/05/25 08:29:07 INFO io.SequenceFile: check = true
> 07/05/25 08:29:07 INFO io.SequenceFile: fast = false
> 07/05/25 08:29:07 INFO io.SequenceFile: merge = false
> 07/05/25 08:29:07 INFO io.SequenceFile: compressType = RECORD
> 07/05/25 08:29:07 INFO io.SequenceFile: compressionCodec = org.apache.hadoop.io.compress.LzoCodec
> 07/05/25 08:29:07 INFO io.SequenceFile: file = blah.seq
> 07/05/25 08:29:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 07/05/25 08:29:07 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
> 07/05/25 08:29:07 INFO io.SequenceFile: creating 10000 records with RECORD compression
> 07/05/25 08:29:13 INFO io.SequenceFile: writing intermediate results to /tmp/hadoop-espen/mapred/local/intermediate.1
> 07/05/25 08:29:15 INFO io.SequenceFile: done sorting 10000 debug
> 07/05/25 08:29:15 INFO io.SequenceFile: sorting 10000 records in memory for debug
>
> I think the difference is that I try to write to the stream twice, while
> the Hadoop code seems to always write all bytes in a single call.
>
> The code in LzoCompressor checks for userBufLen <= 0 and sets finished =
> true, and userBufLen is set in setInput(). Doesn't this mean you can only
> write to the stream once?!
>
> - Espen
>
>> LzoCodec seems to work fine for me... maybe your FileOutputStream was
>> somehow corrupted?
>>
>> thanks,
>> Arun
>>
>>> public class LzoTest {
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     final LzoCodec codec = new LzoCodec();
>>>     codec.setConf(new Configuration());
>>>     final CompressionOutputStream out =
>>>         codec.createOutputStream(new FileOutputStream("test.lzo"));
>>>     out.write("abc".getBytes());
>>>     out.write("def".getBytes());
>>>     out.close();
>>>   }
>>> }
>>>
>>> I get the following output:
>>>
>>> 07/05/24 15:44:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 07/05/24 15:44:22 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
>>> Exception in thread "main" java.io.IOException: write beyond end of stream
>>>   at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:68)
>>>   at java.io.OutputStream.write(OutputStream.java:58)
>>>   at no.trank.tI.LzoTest.main(LzoTest.java:19)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
>>>
>>> Isn't it possible to use LzoCodec for this purpose, or is this a bug?
>>>
>>> - Espen
>
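[Archive note] The failure mode Espen describes can be sketched without Hadoop at all. The toy classes below (ToyCompressorStream, FinishFlagDemo — hypothetical names, not Hadoop code) only imitate the flag semantics from the thread: the buggy finished() reports end-of-stream as soon as the last input buffer is drained, so the second write() hits "write beyond end of stream"; the fixed finished() reports it only after the caller explicitly finishes the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Toy stand-in for the compressor/stream pair described in the thread.
class ToyCompressorStream {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private boolean inputExhausted = false; // latched once a write consumes its input
    private boolean finish = false;         // set only when the caller closes the stream
    private final boolean buggy;            // buggy: finished() keys off input exhaustion

    ToyCompressorStream(boolean buggy) {
        this.buggy = buggy;
    }

    // Mirrors the two variants of LzoCompressor.finished() from the thread:
    // the buggy one returns true as soon as the input buffer is drained,
    // the fixed one only after finish was explicitly requested.
    boolean finished() {
        return buggy ? inputExhausted : finish;
    }

    void write(byte[] b) throws IOException {
        if (finished()) {
            throw new IOException("write beyond end of stream");
        }
        out.write(b, 0, b.length); // "compression" is a pass-through in this toy
        inputExhausted = true;     // userBufLen <= 0 after the input is compressed
    }

    byte[] close() {
        finish = true;
        return out.toByteArray();
    }
}

public class FinishFlagDemo {
    public static void main(String[] args) throws IOException {
        // Fixed semantics: two consecutive writes succeed.
        ToyCompressorStream fixed = new ToyCompressorStream(false);
        fixed.write("abc".getBytes());
        fixed.write("def".getBytes());
        System.out.println(new String(fixed.close())); // abcdef

        // Buggy semantics: the second write fails, matching the stack trace above.
        ToyCompressorStream buggy = new ToyCompressorStream(true);
        buggy.write("abc".getBytes());
        try {
            buggy.write("def".getBytes());
        } catch (IOException e) {
            System.out.println(e.getMessage()); // write beyond end of stream
        }
    }
}
```

Until the fix landed, a workaround consistent with the thread would be to buffer all bytes and issue a single write() before closing the stream, since the Hadoop code paths that worked were doing exactly that.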