Message-ID: <4656941D.3040605@trank.no>
Date: Fri, 25 May 2007 09:45:33 +0200
From: Espen Amble Kolstad
Organization: T-Rank AS
To: hadoop-user@lucene.apache.org
Subject: Re: LzoCodec not working correctly?
In-Reply-To: <46568458.7010607@trank.no>

Hi,

I changed LzoCompressor.finished() from:

  public synchronized boolean finished() {
    // ...
    return (finished && compressedDirectBuf.remaining() == 0);
  }

to:

  public synchronized boolean finished() {
    // ...
    return (finish && compressedDirectBuf.remaining() == 0);
  }

and it seems to work correctly now. I used CompressionCodecFactory.main to
test this: it failed before the change and works after it. Both compress and
decompress work. Could you verify, Arun? I'll do some more testing.

thanks,
Espen

Espen Amble Kolstad wrote:
> Hi Arun,
>
> Arun C Murthy wrote:
>> Espen,
>>
>> On Thu, May 24, 2007 at 03:49:38PM +0200, Espen Amble Kolstad wrote:
>>> Hi,
>>>
>>> I've been trying to use LzoCodec to write a compressed file:
>>>
>> Could you try this command:
>> $ bin/hadoop jar build/hadoop-0.12.4-dev-test.jar testsequencefile -seed 0 -count 10000 -compressType RECORD blah.seq -codec org.apache.hadoop.io.compress.LzoCodec -check
> This works like it should:
> 07/05/25 08:29:07 INFO io.SequenceFile: count = 10000
> 07/05/25 08:29:07 INFO io.SequenceFile: megabytes = 1
> 07/05/25 08:29:07 INFO io.SequenceFile: factor = 10
> 07/05/25 08:29:07 INFO io.SequenceFile: create = true
> 07/05/25 08:29:07 INFO io.SequenceFile: seed = 0
> 07/05/25 08:29:07 INFO io.SequenceFile: rwonly = false
> 07/05/25 08:29:07 INFO io.SequenceFile: check = true
> 07/05/25 08:29:07 INFO io.SequenceFile: fast = false
> 07/05/25 08:29:07 INFO io.SequenceFile: merge = false
> 07/05/25 08:29:07 INFO io.SequenceFile: compressType = RECORD
> 07/05/25 08:29:07 INFO io.SequenceFile: compressionCodec = org.apache.hadoop.io.compress.LzoCodec
> 07/05/25 08:29:07 INFO io.SequenceFile: file = blah.seq
> 07/05/25 08:29:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 07/05/25 08:29:07 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
> 07/05/25 08:29:07 INFO io.SequenceFile: creating 10000 records with RECORD compression
> 07/05/25 08:29:13 INFO io.SequenceFile: writing intermediate results to /tmp/hadoop-espen/mapred/local/intermediate.1
> 07/05/25 08:29:15 INFO io.SequenceFile: done sorting 10000 debug
> 07/05/25 08:29:15 INFO io.SequenceFile: sorting 10000 records in memory for debug
>
> I think the difference is that I try to write to the stream twice, while
> the Hadoop code seems to always write all bytes in a single call.
>
> The code in LzoCompressor checks for userBufLen <= 0 and sets finished =
> true, and userBufLen is set in setInput(). Doesn't this mean you can only
> write to the stream once?!
>
> - Espen
>
>> LzoCodec seems to work fine for me... maybe your FileOutputStream was
>> somehow corrupted?
>>
>> thanks,
>> Arun
>>
>>> public class LzoTest {
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     final LzoCodec codec = new LzoCodec();
>>>     codec.setConf(new Configuration());
>>>     final CompressionOutputStream out =
>>>         codec.createOutputStream(new FileOutputStream("test.lzo"));
>>>     out.write("abc".getBytes());
>>>     out.write("def".getBytes());
>>>     out.close();
>>>   }
>>> }
>>>
>>> I get the following output:
>>>
>>> 07/05/24 15:44:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 07/05/24 15:44:22 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
>>> Exception in thread "main" java.io.IOException: write beyond end of stream
>>>   at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:68)
>>>   at java.io.OutputStream.write(OutputStream.java:58)
>>>   at no.trank.tI.LzoTest.main(LzoTest.java:19)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
>>>
>>> Isn't it possible to use LzoCodec for this purpose, or is this a bug?
>>>
>>> - Espen
>
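[Archive note] The failure mode Espen describes can be sketched without Hadoop at all. The toy classes below (ToyCompressorStream, FinishFlagDemo — hypothetical names, not Hadoop code) only imitate the flag semantics from the thread: the buggy finished() reports end-of-stream as soon as the last input buffer is drained, so the second write() hits "write beyond end of stream"; the fixed finished() reports it only after the caller explicitly finishes the stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Toy stand-in for the compressor/stream pair described in the thread.
class ToyCompressorStream {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private boolean inputExhausted = false; // latched once a write consumes its input
    private boolean finish = false;         // set only when the caller closes the stream
    private final boolean buggy;            // buggy: finished() keys off input exhaustion

    ToyCompressorStream(boolean buggy) {
        this.buggy = buggy;
    }

    // Mirrors the two variants of LzoCompressor.finished() from the thread:
    // the buggy one returns true as soon as the input buffer is drained,
    // the fixed one only after finish was explicitly requested.
    boolean finished() {
        return buggy ? inputExhausted : finish;
    }

    void write(byte[] b) throws IOException {
        if (finished()) {
            throw new IOException("write beyond end of stream");
        }
        out.write(b, 0, b.length); // "compression" is a pass-through in this toy
        inputExhausted = true;     // userBufLen <= 0 after the input is compressed
    }

    byte[] close() {
        finish = true;
        return out.toByteArray();
    }
}

public class FinishFlagDemo {
    public static void main(String[] args) throws IOException {
        // Fixed semantics: two consecutive writes succeed.
        ToyCompressorStream fixed = new ToyCompressorStream(false);
        fixed.write("abc".getBytes());
        fixed.write("def".getBytes());
        System.out.println(new String(fixed.close())); // abcdef

        // Buggy semantics: the second write fails, matching the stack trace above.
        ToyCompressorStream buggy = new ToyCompressorStream(true);
        buggy.write("abc".getBytes());
        try {
            buggy.write("def".getBytes());
        } catch (IOException e) {
            System.out.println(e.getMessage()); // write beyond end of stream
        }
    }
}
```

Until the fix landed, a workaround consistent with the thread would be to buffer all bytes and issue a single write() before closing the stream, since the Hadoop code paths that worked were doing exactly that.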