Subject: Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
From: ed <hadoopnode@gmail.com>
To: common-user@hadoop.apache.org
Date: Tue, 26 Oct 2010 10:04:11 -0400

Calling close() on the MultipleOutputs object in the cleanup() method of the
reducer fixed the LZO file problem.  Thanks!

~Ed

On Thu, Oct 21, 2010 at 9:12 PM, ed wrote:

> Hi Todd,
>
> I don't have the code in front of me right now, but I was looking over the
> API docs and it looks like I forgot to call close() on the MultipleOutputs
> object.  I'll post back if that fixes the problem.  If not, I'll put
> together a unit test.  Thanks!
>
> ~Ed
>
> On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon wrote:
>
>> Hi Ed,
>>
>> Sounds like this might be a bug, either in MultipleOutputs or in LZO.
>>
>> Does it work properly with gzip compression?  Which LZO implementation
>> are you using?  The one from Google Code, or the more up-to-date one
>> from GitHub (either kevinweil's or mine)?
>>
>> Any chance you could write a unit test that shows the issue?
>>
>> Thanks
>> -Todd
>>
>> On Thu, Oct 21, 2010 at 2:52 PM, ed wrote:
>>
>> > Hello everyone,
>> >
>> > I am having problems using MultipleOutputs with LZO compression (could
>> > be a bug or something wrong in my own code).
>> >
>> > In my driver I set:
>> >
>> > MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
>> >     NullWritable.class, Text.class);
>> >
>> > In my reducer I have:
>> >
>> > MultipleOutputs<NullWritable, Text> mOutput =
>> >     new MultipleOutputs<NullWritable, Text>(context);
>> >
>> > public String generateFileName(Key key) {
>> >     return "custom_file_name";
>> > }
>> >
>> > Then in the reduce() method I have:
>> >
>> > mOutput.write(mNullWritable, mValue, generateFileName(key));
>> >
>> > This results in LZO files that do not decompress properly (lzop -d
>> > throws the error "lzop: unexpected end of file: outputFile.lzo").
>> >
>> > If I switch back to the regular context.write(mNullWritable, mValue),
>> > everything works fine.
>> >
>> > Am I forgetting a step needed when using MultipleOutputs, or is this a
>> > bug/non-feature of using LZO compression in Hadoop?
>> >
>> > Thank you!
>> >
>> > ~Ed
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
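
For reference, the fix the thread converges on can be sketched roughly as
below.  This is a minimal illustration only, not code from the thread: it
assumes the new-API MultipleOutputs (org.apache.hadoop.mapreduce.lib.output),
and the class name, key/value types, and file name are made up for the
example.  The essential point is that MultipleOutputs buffers its own record
writers, so it must be closed in cleanup() or compressed outputs (LZO, gzip,
etc.) may be truncated.

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative reducer: create MultipleOutputs once in setup(), write to it
// in reduce(), and close it in cleanup() so all underlying record writers
// are flushed and the compression streams are finalized.
public class CustomNameReducer
    extends Reducer<Text, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> mOutput;

  @Override
  protected void setup(Context context) {
    mOutput = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // The third argument is the base name of the output file.
      mOutput.write(NullWritable.get(), value, generateFileName(key));
    }
  }

  // Hypothetical naming helper, standing in for the one in the thread.
  private String generateFileName(Text key) {
    return "custom_file_name";
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Without this close(), compressed output can end up truncated,
    // producing errors such as "lzop: unexpected end of file".
    mOutput.close();
  }
}
```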