Subject: Re: LZO Compression Libraries don't appear to work properly with MultipleOutputs
From: ed <hadoopnode@gmail.com>
To: common-user@hadoop.apache.org
Date: Tue, 26 Oct 2010 10:04:11 -0400

Calling close() on the MultipleOutputs object in the cleanup() method of the
reducer fixed the LZO file problem.  Thanks!

~Ed

On Thu, Oct 21, 2010 at 9:12 PM, ed wrote:

> Hi Todd,
>
> I don't have the code in front of me right now, but I was looking over the
> API docs and it looks like I forgot to call close() on the MultipleOutputs
> object.  I'll post back if that fixes the problem.  If not, I'll put
> together a unit test.  Thanks!
>
> ~Ed
>
> On Thu, Oct 21, 2010 at 6:31 PM, Todd Lipcon wrote:
>
>> Hi Ed,
>>
>> Sounds like this might be a bug, either in MultipleOutputs or in LZO.
>>
>> Does it work properly with gzip compression?  Which LZO implementation
>> are you using?  The one from Google Code, or the more up-to-date one
>> from GitHub (either kevinweil's or mine)?
>>
>> Any chance you could write a unit test that shows the issue?
>>
>> Thanks
>> -Todd
>>
>> On Thu, Oct 21, 2010 at 2:52 PM, ed wrote:
>>
>> > Hello everyone,
>> >
>> > I am having problems using MultipleOutputs with LZO compression (could
>> > be a bug or something wrong in my own code).
>> >
>> > In my driver I set:
>> >
>> > MultipleOutputs.addNamedOutput(job, "test", TextOutputFormat.class,
>> >     NullWritable.class, Text.class);
>> >
>> > In my reducer I have:
>> >
>> > MultipleOutputs<NullWritable, Text> mOutput =
>> >     new MultipleOutputs<NullWritable, Text>(context);
>> >
>> > public String generateFileName(Key key) {
>> >     return "custom_file_name";
>> > }
>> >
>> > Then in the reduce() method I have:
>> >
>> > mOutput.write(mNullWritable, mValue, generateFileName(key));
>> >
>> > This results in LZO files that do not decompress properly (lzop -d
>> > throws the error "lzop: unexpected end of file: outputFile.lzo").
>> >
>> > If I switch back to the regular context.write(mNullWritable, mValue),
>> > everything works fine.
>> >
>> > Am I forgetting a step needed when using MultipleOutputs, or is this a
>> > bug/non-feature of using LZO compression in Hadoop?
>> >
>> > Thank you!
>> >
>> > ~Ed
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
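
For reference, the fix the thread converges on can be sketched roughly as
below.  This is a minimal illustration only, not code from the thread: it
assumes the new-API MultipleOutputs (org.apache.hadoop.mapreduce.lib.output),
and the class name, key/value types, and file name are made up for the
example.  The essential point is that MultipleOutputs buffers its own record
writers, so it must be closed in cleanup() or compressed outputs (LZO, gzip,
etc.) may be truncated.

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Illustrative reducer: create MultipleOutputs once in setup(), write to it
// in reduce(), and close it in cleanup() so all underlying record writers
// are flushed and the compression streams are finalized.
public class CustomNameReducer
    extends Reducer<Text, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> mOutput;

  @Override
  protected void setup(Context context) {
    mOutput = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // The third argument is the base name of the output file.
      mOutput.write(NullWritable.get(), value, generateFileName(key));
    }
  }

  // Hypothetical naming helper, standing in for the one in the thread.
  private String generateFileName(Text key) {
    return "custom_file_name";
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // Without this close(), compressed output can end up truncated,
    // producing errors such as "lzop: unexpected end of file".
    mOutput.close();
  }
}
```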