hadoop-common-user mailing list archives

From "Alejandro Abdelnur" <tuc...@gmail.com>
Subject Re: SequenceFile (Text,Text) becomes plain text
Date Sun, 04 Feb 2007 16:31:29 GMT
Yes, I found the problem. It was something dumb: I wasn't setting the output format to
SequenceFileOutputFormat. Now things work.

Now that things work, I've noticed that the output of an MR job using
SequenceFileOutputFormat is not compressed, whereas a SequenceFile.Writer that I
create directly is compressed by default.

How do I set the MR output to be compressed in the JobConf? I can set
compression for the map output, but not for the final MR output.

Thanks.

Alejandro


On 2/3/07, Bryan A. P. Pendleton <bp@geekdom.net> wrote:
>
> For that to work, the output of the previous job will have to be set to
> SequenceFileOutputFormat.
>
> Note that if any keys in the first job's output contain tab characters,
> there's no way to read that output back in accurately.
>
> On 2/2/07, Dennis Kubes <nutch-dev@dragonflymc.com> wrote:
> >
> > You need to set the input format of the second job. It defaults to
> > TextInputFormat, which is why you are seeing it become text. Use lines
> > like the ones below in the second job.
> >
> > secondjob.setInputFormat(SequenceFileInputFormat.class);
> > secondjob.setInputKeyClass(Text.class);
> > secondjob.setInputValueClass(Text.class);
> >
> > Dennis Kubes
> >
> > Alejandro Abdelnur wrote:
> > > I may be missing something silly here,
> > >
> > > I have an MR job that generates output of type (Text, Text).
> > >
> > > When another MR job consumes that output, it is read as a plain text
> > > file, so the input is (LongWritable, Text), with the long key being the
> > > line number and the text value being the key and value separated by a
> > > tab. My second MR job blows up, as it was expecting (Text, Text), and
> > > the key is wrong besides.
> > >
> > > Running cat on the file, I see it has become a flat file with lines of
> > > the form "key \t value".
> > >
> > > How can I force the output of the first MR job to remain a sequence
> > > file of (Text, Text)?
> > >
> > > Thanks.
> > >
> > > A
> > >
> >
>
>
>
> --
> Bryan A. P. Pendleton
> Ph: (877) geek-1-bp
>
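The sketch below, in plain Java with no Hadoop dependency, simulates the round trip described in this thread: a text output format writes each (key, value) pair as a "key<TAB>value" line, and a text input format hands the next job (position, line) records. Note that the LongWritable key produced by TextInputFormat is the byte offset at which the line starts, not a line number. The helpers writeAsText, readAsLines and splitAtFirstTab are hypothetical stand-ins that only mimic that behavior; they are not Hadoop APIs. Splitting each line at the first tab illustrates Bryan's caveat: a key that itself contains a tab is cut short and the rest of it leaks into the value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TextRoundTrip {
    // One record as a text input format would present it:
    // the byte offset where the line starts, plus the raw line.
    public record OffsetLine(long offset, String line) {}

    // Hypothetical helper: writes pairs the way a text output format does,
    // one "key<TAB>value" line per pair.
    public static String writeAsText(List<String[]> pairs) {
        StringBuilder sb = new StringBuilder();
        for (String[] kv : pairs) {
            sb.append(kv[0]).append('\t').append(kv[1]).append('\n');
        }
        return sb.toString();
    }

    // Hypothetical helper: splits the text into lines keyed by byte offset.
    public static List<OffsetLine> readAsLines(String text) {
        List<OffsetLine> out = new ArrayList<>();
        long offset = 0;
        for (String line : text.split("\n")) {
            out.add(new OffsetLine(offset, line));
            // Advance past the line's bytes plus the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return out;
    }

    // Hypothetical helper: recovers (key, value) by splitting at the first tab.
    public static String[] splitAtFirstTab(String line) {
        int i = line.indexOf('\t');
        if (i < 0) {
            return new String[] {line, ""}; // no tab: whole line is the key
        }
        return new String[] {line.substring(0, i), line.substring(i + 1)};
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
            new String[] {"apple", "red"},
            new String[] {"key\twith-tab", "green"}); // key contains a tab
        for (OffsetLine r : readAsLines(writeAsText(pairs))) {
            String[] kv = splitAtFirstTab(r.line());
            System.out.println(r.offset() + " -> key=[" + kv[0] + "] value=[" + kv[1] + "]");
        }
    }
}
```

Running main prints offsets 0 and 10, and for the second record the recovered key is only "key". Writing with SequenceFileOutputFormat and reading with SequenceFileInputFormat sidesteps this, because the keys and values are stored as serialized Text objects rather than tab-delimited lines.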
