hadoop-common-user mailing list archives

From "Bryan A. P. Pendleton" ...@geekdom.net>
Subject Re: SequenceFile (Text,Text) becomes plain text
Date Fri, 02 Feb 2007 22:46:03 GMT
For that to work, the output format of the previous job will have to be set
to SequenceFileOutputFormat.

Note that the existing text output can only be read back in accurately if
none of the keys written by the first job contain a tab character. If any key
does contain a tab, there is no way to tell where the key ends and the value
begins.
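
To illustrate the tab problem, here is a minimal sketch in plain Java (no
Hadoop classes). It assumes each text output line is simply the key, a tab,
then the value, as described in the quoted message below, and that a reader
splits at the first tab. The class and method names are made up for this
example.

```java
public class TabSplitDemo {
    /** Joins key and value with a tab, like the "key \t value" lines in the flat file. */
    static String toLine(String key, String value) {
        return key + "\t" + value;
    }

    /** Splits a line at its first tab and returns {key, value}. */
    static String[] splitAtFirstTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // No tab at all: treat the whole line as the key.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        // A key without tabs survives the round trip.
        String[] ok = splitAtFirstTab(toLine("url", "some value"));
        System.out.println(ok[0].equals("url"));

        // A key containing a tab is cut short, and the rest leaks into the value.
        String[] bad = splitAtFirstTab(toLine("a\tb", "value"));
        System.out.println(bad[0].equals("a\tb"));
    }
}
```

The first check prints true and the second prints false: the key "a<tab>b"
comes back as "a", with "b<tab>value" as the value. A sequence file avoids
this because it stores keys and values as separate records rather than as
delimited text.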

On 2/2/07, Dennis Kubes <nutch-dev@dragonflymc.com> wrote:
>
> You need to set the input format of the second job.  It defaults to
> TextInputFormat which is why you are seeing it become text.  Use a line
> like below in the second job.
>
> secondjob.setInputFormat(SequenceFileInputFormat.class);
> secondjob.setInputKeyClass(Text.class);
> secondjob.setInputValueClass(Text.class);
>
> Dennis Kubes
>
> Alejandro Abdelnur wrote:
> > I may be missing something silly here,
> >
> > I have a MR that generates an output type (Text,Text)
> >
> > When another MR consumes that output, it becomes a plain text file, so
> > the input is (LongWritable, Text), with the long key being the line
> > number and the text value being the key+value separated by a tab. My
> > second MR blows up, as it was expecting (Text,Text), and the key is
> > wrong as well.
> >
> > Doing a cat of the file I see it become a flat file with lines having
> > "key \t value".
> >
> > How can I force the output of the first MR to remain a sequence file of
> > (Text, Text)?
> >
> > Thxs.
> >
> > A
> >
>



-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp
