hadoop-general mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Need Help: The problem with text key of MapFile
Date Wed, 28 Oct 2009 09:11:56 GMT
I am not sure why you need to use MapFile. Could you use SequenceFile instead?

MapFile's advantage is its read performance, because it builds an index on
its keys. That is why its keys must be in order.
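
To make the ordering rule concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It assumes that Text keys compare by their UTF-8 bytes treated as unsigned values, and that the writer rejects a key that sorts before the previous one. The class and method names (SortedKeyCheck, compareUtf8, firstOutOfOrder, sortedForAppend) and the example URLs are hypothetical, made up for illustration; this is a stand-in for the check, not MapFile's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedKeyCheck {

    /** Compares two strings by their UTF-8 bytes, treated as unsigned values
     *  (assumed here to match how Text keys are ordered). */
    static int compareUtf8(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the index of the first key that sorts before its predecessor,
     *  or -1 if the keys could be appended in the given order. */
    static int firstOutOfOrder(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compareUtf8(keys.get(i - 1), keys.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }

    /** Returns a sorted copy of the keys, ready to be appended one by one. */
    static List<String> sortedForAppend(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(SortedKeyCheck::compareUtf8);
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical URLs in arrival order, not sorted.
        List<String> urls = List.of(
                "http://b.example/", "http://a.example/", "http://c.example/");

        // The second URL sorts before the first, so appending fails there.
        System.out.println(firstOutOfOrder(urls));                  // prints 1

        // After sorting, every key is >= the one before it.
        System.out.println(firstOutOfOrder(sortedForAppend(urls))); // prints -1
    }
}
```

Sorting in memory like this only works while the keys fit in RAM. For a large crawl the sort has to be done as a job.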

If you really want to use MapFile, you can first write your data to a
SequenceFile and then convert it to a MapFile.

How to convert a SequenceFile to a MapFile:
1. Sort the SequenceFile using the sort program in the Hadoop examples.
2. Create an index for the output of the above step. You then have both
the data file and the index file.


You can refer to Tom White's book "Hadoop: The Definitive Guide" for details
on how to convert a SequenceFile into a MapFile.

Jeff Zhang



On Wed, Oct 28, 2009 at 4:47 PM, lei wang <hadoopmaillist@gmail.com> wrote:

> But now the "url" keys are not in order. Must the key be IntWritable? Should
> it be comparable?
> How do I make sure they are in order? Sort them first?
> I just want to insert the pages for random access by "url".
>
> On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > Hi Wang,
> >
> > The keys of a MapFile must be in order, so when you add records to a
> > MapFile, you should make sure you insert them in order.
> >
> > Best Regards,
> >
> > Jeff Zhang
> >
> >
> > On Wed, Oct 28, 2009 at 4:14 PM, lei wang <hadoopmaillist@gmail.com>
> > wrote:
> >
> > > Hi, friends
> > > I need to store web pages (a huge set) in a Hadoop MapFile, so I
> > > used the url as the key, and its type is "Text". When writing the
> > > records into the MapFile, it gives an "out of order" error. Which type
> > > should I choose to represent the key "url"? Can anyone give me a
> > > detailed answer? Thanks for your help.
> > >
> >
>
