hadoop-general mailing list archives

From Lori Ann Martin <lmar...@altair.com>
Subject RE: Need Help: The problem with text key of MapFile
Date Thu, 29 Oct 2009 02:27:27 GMT
Check out www.HiQube.com or www.pbsgridworks.com

-----Original Message-----
From: lei wang [mailto:hadoopmaillist@gmail.com] 
Sent: Wednesday, October 28, 2009 7:22 PM
To: general@hadoop.apache.org
Subject: Re: Need Help: The problem with text key of MapFile

Oh, I tried HBase earlier.
But I think HDFS may give me another option.
Thanks.

On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang <zjffdu@gmail.com> wrote:

> I guess HBase may be a good fit for you. HBase is a distributed database
> built on top of Hadoop.
> You can use the URL as the row key and put the other fields into columns.
>
> Then you can retrieve web pages through the HBase client API and insert new
> web pages into it. The performance of HBase 0.20 is good enough for you.
>
> Best Regards,
> Jeff Zhang
>
>
> On Thu, Oct 29, 2009 at 8:53 AM, lei wang <hadoopmaillist@gmail.com>
> wrote:
>
> > Hi Jeff, thanks for your comments.
> >   I read this book earlier. I use MapFile to store my web pages for
> > random access.
> > At first I considered the SequenceFile conversion as a solution; however,
> > the problem is that I need to append new pages to the MapFile every minute
> > or even every second, so I don't think the SequenceFile conversion can
> > handle this.
> > Could you give me some suggestions? Thank you very much!
> >
> > Best wishes.
> >
> > On 10/28/09, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > I do not know why you need to use MapFile. Could you use SequenceFile
> > > instead?
> > >
> > > MapFile's advantage is its read performance, because it builds an index
> > > on its keys. That is why its keys must be in sorted order.
> > >
> > > If you really want to use MapFile, you can first write your data to a
> > > SequenceFile and then convert it to a MapFile.
> > >
> > > How to convert a SequenceFile to a MapFile:
> > > 1. Sort the SequenceFile using the sort program in the Hadoop examples.
> > > 2. Create an index for the output of the above step. You then have both
> > > the data file and the index file.
> > >
> > >
> > > You can refer to Tom White's book "Hadoop: The Definitive Guide" for
> > > details about how to convert a SequenceFile into a MapFile.
> > >
> > > Jeff Zhang
> > >
> > >
> > >
> > > On Wed, Oct 28, 2009 at 4:47 PM, lei wang <hadoopmaillist@gmail.com> wrote:
> > >
> > >> But right now the "url" keys are not in order. Must the key be an
> > >> IntWritable? Should it be comparable?
> > >> How do I make sure they are in order? Sort them first?
> > >> I just want to insert the pages for random access by "url".
> > >>
> > >> On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
> > >>
> > >> > Hi Wang,
> > >> >
> > >> > The keys of a MapFile must be in sorted order, so when you add
> > >> > records to a MapFile, make sure you insert them in order.
> > >> >
> > >> > Best Regards,
> > >> >
> > >> > Jeff Zhang
> > >> >
> > >> >
> > >> > On Wed, Oct 28, 2009 at 4:14 PM, lei wang <hadoopmaillist@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi, friends
> > >> > > I need to store web pages (a huge collection) in a Hadoop MapFile,
> > >> > > so I used the URL as the key, with type Text. When writing the
> > >> > > records into the MapFile, it gives an "out of order" error. Which
> > >> > > type should I choose to represent the key "url"? Can anyone give
> > >> > > me a detailed answer? Thanks for your help.
> > >> > >
> > >> >
> > >>
> > >
> >
>
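
The ordering requirement discussed in this thread can be sketched roughly as
follows. This is only an illustration under stated assumptions, not Hadoop
code: OrderCheckingWriter is a hypothetical stand-in that mimics the
"out of order" check, writeSorted is a made-up helper, and the example URLs
are invented. It also assumes plain ASCII URLs, for which Java's String
ordering agrees with a byte-wise key comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedAppendSketch {

    /**
     * Hypothetical stand-in for a MapFile-style writer (NOT the Hadoop API):
     * it rejects any key that sorts before the previously appended key.
     */
    static final class OrderCheckingWriter {
        private final List<String> keys = new ArrayList<>();
        private String lastKey = null;

        void append(String key, String value) {
            if (lastKey != null && lastKey.compareTo(key) > 0) {
                throw new IllegalStateException(
                        "key out of order: " + key + " after " + lastKey);
            }
            keys.add(key);   // a real writer would also persist the value
            lastKey = key;
        }

        List<String> keys() {
            return keys;
        }
    }

    /** Sorts the pages by URL first, then appends them in that order. */
    static List<String> writeSorted(Map<String, String> pages) {
        OrderCheckingWriter writer = new OrderCheckingWriter();
        // TreeMap iterates its entries in ascending key order.
        for (Map.Entry<String, String> e : new TreeMap<>(pages).entrySet()) {
            writer.append(e.getKey(), e.getValue());
        }
        return writer.keys();
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
                "http://b.example/", "<html>b</html>",
                "http://a.example/", "<html>a</html>");
        System.out.println(writeSorted(pages));
    }
}
```

The point of the sketch is that the key type does not need to change: string
keys are already comparable, and the error goes away once the records are
appended in sorted key order. It does not address the separate problem of
appending new pages every minute, which is why HBase was suggested.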
