hadoop-general mailing list archives

From lei wang <hadoopmaill...@gmail.com>
Subject Re: Need Help: The problem with text key of MapFile
Date Thu, 29 Oct 2009 04:49:11 GMT
I can't understand why you gave me two web sites.

On Thu, Oct 29, 2009 at 10:27 AM, Lori Ann Martin <lmartin@altair.com> wrote:

> Check out www.HiQube.com or www.pbsgridworks.com
>
> -----Original Message-----
> From: lei wang [mailto:hadoopmaillist@gmail.com]
> Sent: Wednesday, October 28, 2009 7:22 PM
> To: general@hadoop.apache.org
> Subject: Re: Need Help: The problem with text key of MapFile
>
> Oh, I tried HBase earlier.
> But I think HDFS may also give me an option.
> Thanks.
>
> On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang <zjffdu@gmail.com> wrote:
>
> > I guess HBase may be a good fit for you. HBase is a distributed database
> > built upon Hadoop.
> > You can use the URL as the row key and put the other fields into columns.
> >
> > Then you can retrieve a web page through the HBase client API and insert
> > new web pages into it. The performance of HBase 0.20 is good enough for you.
> >
> > Best Regards,
> > Jeff Zhang
> >
> >
> > On Thu, Oct 29, 2009 at 8:53 AM, lei wang <hadoopmaillist@gmail.com>
> > wrote:
> >
> > > Hi Jeff, thanks for your comments.
> > >   I did read this book earlier. I use MapFile to store my web pages for
> > > random access.
> > > At first I considered the SequenceFile conversion as a solution. However,
> > > the problem is that I need to append new pages to the MapFile every minute
> > > or second, so I don't think the SequenceFile conversion can manage this.
> > > Could you give me some suggestions? Thank you very much!
> > >
> > > Best wishes.
> > >
> > > On 10/28/09, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > > I do not know why you need to use MapFile. Could you use SequenceFile
> > > > instead?
> > > >
> > > > MapFile's advantage is its read performance, because it builds an index
> > > > on its keys. So its keys must be in order.
> > > >
> > > > If you really want to use MapFile, you can first write your data to a
> > > > SequenceFile and then convert it to a MapFile.
> > > >
> > > > About how to convert a SequenceFile to a MapFile:
> > > > 1. Sort the SequenceFile using the sort program in the Hadoop examples.
> > > > 2. Create an index for the output of the above step. Then you have both
> > > > the data file and the index file.
> > > >
> > > >
> > > > You can refer to Tom White's book "Hadoop: The Definitive Guide" for
> > > > details about how to convert a SequenceFile into a MapFile.
> > > >
> > > > Jeff Zhang
> > > >
> > > >
> > > >
> > > > On Wed, Oct 28, 2009 at 4:47 PM, lei wang <hadoopmaillist@gmail.com> wrote:
> > > >
> > > >> But now, the "url" keys are not in order. Must the key be an
> > > >> IntWritable? Should it be comparable?
> > > >> How do I make sure they are in order? Sort them first?
> > > >> I just want to insert the pages for random access by "url".
> > > >>
> > > >> On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
> > > >>
> > > >> > Hi Wang,
> > > >> >
> > > >> > The keys of a MapFile should be in order, so when you add records
> > > >> > to a MapFile, you should make sure you insert them in order.
> > > >> >
> > > >> > Best Regards,
> > > >> >
> > > >> > Jeff Zhang
> > > >> >
> > > >> >
> > > >> > On Wed, Oct 28, 2009 at 4:14 PM, lei wang <hadoopmaillist@gmail.com> wrote:
> > > >> >
> > > >> > > Hi, friends
> > > >> > > I need to store web pages (a huge collection) in a Hadoop MapFile,
> > > >> > > so I used the URL as the key, and its type is "Text". When writing
> > > >> > > the records into the MapFile, it gives an "out of order" error.
> > > >> > > Which type should I choose to represent the key "url"? Can anyone
> > > >> > > give me a detailed answer? Thanks for your help.
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
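
The ordering requirement discussed in this thread can be illustrated with a
small model. The sketch below is plain Python, not the Hadoop API:
SortedAppendWriter, write_sorted, and the example URLs are hypothetical names
made up for illustration. It assumes keys are compared by their UTF-8 bytes and
that an append is rejected when its key sorts before the previous one, which is
the behaviour the "out of order" error suggests. Sorting the URLs before
writing avoids the error, and a sparse index over the sorted keys is what makes
lookup by URL cheap.

```python
import bisect


class SortedAppendWriter:
    """Toy model of an append-only file with sorted keys (not the Hadoop API)."""

    def __init__(self, index_interval=2):
        self.records = []  # (key_bytes, value) pairs in append order
        self.index = []    # sparse index: (key_bytes, position in records)
        self.index_interval = index_interval

    def append(self, url, page):
        key = url.encode("utf-8")  # assumption: keys compare by UTF-8 bytes
        # Assumption of this model: reject a key that sorts before the last one.
        if self.records and key < self.records[-1][0]:
            raise ValueError(f"key out of order: {url}")
        # Remember every index_interval-th key so lookups can skip ahead.
        if len(self.records) % self.index_interval == 0:
            self.index.append((key, len(self.records)))
        self.records.append((key, page))

    def get(self, url):
        key = url.encode("utf-8")
        index_keys = [k for k, _ in self.index]
        # Find the last indexed key that is <= the target key.
        slot = bisect.bisect_right(index_keys, key) - 1
        if slot < 0:
            return None
        start = self.index[slot][1]
        # Scan forward from the indexed position; stop once keys pass the target.
        for k, value in self.records[start:]:
            if k == key:
                return value
            if k > key:
                break
        return None


# Hypothetical pages, deliberately listed in unsorted order.
pages = {
    "http://example.org/b": "page B",
    "http://example.com/z": "page Z",
    "http://example.com/a": "page A",
}


def write_sorted(pages):
    """Append all pages in key order so no append is rejected."""
    writer = SortedAppendWriter()
    for url in sorted(pages, key=lambda u: u.encode("utf-8")):
        writer.append(url, pages[url])
    return writer
```

The same idea applies when writing a real MapFile: sort the records by key
first, then append them in that order.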
