hadoop-general mailing list archives

From lei wang <hadoopmaill...@gmail.com>
Subject Re: Need Help: The problem with text key of MapFile
Date Thu, 29 Oct 2009 00:53:16 GMT
Hi Jeff, thanks for your comments.
   I did read this book earlier. I use MapFile to store my web pages for
random access.
At first I considered the SequenceFile conversion as a solution. However, the
problem is that I need to append new pages to the MapFile every minute
or even every second, so I don't think the SequenceFile conversion can manage this.
Could you give me some suggestions? Thank you very much!

Best wishes.

On 10/28/09, Jeff Zhang <zjffdu@gmail.com> wrote:
> I do not know why you need to use MapFile. Could you use SequenceFile instead?
>
> MapFile's advantage is its read performance, because it builds an index on
> its keys. So its keys must be in order.
>
> If you really want to use MapFile, you can first write your data to a
> SequenceFile and then convert it to a MapFile.
>
> About how to convert a SequenceFile to a MapFile:
> 1. Sort the SequenceFile using the sort program in the Hadoop examples.
> 2. Create an index for the output of the above step. Then you have both the
> data file and the index file.
>
>
> You can refer to Tom White's book "Hadoop: The Definitive Guide" for details about
> how to convert a SequenceFile into a MapFile.
>
> Jeff Zhang
>
>
>
> On Wed, Oct 28, 2009 at 4:47 PM, lei wang <hadoopmaillist@gmail.com> wrote:
>
>> But now "url" is not in order. Must the key be IntWritable? Should it be
>> comparable?
>> How do I make sure the keys are in order? Sort them first?
>> I just want to insert the pages for random access by "url".
>>
>> On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
>>
>> > Hi Wang,
>> >
>> > The keys of a MapFile must be in order, so when you add records to a
>> > MapFile, you should make sure you insert them in order.
>> >
>> > Best Regards,
>> >
>> > Jeff Zhang
>> >
>> >
>> > On Wed, Oct 28, 2009 at 4:14 PM, lei wang <hadoopmaillist@gmail.com> wrote:
>> >
>> > > Hi, friends
>> > > I need to store web pages (a huge collection) in a Hadoop MapFile, so I
>> > > used the URL as the key, with type "Text". When writing the
>> > > records into the MapFile, it gives an error saying "out of order". Which
>> > > type should I choose to represent the key "url"? Can anyone give me a
>> > > detailed answer? Thanks for your help.
>> > >
>> >
>>
>
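
To illustrate the ordering rule discussed in this thread, here is a minimal
sketch in plain Java. It is a model only and does not use the Hadoop API: the
names SortedAppendSketch, SortedWriter, writeSorted and BYTE_ORDER are
hypothetical, and the example URLs are made up. It assumes that Text keys
compare by their UTF-8 bytes and that the writer rejects any key smaller than
the previous one, which matches the "out of order" error described above. The
point is that the key type does not need to change; the records only need to be
sorted by key before they are appended.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of an append-only sorted key/value file (NOT the Hadoop API). */
class SortedAppendSketch {

    /** Orders strings by their unsigned UTF-8 bytes (assumed to mirror Text key ordering). */
    static final Comparator<String> BYTE_ORDER = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    };

    /** Hypothetical stand-in for a MapFile writer: rejects keys that go backwards. */
    static class SortedWriter {
        private final List<String> keys = new ArrayList<>();
        private final List<String> values = new ArrayList<>();

        void append(String key, String value) throws IOException {
            if (!keys.isEmpty()) {
                String last = keys.get(keys.size() - 1);
                if (BYTE_ORDER.compare(last, key) > 0) {
                    throw new IOException("key out of order: " + key + " after " + last);
                }
            }
            keys.add(key);
            values.add(value);
        }

        /** Random access by key; binary search is valid only because keys are sorted. */
        String get(String key) {
            int i = Collections.binarySearch(keys, key, BYTE_ORDER);
            return i >= 0 ? values.get(i) : null;
        }
    }

    /** Sorts {url, page} pairs by key first, then appends them in order. */
    static SortedWriter writeSorted(String[][] pages) throws IOException {
        String[][] sorted = pages.clone();
        Arrays.sort(sorted, (p, q) -> BYTE_ORDER.compare(p[0], q[0]));
        SortedWriter writer = new SortedWriter();
        for (String[] page : sorted) {
            writer.append(page[0], page[1]);
        }
        return writer;
    }

    public static void main(String[] args) throws IOException {
        String[][] pages = {
            {"http://b.example/", "page B"},
            {"http://a.example/", "page A"},
        };

        // Appending in arrival order fails on the second record.
        SortedWriter unsorted = new SortedWriter();
        try {
            for (String[] page : pages) {
                unsorted.append(page[0], page[1]);
            }
        } catch (IOException e) {
            System.out.println("unsorted input failed: " + e.getMessage());
        }

        // Sorting first makes the same records acceptable and searchable.
        SortedWriter sorted = writeSorted(pages);
        System.out.println(sorted.get("http://a.example/"));
    }
}
```

In this model, pages that arrive later cannot be appended to an existing sorted
file unless their keys are larger than the last key already written, which is
the difficulty with minute-by-minute appends raised at the top of this message.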
