hbase-user mailing list archives

From "Liu Xianglong" <sallonch...@hotmail.com>
Subject Re: Store mapreduce output into my own data structures
Date Sat, 28 Nov 2009 05:45:32 GMT
Hi, Jeff. Thanks for your reply. Actually, I need to process the map-reduce
output further. If I cannot keep it in memory, my other modules cannot
process it. If these modules were integrated into map-reduce, they would
finish the processing inside the mapreduce jobs. The problem is that these
modules are complicated, so the easy way is to store the output of the jobs
in memory. What do you think? Do you have any experience with this?
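
For the option of reading the job output back from files, a minimal sketch follows. It assumes the job wrote plain text lines of the form key<TAB>value (the TextOutputFormat default) into part-r-* files, and that those files are reachable through the local filesystem. The class name PartFileLoader is hypothetical, and the Result values from the code quoted below would need their own serialization rather than plain text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: loads reducer output files into an in-memory map
// after the job has finished, so other modules can work on the data.
public class PartFileLoader {

    // Assumption: each line is "key<TAB>value", as TextOutputFormat writes by default.
    public static Map<String, String> load(Path outputDir) throws IOException {
        Map<String, String> result = new TreeMap<>();
        // Only reducer output files match; marker files such as _SUCCESS are skipped.
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) {
                        continue; // not a key/value line, ignore it
                    }
                    result.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        }
        return result;
    }
}
```

The driver would call PartFileLoader.load(...) once the job has completed and hand the resulting map to the other modules. This only works when the whole output fits in the memory of one process.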

--------------------------------------------------
From: "Jeff Zhang" <zjffdu@gmail.com>
Sent: Friday, November 27, 2009 10:46 PM
To: <hbase-user@hadoop.apache.org>
Subject: Re: Store mapreduce output into my own data structures

> So how do you plan to integrate your other modules with hadoop?
>
> Put them in the reduce phase?
>
>
> Jeff Zhang
>
>
>
> On Fri, Nov 27, 2009 at 6:37 AM, <sallonchina@hotmail.com> wrote:
>
>> Actually I want the output to be usable by other modules. So do they have
>> to read the output from hdfs files? Or should I integrate these modules
>> into map-reduce? Are there other ways?
>>
>> --------------------------------------------------
>> From: "Jeff Zhang" <zjffdu@gmail.com>
>> Sent: Friday, November 27, 2009 10:00 PM
>> To: <hbase-user@hadoop.apache.org>
>> Subject: Re: Store mapreduce output into my own data structures
>>
>>
>>> Hi Liu,
>>>
>>> Why do you want to store the output in memory? You cannot use the output
>>> outside of the reducer.
>>> Actually, at the beginning the output of the reducer is in memory, and the
>>> OutputFormat writes this data to the file system or another data store.
>>>
>>>
>>> Jeff Zhang
>>>
>>>
>>>
>>> 2009/11/27 Liu Xianglong <sallonchina@hotmail.com>
>>>
>>>> Hi, everyone. Has anyone used map-reduce to store the reduce output in
>>>> memory? Right now the output path of the job is set and the reduce
>>>> outputs are stored in files under this path (see the comment in the
>>>> following code):
>>>>    job.setOutputFormatClass(MyOutputFormat.class);
>>>>    // Can I implement my own OutputFormat to store these output key-value
>>>>    // pairs in my data structures, or are there other ways to do it?
>>>>    job.setOutputKeyClass(ImmutableBytesWritable.class);
>>>>    job.setOutputValueClass(Result.class);
>>>>    FileOutputFormat.setOutputPath(job, outputDir);
>>>>
>>>> Is there any way to store them in some variables or data structures? If
>>>> so, how can I implement my OutputFormat? Any suggestions and code are
>>>> welcome.
>>>>
>>>> Another question: is there some way to set the number of map tasks? There
>>>> seems to be no API for this in the new Hadoop job API, and I am not sure
>>>> how to set this number.
>>>>
>>>> Thanks!
>>>>
>>>> Best Wishes!
>>>> _____________________________________________________________
>>>>
>>>> 刘祥龙  Liu Xianglong
>>>>
>>>>
>>>
> 
