Subject: Re: Store mapreduce output into my own data structures
From: Jeff Zhang
To: hbase-user@hadoop.apache.org
Date: Fri, 27 Nov 2009 22:48:41 -0800

Or you can put the further processing into another MapReduce job, making the whole thing a chain of MapReduce jobs.

Jeff Zhang


On Fri, Nov 27, 2009 at 10:38 PM, Jeff Zhang wrote:
>
> Hi Liu,
>
> Each reduce task runs in its own JVM, so you have to put your modules into
> the reduce task if you really want to access the output in memory.
>
> I am not sure about the size of your output. If it is not large, I suggest
> putting it in a message, wrapping your modules in a listener, and then
> sending the message to that listener for further processing.
>
> If the size of your output is large, I suggest storing it in HDFS, putting
> its location in a message, and sending the message to the listener.
> Since you said your modules are complicated, I suggest separating them
> from the MapReduce jobs as described above; that will improve the
> maintainability and extensibility of your system.
>
>
> Jeff Zhang
>
>
> On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong wrote:
>
>> Hi, Jeff.
>> Thanks for your reply. Actually, I will do further processing of the
>> map-reduce output. If I cannot keep it in memory, the other modules
>> cannot process it. If these modules were integrated into map-reduce,
>> they would finish their processing inside the MapReduce jobs; the
>> problem is that these modules are complicated. The easy way would be
>> to store the output of the jobs in memory. What do you think? Do you
>> have such experience?
>>
>>
>> --------------------------------------------------
>> From: "Jeff Zhang"
>> Sent: Friday, November 27, 2009 10:46 PM
>> To:
>> Subject: Re: Store mapreduce output into my own data structures
>>
>>> So how do you plan to integrate your other modules with Hadoop?
>>>
>>> Put them in the reduce phase?
>>>
>>>
>>> Jeff Zhang
>>>
>>>
>>> On Fri, Nov 27, 2009 at 6:37 AM, wrote:
>>>
>>>> Actually, I want the output to be usable by other modules. So do they
>>>> have to read the output from HDFS files? Or should these modules be
>>>> integrated into map-reduce? Are there other ways?
>>>>
>>>> --------------------------------------------------
>>>> From: "Jeff Zhang"
>>>> Sent: Friday, November 27, 2009 10:00 PM
>>>> To:
>>>> Subject: Re: Store mapreduce output into my own data structures
>>>>
>>>>
>>>>> Hi Liu,
>>>>>
>>>>> Why do you want to store the output in memory? You cannot use the
>>>>> output outside of the reducer.
>>>>> Actually, at the beginning the output of the reducer is in memory,
>>>>> and the OutputFormat writes this data to the file system or to some
>>>>> other data store.
>>>>>
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>> 2009/11/27 Liu Xianglong
>>>>>
>>>>>> Hi, everyone. Is there anyone who uses map-reduce and stores the
>>>>>> reduce
I mean, now the output path of job is set and redu= ce >>>>>> outputs are stored into files under this path.(see the comments alon= g >>>>>> with >>>>>> the following codes) >>>>>> job.setOutputFormatClass(MyOutputFormat.class); >>>>>> //can I implement my OutputFormat to store these output key-value >>>>>> pairs >>>>>> in my data structures, or are these other ways to do it? >>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class); >>>>>> job.setOutputValueClass(Result.class); >>>>>> FileOutputFormat.setOutputPath(job, outputDir); >>>>>> >>>>>> Is there any way to store them in some variables or data structures= ? >>>>>> Then >>>>>> how can I implement my OutputFormat? Any suggestions and codes are >>>>>> welcomed. >>>>>> >>>>>> Another question: is there some way to set the number of map task? I= t >>>>>> seems >>>>>> there is no API to do this in hadoop new job APIs. I am not sure the >>>>>> way >>>>>> to >>>>>> set this number. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> Best Wishes! >>>>>> _____________________________________________________________ >>>>>> >>>>>> =E5=88=98=E7=A5=A5=E9=BE=99 Liu Xianglong >>>>>> >>>>>> >>>>>> >>>>> >>> > --001636e0a98b7e749f047968ccf8--