Subject: Re: Store mapreduce output into my own data structures
From: Jeff Zhang
To: hbase-user@hadoop.apache.org
Date: Fri, 27 Nov 2009 22:48:41 -0800

Or you can put the further processing into another MapReduce job, making the whole thing a chain of MapReduce jobs.

Jeff Zhang


On Fri, Nov 27, 2009 at 10:38 PM, Jeff Zhang wrote:
>
> Hi Liu,
>
> Each reduce task runs in its own JVM, so you have to put your modules into
> the reduce task if you really want to access the output in memory.
>
> I am not sure about the size of your output. If it is not large, I suggest
> putting it in a message, wrapping your modules in a listener, and then
> sending the message to that listener for further processing.
>
> If the size of your output is large, I suggest storing it in HDFS, putting
> its location in a message, and sending the message to the listener.
> Since you said your modules are complicated, I suggest separating them
> from the MapReduce jobs as described above; that will improve the
> maintainability and extensibility of your system.
>
>
> Jeff Zhang
>
>
> On Fri, Nov 27, 2009 at 9:45 PM, Liu Xianglong wrote:
>
>> Hi, Jeff.
>> Thanks for your reply. Actually, I will do further processing of the
>> map-reduce output. If I cannot keep it in memory, the other modules
>> cannot process it. If these modules were integrated into map-reduce,
>> they would finish their processing inside the MapReduce jobs; the
>> problem is that these modules are complicated. The easy way would be
>> to store the output of the jobs in memory. What do you think? Do you
>> have such experience?
>>
>>
>> --------------------------------------------------
>> From: "Jeff Zhang"
>> Sent: Friday, November 27, 2009 10:46 PM
>> To:
>> Subject: Re: Store mapreduce output into my own data structures
>>
>>> So how do you plan to integrate your other modules with Hadoop?
>>>
>>> Put them in the reduce phase?
>>>
>>>
>>> Jeff Zhang
>>>
>>>
>>> On Fri, Nov 27, 2009 at 6:37 AM, wrote:
>>>
>>>> Actually, I want the output to be usable by other modules. So do they
>>>> have to read the output from HDFS files? Or should these modules be
>>>> integrated into map-reduce? Are there other ways?
>>>>
>>>> --------------------------------------------------
>>>> From: "Jeff Zhang"
>>>> Sent: Friday, November 27, 2009 10:00 PM
>>>> To:
>>>> Subject: Re: Store mapreduce output into my own data structures
>>>>
>>>>
>>>>> Hi Liu,
>>>>>
>>>>> Why do you want to store the output in memory? You cannot use the
>>>>> output outside of the reducer.
>>>>> Actually, at the beginning the output of the reducer is in memory,
>>>>> and the OutputFormat writes this data to the file system or to some
>>>>> other data store.
>>>>>
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>> 2009/11/27 Liu Xianglong
>>>>>
>>>>>> Hi, everyone. Is there anyone who uses map-reduce and stores the
>>>>>> reduce
I mean, now the output path of job is set and redu= ce >>>>>> outputs are stored into files under this path.(see the comments alon= g >>>>>> with >>>>>> the following codes) >>>>>> job.setOutputFormatClass(MyOutputFormat.class); >>>>>> //can I implement my OutputFormat to store these output key-value >>>>>> pairs >>>>>> in my data structures, or are these other ways to do it? >>>>>> job.setOutputKeyClass(ImmutableBytesWritable.class); >>>>>> job.setOutputValueClass(Result.class); >>>>>> FileOutputFormat.setOutputPath(job, outputDir); >>>>>> >>>>>> Is there any way to store them in some variables or data structures= ? >>>>>> Then >>>>>> how can I implement my OutputFormat? Any suggestions and codes are >>>>>> welcomed. >>>>>> >>>>>> Another question: is there some way to set the number of map task? I= t >>>>>> seems >>>>>> there is no API to do this in hadoop new job APIs. I am not sure the >>>>>> way >>>>>> to >>>>>> set this number. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> Best Wishes! >>>>>> _____________________________________________________________ >>>>>> >>>>>> =E5=88=98=E7=A5=A5=E9=BE=99 Liu Xianglong >>>>>> >>>>>> >>>>>> >>>>> >>> > --001636e0a98b7e749f047968ccf8--