Subject: Re: MapReduce processing with extra (possibly non-serializable) configuration
From: Public Network Services <publicnetworkservices@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 21 Feb 2013 20:09:32 -0800

I have considered the DistributedCache and will probably be using it, but in
order to have a file to cache I need to serialize the configuration object
first. :-)

On Thu, Feb 21, 2013 at 5:55 PM, feng lu wrote:
> Hi
>
> Maybe you can look at the usage of DistributedCache [0]. It's a facility
> provided by the MR framework to cache files (text, archives, jars, etc.)
> needed by applications.
>
> [0] http://hadoop.apache.org/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
>
> On Fri, Feb 22, 2013 at 5:10 AM, Public Network Services <
> publicnetworkservices@gmail.com> wrote:
>
>> Hi...
>>
>> I am trying to port an existing file-processing application to Hadoop and
>> need to find the best way of propagating some extra configuration per
>> split, in the form of complex, proprietary custom Java objects.
>>
>> The general idea is:
>>
>> 1. A custom InputFormat splits the input data
>> 2. The same InputFormat prepares the appropriate configuration for each split
>> 3. Hadoop processes each split in MapReduce, using the split itself
>> and the corresponding configuration
>>
>> The problem is that these configuration objects contain a lot of properties
>> and references to other complex objects, and so on, so it will take a lot
>> of work to cover all the possible combinations and make the whole thing
>> serializable (if it can be done at all).
>>
>> Most probably this is the only way forward, but if anyone has ever dealt
>> with this problem, please suggest the best approach to follow.
>>
>> Thanks!
>
> --
> Don't Grow Old, Grow Up... :-)
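The serialization step discussed above can be sketched with plain java.io
object serialization. This is a minimal, hypothetical example (the class and
field names SplitConfig, encoding, and maxRecords are invented, not from the
thread) of the round trip the configuration object would have to survive
before its bytes could be written to a file and registered with the
DistributedCache:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConfigRoundTrip {

    // Hypothetical stand-in for the proprietary configuration object.
    // Every object it references must itself be Serializable (or be marked
    // transient and rebuilt after deserialization) -- this is exactly the
    // "cover all the possible combinations" work described in the thread.
    public static class SplitConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        public String encoding = "UTF-8";
        public int maxRecords = 1000;
    }

    // Serialize to bytes; in the real job these bytes would be written to
    // an HDFS file that is then registered via DistributedCache.addCacheFile().
    public static byte[] toBytes(SplitConfig cfg) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(cfg);
        }
        return buf.toByteArray();
    }

    // Deserialize; in the real job this would run once per task, e.g. in
    // Mapper.setup(), reading the locally cached copy of the file.
    public static SplitConfig fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SplitConfig) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SplitConfig cfg = new SplitConfig();
        cfg.maxRecords = 42;
        SplitConfig copy = fromBytes(toBytes(cfg));
        System.out.println(copy.encoding + " " + copy.maxRecords);
    }
}
```

If the round trip works for the real object graph, the resulting file is a
natural candidate for the DistributedCache approach suggested in the reply.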
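An alternative worth noting for steps 1-3 of the original question: rather
than making the entire proprietary object graph Serializable, the custom
InputSplit can carry only the fields the tasks actually need, written out one
by one. Hadoop ships splits to tasks through the Writable contract (a
write()/readFields() pair), so only those fields need hand-rolled
serialization. The sketch below uses plain java.io streams so it runs
standalone; in a real job the same two methods would sit on a FileSplit
subclass, and all names here (ConfiguredSplit, extraConfig) are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical split that carries its extra configuration flattened to a
// string (e.g. a JSON snippet), serialized field by field instead of as a
// full Java object graph. write()/readFields() mirror Hadoop's Writable
// contract; plain java.io stands in for the framework here.
public class ConfiguredSplit {
    public long start;
    public long length;
    public String extraConfig;

    public void write(DataOutput out) throws IOException {
        out.writeLong(start);
        out.writeLong(length);
        out.writeUTF(extraConfig);
    }

    public void readFields(DataInput in) throws IOException {
        start = in.readLong();
        length = in.readLong();
        extraConfig = in.readUTF();
    }

    public static void main(String[] args) throws Exception {
        ConfiguredSplit split = new ConfiguredSplit();
        split.start = 0L;
        split.length = 64L * 1024 * 1024;
        split.extraConfig = "{\"encoding\":\"UTF-8\",\"maxRecords\":42}";

        // Round-trip through bytes, as the framework would do when
        // shipping the split from the InputFormat to a task.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        split.write(new DataOutputStream(buf));
        ConfiguredSplit copy = new ConfiguredSplit();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.extraConfig);
    }
}
```

The trade-off versus java.io serialization: more code per field, but no
requirement that the rest of the proprietary object graph be serializable.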