From: Harsh J
Date: Fri, 22 Feb 2013 11:45:01 +0530
Subject: Re: MapReduce processing with extra (possibly non-serializable) configuration
Reply-To: user@hadoop.apache.org

How do you imagine sending "data" of any kind (be it in object form, etc.) over the network to other nodes without implementing or relying on a serialization for it? Are you looking for "easy" Java ways, such as the distributed cache from Hazelcast, where this may be taken care of for you automatically in some way? :)

On Fri, Feb 22, 2013 at 2:40 AM, Public Network Services wrote:
> Hi...
>
> I am trying to put an existing file processing application into Hadoop and
> need to find the best way of propagating some extra configuration per split,
> in the form of complex and proprietary custom Java objects.
>
> The general idea is:
>
> 1. A custom InputFormat splits the input data.
> 2. The same InputFormat prepares the appropriate configuration for each split.
> 3. Hadoop processes each split in MapReduce, using the split itself and the
>    corresponding configuration.
>
> The problem is that these configuration objects contain a lot of properties
> and references to other complex objects, and so on, so it will take a
> lot of work to cover all the possible combinations and make the whole thing
> serializable (if it can be done in the first place).
>
> Most probably this is the only way forward, but if anyone has ever dealt
> with this problem, please suggest the best approach to follow.
>
> Thanks!
>

--
Harsh J
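
To make the serialization point concrete, here is a minimal sketch of what "making the per-split configuration serializable" could look like. Everything in it is an assumption for illustration: the class name SplitConfig, its fields (format, blockSize, options), and the toBytes/fromBytes helpers are hypothetical and not from the original application. The write/readFields pair has the same shape as the methods of Hadoop's Writable interface, but the class is deliberately kept free of Hadoop imports so that it compiles with the JDK alone. The idea is to reduce the complex object graph to the plain values needed to rebuild it on the task side, rather than serializing every referenced object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-split configuration; field names are illustrative only.
// write/readFields mirror the shape of Hadoop's Writable methods, but this
// class does not depend on Hadoop (assumption made to keep it standalone).
class SplitConfig {
    String format;
    int blockSize;
    Map<String, String> options = new TreeMap<>();

    // No-arg constructor so an empty instance can be created before readFields.
    SplitConfig() {}

    SplitConfig(String format, int blockSize, Map<String, String> options) {
        this.format = format;
        this.blockSize = blockSize;
        this.options.putAll(options);
    }

    // Write only the plain values needed to rebuild the configuration.
    void write(DataOutput out) throws IOException {
        out.writeUTF(format);
        out.writeInt(blockSize);
        out.writeInt(options.size());
        for (Map.Entry<String, String> e : options.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Read the fields back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        format = in.readUTF();
        blockSize = in.readInt();
        options.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            options.put(key, value);
        }
    }

    // Helper: serialize to a byte array (stands in for sending over the network).
    static byte[] toBytes(SplitConfig config) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            config.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a configuration from bytes, as a task on another node would.
    static SplitConfig fromBytes(byte[] data) {
        try {
            SplitConfig config = new SplitConfig();
            config.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return config;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an actual job, one plausible arrangement (again an assumption, not something stated in this thread) is a custom InputSplit that implements Writable and carries these fields alongside the split boundaries, so the configuration travels with the split it belongs to.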