Date: Fri, 2 Apr 2010 16:35:38 -0600
Subject: Re: Reflective instantiation of Mappers and Reducers
From: Kris Nuttycombe
To: mapreduce-user@hadoop.apache.org

Or heck... I could just base-64 encode the serialized byte arrays and
pass them as strings in the configuration. If it's going to be a hack,
might as well go all the way.

On Fri, Apr 2, 2010 at 4:10 PM, Kris Nuttycombe wrote:
> On Fri, Apr 2, 2010 at 3:10 PM, Owen O'Malley wrote:
>>
>> On Apr 2, 2010, at 12:05 PM, Kris Nuttycombe wrote:
>>
>>> What I'm wondering is, is there any way to simply serialize a Mapper
>>> or Reducer object, and have the serialized instance copied, passed
>>> around and used everywhere instead of always having the Mapper and
>>> Reducer instantiated by reflection? This would greatly simplify
>>> library design in my case.
>>
>> Currently the best you can do is to make your Mapper or Reducer implement
>> Configurable and use the values out of the configuration.
>>
>> Take a look at MAPREDUCE-1183. It should be exactly what you are asking for
>> when it gets implemented.
>>
>> -- Owen
>>
>
> Thanks for the reference to that ticket, Owen. In the meantime, I
> think I may have figured out a workaround.
> The following code (completely untested as of yet, but a starting
> point) provides base classes for an implementation based upon the
> distributed cache:
>
>
> import org.apache.hadoop.conf._
> import org.apache.hadoop.util._
> import org.apache.hadoop.mapreduce._
> import java.io._
> import SerializingResourceToolRunner._
>
> object SerializingResourceToolRunner {
>   val serializedResourceName = "socialmedia.mr_tool.serfile"
>
>   // Loan-pattern helper assumed by the code below (not shown in the
>   // original post): run f on the stream, then always close it.
>   def using[A <: Closeable, B](resource: A)(f: A => B): B =
>     try f(resource) finally resource.close()
> }
>
> class SerializingResourceToolRunner[T <: Serializable](tool: SerializingResourceTool[T]) {
>   def runWithToolRunner(argv: Array[String]) = {
>     // Split argv into the remaining arguments (in their original order)
>     // and the value of the "-files" option, if one was given.
>     def stripFileArg(i: Int, l: List[String], f: Option[String]): (List[String], Option[String]) = {
>       if (i >= argv.length) (l.reverse, f)
>       else if (argv(i) == "-files") stripFileArg(i + 2, l, Some(argv(i + 1)))
>       else stripFileArg(i + 1, argv(i) :: l, f)
>     }
>
>     // Serialize the tool's resource to a local temp file.
>     val tempFile = File.createTempFile("mr_tool", ".ser")
>     using(new FileOutputStream(tempFile)) {
>       f => using(new ObjectOutputStream(f)) {
>         out => out.writeObject(tool.resource)
>       }
>     }
>
>     val (args, filesArg) = stripFileArg(0, Nil, None)
>
>     // Assumes the tool's Configuration has already been set. Only the bare
>     // file name is stored, since -files is expected to place the file in
>     // each task's working directory under that name.
>     tool.getConf.set(serializedResourceName, tempFile.getName)
>     val filesArgWithTempFile = filesArg.map(_ + "," + tempFile.getAbsolutePath).getOrElse(tempFile.getAbsolutePath)
>     ToolRunner.run(tool, ("-files" :: filesArgWithTempFile :: args).toArray)
>   }
> }
>
> trait Resources[T] {
>   private var _resource: T = _
>   def resource: T = _resource
>
>   // Deserialize the resource from the file named in the configuration.
>   def init(conf: Configuration): Unit = {
>     _resource = using(new FileInputStream(new File(conf.get(serializedResourceName)))) {
>       f => using(new ObjectInputStream(f)) {
>         in => in.readObject.asInstanceOf[T]
>       }
>     }
>   }
> }
>
> abstract class SerializedResourceMapper[T, KI, VI, KO, VO] extends Mapper[KI, VI, KO, VO] with Resources[T] {
>   override def setup(context: Mapper[KI, VI, KO, VO]#Context): Unit = {
>     super.setup(context)
>     init(context.getConfiguration)
>   }
> }
>
> abstract class SerializedResourceReducer[T, KI, VI, KO, VO] extends Reducer[KI, VI, KO, VO] with Resources[T] {
>   override def setup(context: Reducer[KI, VI, KO, VO]#Context): Unit = {
>     super.setup(context)
>     init(context.getConfiguration)
>   }
> }
>
> abstract class SerializingResourceTool[T <: Serializable] extends Configured with Tool {
>   def resource: T
> }
>
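
The base-64 idea from the top of this message could look roughly like the sketch below. It is equally untested against a real cluster and only shows the encode/decode round trip. The object name `Base64Resource` and the key `socialmedia.mr_tool.resource_b64` are made up for illustration, and `java.util.Base64` assumes Java 8 or later.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64

object Base64Resource {
  // Hypothetical configuration key, not part of the original code.
  val resourceKey = "socialmedia.mr_tool.resource_b64"

  // Java-serialize the object and return the bytes as a base-64 string.
  def encode(resource: java.io.Serializable): String = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(resource) finally out.close()
    Base64.getEncoder.encodeToString(bytes.toByteArray)
  }

  // Reverse of encode: decode the base-64 string and deserialize the object.
  // The cast is unchecked, as in the file-based version above.
  def decode[T](encoded: String): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(encoded)))
    try in.readObject.asInstanceOf[T] finally in.close()
  }
}
```

On the client side this would be wired in with something like `conf.set(Base64Resource.resourceKey, Base64Resource.encode(tool.resource))`, and in `setup` with `Base64Resource.decode[T](context.getConfiguration.get(Base64Resource.resourceKey))`. Since the job configuration travels with every task, this probably only makes sense for small resources.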