Mailing-List: contact pig-commits-help@hadoop.apache.org; run by ezmlm
Reply-To: pig-dev@hadoop.apache.org
From: Apache Wiki
To: Apache Wiki
Date: Tue, 23 Feb 2010 23:20:38 -0000
Message-ID: <20100223232038.5839.73796@eos.apache.org>
Subject: [Pig Wiki] Update of "LoadStoreMigrationGuide" by PradeepKamath

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=30&rev2=31

--------------------------------------------------

  == Table mapping old API calls to new API calls in rough order of call sequence ==
  ||'''Old Method in !LoadFunc''' ||'''Equivalent New Method''' ||'''New Class/Interface in which method is present''' ||'''Explanation''' ||
- ||No equivalent method ||setUDFContextSignature() ||!LoadFunc ||This method will be called by Pig both in the front end and back end to pass a unique signature to the Loader. The signature can be used to store into the !UDFContext any information which the Loader needs to store between various method invocations in the front end and back end. A use case is to store !RequiredFieldList passed to it in !LoadPushDown.pushProjection(!RequiredFieldList) for use in the back end before returning tuples in getNext() ||
+ ||No equivalent method ||setUDFContextSignature() ||!LoadFunc ||This method will be called by Pig both in the front end and back end to pass a unique signature to the Loader. The signature can be used to store into the !UDFContext any information which the Loader needs to store between various method invocations in the front end and back end. A use case is to store the !RequiredFieldList passed to it in !LoadPushDown.pushProjection(!RequiredFieldList) for use in the back end before returning tuples in getNext(). The default implementation in !LoadFunc has an empty body. ||
- ||No equivalent method ||relativeToAbsolutePath() ||!LoadFunc ||Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in !LoadFunc handles this for hdfs files and directories. If the load source is something else, loader implementation may choose to override this. ||
+ ||No equivalent method ||relativeToAbsolutePath() ||!LoadFunc ||Pig runtime will call this method to allow the Loader to convert a relative load location to an absolute location. The default implementation provided in !LoadFunc handles this for !FileSystem locations. If the load source is something else, the loader implementation may choose to override this. ||
  ||determineSchema() ||getSchema() ||!LoadMetadata ||determineSchema() was used by old code to ask the loader to provide a schema for the data returned by it - the same semantics are now achieved through getSchema() of the !LoadMetadata interface. !LoadMetadata is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader cannot return a schema for the data ||
  ||fieldsToRead() ||pushProjection() ||!LoadPushDown ||fieldsToRead() was used by old code to convey to the loader the exact fields required by the pig script - the same semantics are now achieved through pushProjection() of the !LoadPushDown interface. !LoadPushDown is an optional interface for loaders to implement - if a loader does not implement it, this will indicate to the pig runtime that the loader is not capable of returning just the required fields and will return all fields in the data. If a loader implementation is able to efficiently return only required fields, it should implement !LoadPushDown to improve query performance ||
  ||No equivalent method ||getInputFormat() ||!LoadFunc ||This method will be called by Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying !RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !InputFormat is a hadoop packaged one, the implementation should use the new API based one under org.apache.hadoop.mapreduce. If it is a custom !InputFormat, it should be implemented using the new API in org.apache.hadoop.mapreduce''' ||
@@ -31, +31 @@
  An example of how a simple !LoadFunc implementation based on the old interface can be converted to the new interfaces is shown in the Examples section below.

  = StoreFunc Migration =
- The main change is that the new !StoreFunc API is based on a !OutputFormat to read the data. Implementations can choose to use existing !OutputFormat like !TextOutputFormat or implement a new one.
+ !StoreFunc is now an abstract class providing default implementations for some of the methods. The main change is that the new !StoreFunc API is based on an !OutputFormat to write the data. Implementations can choose to use an existing !OutputFormat like !TextOutputFormat or implement a new one.

  == Table mapping old API calls to new API calls in rough order of call sequence ==
  ||'''Old Method in !StoreFunc''' ||'''Equivalent New Method''' ||'''New Class/Interface in which method is present''' ||'''Explanation''' ||
- ||No equivalent method ||setStoreFuncUDFContextSignature() ||!StoreFunc ||This method will be called by Pig both in the front end and back end to pass a unique signature to the Storer. The signature can be used to store into the UDFContext any information which the Storer needs to store between various method invocations in the front end and back end. ||
+ ||No equivalent method ||setStoreFuncUDFContextSignature() ||!StoreFunc ||This method will be called by Pig both in the front end and back end to pass a unique signature to the Storer. The signature can be used to store into the UDFContext any information which the Storer needs to store between various method invocations in the front end and back end. The default implementation in !StoreFunc has an empty body. ||
- ||No equivalent method ||relToAbsPathForStoreLocation() ||!StoreFunc ||Pig runtime will call this method to allow the Storer to convert a relative store location to an absolute location. An implementation is provided in !LoadFunc (as a static method) which handles this for hdfs files and directories. ||
+ ||No equivalent method ||relToAbsPathForStoreLocation() ||!StoreFunc ||Pig runtime will call this method to allow the Storer to convert a relative store location to an absolute location. An implementation is provided in !StoreFunc which handles this for !FileSystem based locations. ||
- ||No equivalent method ||checkSchema() ||!StoreFunc ||A Store function should implement this function to check that a given schema describing the data to be written is acceptable to it ||
+ ||No equivalent method ||checkSchema() ||!StoreFunc ||A Store function should implement this function to check that a given schema describing the data to be written is acceptable to it. The default implementation in !StoreFunc has an empty body. ||
  ||No equivalent method ||setStoreLocation() ||!StoreFunc ||This method is called by Pig to communicate the store location to the storer. The storer should use this method to communicate the same information to the underlying !OutputFormat. This method is called multiple times by pig - implementations should bear this in mind and ensure there are no inconsistent side effects due to the multiple calls. ||
  ||getStorePreparationClass() ||getOutputFormat() ||!StoreFunc ||In the old API, getStorePreparationClass() was the means by which the implementation could communicate to Pig the !OutputFormat to use for writing - this is now achieved through getOutputFormat(). getOutputFormat() is NOT an optional method and implementations SHOULD provide an !OutputFormat to use. The methods in the !OutputFormat (and underlying !RecordWriter and !OutputCommitter) will be called by pig in the same manner (and in the same context) as by Hadoop in a map-reduce java program. '''If the !OutputFormat is a hadoop packaged one, the implementation should use the new API based one in org.apache.hadoop.mapreduce. If it is a custom !OutputFormat, it should be implemented using the new API under org.apache.hadoop.mapreduce'''. The checkOutputSpecs() method of the !OutputFormat will be called by pig to check the output location up-front. This method will also be called as part of the Hadoop call sequence when the job is launched, so implementations should ensure that this method can be called multiple times without inconsistent side effects. ||
  ||bindTo() ||prepareToWrite() ||!StoreFunc ||bindTo() was the old method which would provide an !OutputStream among other things to the !StoreFunc. The !StoreFunc implementation would then write to the !OutputStream in putNext(). In the new API, writing of the data is through the !OutputFormat provided by the !StoreFunc. So the equivalent call is prepareToWrite(), wherein the !RecordWriter associated with the !OutputFormat provided by the !StoreFunc is passed to the !StoreFunc. The !RecordWriter can then be used by the implementation in putNext() to write a tuple representing a record of data in a manner expected by the !RecordWriter. ||
@@ -468, +468 @@
  }}}
  === New Implementation ===
  {{{
- public class SimpleTextStorer implements StoreFunc {
+ public class SimpleTextStorer extends StoreFunc {
      protected RecordWriter writer = null;

      private byte fieldDel = '\t';
@@ -662, +662 @@
      }
  }

- @Override
- public void checkSchema(ResourceSchema s) throws IOException {
-     // nothing to do
- }
-
- @Override
- public String relToAbsPathForStoreLocation(String location, Path curDir)
-         throws IOException {
-     return LoadFunc.getAbsolutePath(location, curDir);
- }
-
- @Override
- public void setStoreFuncUDFContextSignature(String signature) {
-     // nothing to do
- }
-
  }
  }}}
  == Notes: ==
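The signature/!UDFContext hand-off described in the !LoadFunc table can be sketched in plain Java. This is a minimal stand-in, not Pig's real API: `FakeUDFContext` is a hypothetical map-backed substitute for org.apache.pig.impl.util.UDFContext (in real Pig the context is carried from the front end to the back end via the job configuration), and `SketchLoader` only mirrors the shape of setUDFContextSignature() and pushProjection().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.pig.impl.util.UDFContext:
// a static table of per-signature properties.
class FakeUDFContext {
    private static final Map<String, Map<String, String>> PROPS = new HashMap<>();

    static Map<String, String> getUDFProperties(String signature) {
        return PROPS.computeIfAbsent(signature, k -> new HashMap<>());
    }
}

// Sketch of a loader following the pattern in the table: remember the
// signature, stash the pushed projection under it on the front end, and
// read it back on the back end before returning tuples in getNext().
class SketchLoader {
    private String signature;

    void setUDFContextSignature(String signature) {
        this.signature = signature;
    }

    // Front end: Pig tells the loader which fields the script needs.
    void pushProjection(String requiredFieldList) {
        FakeUDFContext.getUDFProperties(signature)
                      .put("requiredFields", requiredFieldList);
    }

    // Back end: recover the projection stored under the same signature.
    String requiredFields() {
        return FakeUDFContext.getUDFProperties(signature).get("requiredFields");
    }
}

class UDFContextDemo {
    public static void main(String[] args) {
        SketchLoader frontEnd = new SketchLoader();
        frontEnd.setUDFContextSignature("loader-1");
        frontEnd.pushProjection("f0,f2");

        // A different instance, as on the back end: the data survives
        // because it is keyed by the signature, not held in the instance.
        SketchLoader backEnd = new SketchLoader();
        backEnd.setUDFContextSignature("loader-1");
        System.out.println(backEnd.requiredFields()); // prints f0,f2
    }
}
```

The point of the pattern is that the loader instance itself is not preserved between the front end and the back end; only data written into the context under the signature is.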
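The table's warning that setStoreLocation() is called multiple times can also be illustrated with a stand-alone sketch. The names here are hypothetical: `FakeJobConf` substitutes for the org.apache.hadoop.mapreduce.Job a real !StoreFunc receives, and the configuration key is made up for illustration. The sketch shows the property that matters: repeated calls with the same arguments leave the job in the same state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop Job/Configuration: a key-value map.
class FakeJobConf {
    final Map<String, String> conf = new HashMap<>();
}

// Sketch of a storer whose setStoreLocation() is idempotent: it only sets
// configuration entries derived from its arguments and never accumulates
// state, so Pig calling it repeatedly has no inconsistent side effects.
class SketchStorer {
    void setStoreLocation(String location, FakeJobConf job) {
        // Safe under repeated calls: same inputs always produce same state.
        // (Appending the location to a list here, by contrast, would break.)
        job.conf.put("sketch.output.dir", location); // made-up key
    }
}

class StoreLocationDemo {
    public static void main(String[] args) {
        FakeJobConf job = new FakeJobConf();
        SketchStorer storer = new SketchStorer();

        storer.setStoreLocation("/out/run1", job);
        Map<String, String> afterFirst = new HashMap<>(job.conf);

        storer.setStoreLocation("/out/run1", job); // Pig may call again
        System.out.println(job.conf.equals(afterFirst)); // prints true
    }
}
```

The same reasoning applies to checkOutputSpecs() on the !OutputFormat, which the table notes is invoked both by pig up-front and by Hadoop at job launch.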