pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "LoadStoreMigrationGuide" by PradeepKamath
Date Thu, 11 Feb 2010 22:18:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=14&rev2=15

--------------------------------------------------

  || No equivalent method || getInputFormat() ||!LoadFunc ||  This method will be called by
Pig to get the !InputFormat used by the loader. The methods in the !InputFormat (and underlying
!RecordReader) will be called by pig in the same manner (and in the same context) as by Hadoop
in a map-reduce java program.||
  || No equivalent method || setLocation() || !LoadFunc || This method is called by Pig to
communicate the load location to the loader. The loader should use this method to communicate
the same information to the underlying !InputFormat. Pig calls this method multiple times,
so implementations should ensure that the repeated calls have no inconsistent side effects.||
  || bindTo() || prepareToRead() || !LoadFunc || bindTo() was the old method which would provide
an !InputStream among other things to the !LoadFunc. The !LoadFunc implementation would then
read from the !InputStream in getNext(). In the new API, reading of the data is through the
!InputFormat provided by the !LoadFunc. So the equivalent call is prepareToRead() wherein
the !RecordReader associated with the !InputFormat provided by the !LoadFunc is passed to
the !LoadFunc. The !RecordReader can then be used by the implementation in getNext() to return
a tuple representing a record of data back to pig. ||
- || getNext() || getNext() || !LoadFunc || The meaning of getNext() has not changed and is
called by Pig runtime to get the next tuple in the data ||
+ || getNext() || getNext() || !LoadFunc || The meaning of getNext() has not changed and is
called by Pig runtime to get the next tuple in the data - in the new API, this is the method
wherein the implementation will use the underlying !RecordReader and construct a tuple||
  || bytesToInteger(),...bytesToBag() ||  bytesToInteger(),...bytesToBag() || !LoadCaster
|| The meaning of these methods has not changed; they are called by the Pig runtime to cast !DataByteArray
fields to the right type when needed. In the new API, a !LoadFunc implementation should give
a !LoadCaster object back to pig as the return value of getLoadCaster() method so that it
can be used for casting. If a null is returned then casting from !DataByteArray to any other
type (implicitly or explicitly) in the pig script will not be possible ||
  
  An example of how a simple !LoadFunc implementation based on the old interface can be converted
to the new interfaces is shown in the Examples section below. 
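
The read-side call sequence described in the table (prepareToRead() followed by repeated getNext() calls) can be sketched roughly as below. This is only an illustration under stated assumptions, not Pig's actual API: SimpleRecordReader, TabLoader, readerOver() and loadAll() are hypothetical stand-ins introduced here, tuples are modelled as List<String>, and the real prepareToRead() and getNext() have different signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LoaderSketch {
    // Hypothetical stand-in for the RecordReader that Pig passes to the loader.
    interface SimpleRecordReader {
        boolean nextKeyValue();   // advance to the next record, false at end of input
        String getCurrentValue(); // the record most recently advanced to
    }

    // Hypothetical loader: keeps the reader it is given and turns each record into a tuple.
    static class TabLoader {
        private SimpleRecordReader reader;

        // Counterpart of prepareToRead(): the reader is handed over once, before any getNext().
        void prepareToRead(SimpleRecordReader reader) {
            this.reader = reader;
        }

        // Counterpart of getNext(): builds one tuple per record; null signals end of input here.
        List<String> getNext() {
            if (!reader.nextKeyValue()) {
                return null;
            }
            return Arrays.asList(reader.getCurrentValue().split("\t", -1));
        }
    }

    // Hypothetical in-memory reader so the sketch runs without Hadoop.
    static SimpleRecordReader readerOver(List<String> lines) {
        Iterator<String> it = lines.iterator();
        return new SimpleRecordReader() {
            private String current;

            public boolean nextKeyValue() {
                if (!it.hasNext()) {
                    return false;
                }
                current = it.next();
                return true;
            }

            public String getCurrentValue() {
                return current;
            }
        };
    }

    // Plays the role of the runtime: prepare once, then pull tuples until exhausted.
    static List<List<String>> loadAll(List<String> lines) {
        TabLoader loader = new TabLoader();
        loader.prepareToRead(readerOver(lines));
        List<List<String>> tuples = new ArrayList<>();
        for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
            tuples.add(t);
        }
        return tuples;
    }
}
```

The point of the sketch is that the loader no longer reads from a stream itself; it only converts whatever the reader yields into tuples.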
@@ -27, +27 @@

  = StoreFunc Migration =
  
  The main change is that the new !StoreFunc API is based on an !OutputFormat to write the data.
Implementations can choose to use an existing !OutputFormat like !TextOutputFormat or implement
a new one.
+ 
+ == Table mapping old API calls to new API calls in rough order of call sequence ==
+ || '''Old Method in !StoreFunc''' || '''Equivalent New Method''' || '''New Class/Interface
in which method is present''' || '''Explanation''' ||
+ || No equivalent method || setStoreFuncUDFContextSignature() || !StoreFunc || This method
will be called by Pig both in the front end and back end to pass a unique signature to the
Storer. The signature can be used to store into the UDFContext any information which the
Storer needs to store between various method invocations in the front end and back end.||
+ || No equivalent method || relToAbsPathForStoreLocation() || !StoreFunc || Pig runtime will
call this method to allow the Storer to convert a relative store location to an absolute location.
An implementation is provided in !LoadFunc (as a static method) which handles this for hdfs
files and directories.||
+ || No equivalent method || checkSchema() || !StoreFunc || A Store function should implement
this function to check that a given schema is acceptable to it ||
+ || No equivalent method || setStoreLocation() || !StoreFunc || This method is called by
Pig to communicate the store location to the storer. The storer should use this method to
communicate the same information to the underlying !OutputFormat. Pig calls this method multiple
times, so implementations should ensure that the repeated calls have no inconsistent side effects.||
+ || getStorePreparationClass() || getOutputFormat() || !StoreFunc || In the old API, getStorePreparationClass()
was the means by which the implementation could communicate to Pig the !OutputFormat to use
for writing - this is now achieved through getOutputFormat(). getOutputFormat() is NOT an
optional method and implementations SHOULD provide an !OutputFormat to use. The methods in
the !OutputFormat (and underlying !RecordWriter and !OutputCommitter) will be called by Pig
in the same manner (and in the same context) as by Hadoop in a map-reduce Java program.||
+ || bindTo() || prepareToWrite() || !StoreFunc || bindTo() was the old method which would
provide an !OutputStream among other things to the !StoreFunc. The !StoreFunc implementation
would then write to the !OutputStream in putNext(). In the new API, writing of the data is
through the !OutputFormat provided by the !StoreFunc. So the equivalent call is prepareToWrite()
wherein the !RecordWriter associated with the !OutputFormat provided by the !StoreFunc is
passed to the !StoreFunc. The !RecordWriter can then be used by the implementation in putNext()
to write a tuple representing a record of data in a manner expected by the !RecordWriter.
||
+ || putNext() || putNext() || !StoreFunc || The meaning of putNext() has not changed and
is called by Pig runtime to write the next tuple of data - in the new API, this is the method
wherein the implementation will use the underlying !RecordWriter to write the Tuple out
||
+ || finish() || no equivalent method in !StoreFunc - implementations can use close() in !RecordWriter
or commitTask in !OutputCommitter || !RecordWriter or !OutputCommitter || finish() has been
removed from !StoreFunc since the same semantics can be achieved by !RecordWriter.close()
or !OutputCommitter.commitTask() - in the latter case !OutputCommitter.needsTaskCommit() should
return true.||
+ 
+ 
  
  An example of how a simple !StoreFunc implementation based on the old interface can be converted
to the new interfaces is shown in the Examples section below. 
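
The write-side sequence from the table (prepareToWrite(), repeated putNext() calls, then the writer's close() taking over the role of the old finish()) can be sketched roughly as below. This is a simplified illustration under stated assumptions, not Pig's actual API: SimpleRecordWriter, TabStorer and storeAll() are hypothetical stand-ins introduced here, tuples are modelled as List<String>, and the "<closed>" marker exists only to make the close() call visible.

```java
import java.util.List;

class StorerSketch {
    // Hypothetical stand-in for the RecordWriter that Pig passes to the storer.
    interface SimpleRecordWriter {
        void write(String line);
        void close();
    }

    // Hypothetical storer: keeps the writer it is given and serializes each tuple to it.
    static class TabStorer {
        private SimpleRecordWriter writer;

        // Counterpart of prepareToWrite(): the writer is handed over once, before any putNext().
        void prepareToWrite(SimpleRecordWriter writer) {
            this.writer = writer;
        }

        // Counterpart of putNext(): formats one tuple the way the writer expects it.
        void putNext(List<String> tuple) {
            writer.write(String.join("\t", tuple));
        }
    }

    // Plays the role of the runtime: prepare once, push every tuple, then close the writer.
    static String storeAll(List<List<String>> tuples) {
        StringBuilder out = new StringBuilder();
        SimpleRecordWriter writer = new SimpleRecordWriter() {
            public void write(String line) {
                out.append(line).append('\n');
            }

            public void close() {
                // Work that used to live in finish() would go here in this sketch.
                out.append("<closed>");
            }
        };
        TabStorer storer = new TabStorer();
        storer.prepareToWrite(writer);
        for (List<String> t : tuples) {
            storer.putNext(t);
        }
        writer.close(); // called by the caller, not by the storer
        return out.toString();
    }
}
```

In this sketch the storer never closes the writer itself, which mirrors the removal of finish() from the storer described in the table.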
  
@@ -458, +471 @@

          
      private byte fieldDel = '\t';
      private static final int BUFFER_SIZE = 1024;
-     
+     private static final String UTF8 = "UTF-8";
      public PigStorage() {
      }
      
@@ -499, +512 @@

                  throw ee;
              }
  
-             putField(mOut, field);
+             putField(field);
  
              if (i != sz - 1) {
                  mOut.write(fieldDel);
