pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "LoadStoreMigrationGuide" by PradeepKamath
Date Tue, 23 Feb 2010 22:25:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/LoadStoreMigrationGuide?action=diff&rev1=29&rev2=30

--------------------------------------------------

  ||No equivalent method ||setLocation() ||!LoadFunc ||This method is called by Pig to communicate
the load location to the loader. The loader should use this method to communicate the same
information to the underlying !InputFormat. Pig calls this method multiple times, so
implementations should ensure that repeated calls have no inconsistent side effects. ||
  ||bindTo() ||prepareToRead() ||!LoadFunc ||bindTo() was the old method which would provide
an !InputStream, among other things, to the !LoadFunc. The !LoadFunc implementation would then
read from the !InputStream in getNext(). In the new API, data is read through the
!InputFormat provided by the !LoadFunc, so the equivalent call is prepareToRead(), in which
the !RecordReader associated with the !InputFormat provided by the !LoadFunc is passed to
the !LoadFunc. The implementation can then use the !RecordReader in getNext() to return
a tuple representing a record of data back to Pig. ||
  ||getNext() ||getNext() ||!LoadFunc ||The meaning of getNext() has not changed: it is called
by the Pig runtime to get the next tuple in the data. In the new API, this is the method in which
the implementation uses the underlying !RecordReader to construct a tuple. ||
- ||bytesToInteger(),...bytesToBag() ||bytesToInteger(),...bytesToBag() ||!LoadCaster || The
signature of bytesToTuple, bytesToBag is also changed to take a field schema of the bag/tuple,
and bytesToTuple/bytesToBag should construct the tuple/bag in conformance with the given field
schema. The meaning of these methods has not changed and is called by Pig runtime to cast
a !DataByteArray fields to the right type when needed. In the new API, a !LoadFunc implementation
should give a !LoadCaster object back to pig as the return value of getLoadCaster() method
so that it can be used for casting. The default implementation in !LoadFunc returns an instance
of UTF8StorageConvertor which can handle casting from UTF-8 bytes to different types. If a
null is returned then casting from !DataByteArray to any other type (implicitly or explicitly)
in the pig script will not be possible.  ||
+ ||bytesToInteger(),...bytesToBag() ||bytesToInteger(),...bytesToBag() ||!LoadCaster || The
signatures of the bytesToTuple and bytesToBag methods have changed to take the field schema of
the tuple/bag, and bytesToTuple/bytesToBag should construct the tuple/bag in conformance with
the given field schema. The meaning of these methods has not changed: they are called by the Pig
runtime to cast !DataByteArray fields to the right type when needed. In the new API, a !LoadFunc
implementation should give a !LoadCaster object back to Pig as the return value of the
getLoadCaster() method so that it can be used for casting. The default implementation in !LoadFunc
returns an instance of UTF8StorageConvertor, which can handle casting from UTF-8 bytes to different
types. If null is returned, casting from !DataByteArray to any other type (implicitly
or explicitly) in the Pig script will not be possible.  ||
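
As a rough illustration of the casting row above, the sketch below shows what schema-aware casting from UTF-8 bytes might look like. It does not use Pig's own classes: `Utf8CastSketch`, `FieldType`, the `(f1,f2)` text layout, and the choice to return null for unparseable input are all assumptions made for this example, not the behaviour of !LoadCaster or UTF8StorageConvertor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a caster; NOT Pig's LoadCaster interface. */
final class Utf8CastSketch {
    /** Hypothetical, minimal field schema: one type per tuple field. */
    enum FieldType { INTEGER, CHARARRAY }

    /** Casts UTF-8 bytes to an Integer; returns null if absent or malformed (assumption). */
    static Integer bytesToInteger(byte[] b) {
        if (b == null || b.length == 0) {
            return null;
        }
        try {
            return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Casts UTF-8 bytes to a String. */
    static String bytesToCharArray(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    /**
     * Parses "(f1,f2,...)" and casts each field to the type named in the schema.
     * Nested tuples and escaped commas are not handled in this sketch.
     */
    static List<Object> bytesToTuple(byte[] b, List<FieldType> schema) {
        if (b == null) {
            return null;
        }
        String s = new String(b, StandardCharsets.UTF_8).trim();
        if (s.length() < 2 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')') {
            return null;
        }
        String[] parts = s.substring(1, s.length() - 1).split(",", -1);
        if (parts.length != schema.size()) {
            return null; // field count does not conform to the schema
        }
        List<Object> tuple = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            byte[] field = parts[i].getBytes(StandardCharsets.UTF_8);
            tuple.add(schema.get(i) == FieldType.INTEGER
                    ? bytesToInteger(field)
                    : bytesToCharArray(field));
        }
        return tuple;
    }

    public static void main(String[] args) {
        List<FieldType> schema = new ArrayList<>();
        schema.add(FieldType.INTEGER);
        schema.add(FieldType.CHARARRAY);
        System.out.println(bytesToTuple("(7,pig)".getBytes(StandardCharsets.UTF_8), schema));
    }
}
```

The point of passing the schema is that the same bytes can yield differently typed fields depending on what the script declares; a real implementation would also need to handle bags, maps, and nesting.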
  
  
  An example of how a simple !LoadFunc implementation based on the old interface can be converted
to the new interfaces is shown in the Examples section below.
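
Separately from the wiki's own example, the call sequence described in the table (setLocation(), then prepareToRead(), then repeated getNext()) can be sketched with stand-in types so that it runs without Pig or Hadoop on the classpath. Everything here is hypothetical: `LineReader` stands in for a !RecordReader, a plain `Map` stands in for the job configuration, the key `input.dir` is invented, and returning null at end of input is this sketch's own convention.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Stand-in types only; this is NOT the org.apache.pig.LoadFunc API. */
final class LoaderLifecycleSketch {
    /** Hypothetical stand-in for a RecordReader: one text line per record. */
    interface LineReader {
        boolean next();
        String current();
    }

    /** Hypothetical loader that splits each line on tabs. */
    static final class TabLoader {
        private LineReader reader;

        /**
         * May be called several times. It only overwrites a single key,
         * so repeated calls leave the configuration in the same state.
         */
        void setLocation(String location, Map<String, String> conf) {
            conf.put("input.dir", location); // invented key name
        }

        /** Receives the reader that getNext() will pull records from. */
        void prepareToRead(LineReader reader) {
            this.reader = reader;
        }

        /** Returns the next record as a list of fields, or null when input is exhausted. */
        List<String> getNext() {
            if (reader == null || !reader.next()) {
                return null;
            }
            return Arrays.asList(reader.current().split("\t", -1));
        }
    }

    /** Builds an in-memory LineReader over the given lines. */
    static LineReader readerOver(List<String> lines) {
        final Iterator<String> it = lines.iterator();
        return new LineReader() {
            private String cur;

            public boolean next() {
                if (!it.hasNext()) {
                    return false;
                }
                cur = it.next();
                return true;
            }

            public String current() {
                return cur;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        TabLoader loader = new TabLoader();
        loader.setLocation("/data/in", conf);
        loader.setLocation("/data/in", conf); // a repeated call must be harmless
        loader.prepareToRead(readerOver(Arrays.asList("a\tb", "c\td")));
        for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
            System.out.println(t);
        }
    }
}
```

Keeping setLocation() free of accumulating state (no appending to lists, no counters) is one simple way to satisfy the requirement that multiple calls have no inconsistent side effects.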
