pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "LoadStoreMigrationGuide" by PradeepKamath
Date Thu, 18 Feb 2010 21:23:25 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "LoadStoreMigrationGuide" page has been changed by PradeepKamath.


  ||getStorePreparationClass() ||getOutputFormat() ||!StoreFunc ||In the old API, getStorePreparationClass()
was the means by which the implementation could communicate to Pig the !OutputFormat to use
for writing - this is now achieved through getOutputFormat(). getOutputFormat() is NOT an
optional method, and implementations SHOULD provide an !OutputFormat to use. The methods in
the !OutputFormat (and the underlying !RecordWriter and !OutputCommitter) will be called by Pig
in the same manner (and in the same context) as by Hadoop in a MapReduce Java program. The
checkOutputSpecs() method of the !OutputFormat will be called by Pig to check the output location
up-front. This method will also be called as part of the Hadoop call sequence when the job
is launched, so implementations should ensure that it can be called multiple times
without inconsistent side effects. ||
  ||bindTo() ||prepareToWrite() ||!StoreFunc ||bindTo() was the old method which would provide
an !OutputStream among other things to the !StoreFunc. The !StoreFunc implementation would
then write to the !OutputStream in putNext(). In the new API, writing of the data is through
the !OutputFormat provided by the !StoreFunc. So the equivalent call is prepareToWrite() wherein
the !RecordWriter associated with the !OutputFormat provided by the !StoreFunc is passed to
the !StoreFunc. The !RecordWriter can then be used by the implementation in putNext() to write
a tuple representing a record of data in a manner expected by the !RecordWriter. ||
  ||putNext() ||putNext() ||!StoreFunc ||The meaning of putNext() has not changed: it is called
by the Pig runtime to write the next tuple of data. In the new API, this is the method in which
the implementation uses the underlying !RecordWriter to write the Tuple out. ||
- ||finish() ||no equivalent method in !StoreFunc - implementations can use close() in !RecordWriter
or commitTask in !OutputCommitter ||!RecordWriter or !OutputCommitter ||finish() has been
removed from !StoreFunc since the same semantics can be achieved by !RecordWriter.close()
or !OutputCommitter.commitTask() - in the latter case !OutputCommitter.needsTaskCommit() should
return true. ||
+ ||finish() ||no equivalent method in !StoreFunc - implementations can use commitTask() in
!OutputCommitter ||!OutputCommitter ||finish() has been removed from !StoreFunc since the
same semantics can be achieved by !OutputCommitter.commitTask() - (!OutputCommitter.needsTaskCommit()
should return true to be able to use commitTask()). ||
  An example of how a simple !StoreFunc implementation based on the old interface can be converted
to the new interfaces is shown in the Examples section below.
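
As a rough illustration of the call sequence described in the table (an output location check that may run more than once, then prepareToWrite(), then putNext() per tuple), here is a minimal, self-contained Java sketch. It deliberately does not use the real Pig or Hadoop classes: SketchRecordWriter, InMemoryWriter, SketchStoreFunc and checkOutputLocation() are hypothetical stand-ins invented for this sketch, and the tab-delimited record format is an assumption, not something the guide specifies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a record writer; NOT Hadoop's RecordWriter. */
interface SketchRecordWriter {
    void write(String record);
}

/** Hypothetical writer that keeps records in memory so the flow can be inspected. */
class InMemoryWriter implements SketchRecordWriter {
    final List<String> records = new ArrayList<>();

    @Override
    public void write(String record) {
        records.add(record);
    }
}

/** Hypothetical store function following the new-style call sequence. */
class SketchStoreFunc {
    private SketchRecordWriter writer;

    /**
     * Plays the role of the up-front output check. It only validates its
     * argument and changes no state, so calling it several times is harmless.
     */
    void checkOutputLocation(String location) {
        if (location == null || location.isEmpty()) {
            throw new IllegalArgumentException("output location must be set");
        }
    }

    /** The writer is handed over once, before any tuple is written. */
    void prepareToWrite(SketchRecordWriter writer) {
        this.writer = writer;
    }

    /** Serializes one tuple as a tab-delimited line (assumed format) and writes it. */
    void putNext(List<Object> tuple) {
        if (writer == null) {
            throw new IllegalStateException("prepareToWrite() was not called");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                line.append('\t');
            }
            Object field = tuple.get(i);
            line.append(field == null ? "" : field.toString()); // null becomes an empty field
        }
        writer.write(line.toString());
    }
}
```

In this sketch the store function never opens a stream itself; it only formats tuples and delegates to whatever writer it was given, which is the shift the bindTo() to prepareToWrite() row describes.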
