pig-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Pig Wiki] Update of "Pig070IncompatibleChanges" by PradeepKamath
Date Thu, 18 Feb 2010 20:27:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "Pig070IncompatibleChanges" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=18&rev2=19

--------------------------------------------------

  
  In earlier versions of Pig, a user could specify "split by file" on the loader statement,
which made sure that each map got an entire file rather than having the files divided further
into blocks. This feature was primarily designed for streaming optimization but could
also be used with loaders that can't deal with incomplete records. We don't believe that this
functionality has been widely used.
  
- Because the slicing of the data is no longer in Pig's control, we can't support this feature
generically for every loader. If a particular loader needs this functionality, it will need
to make sure that the underlying InputFormat supports it. 
+ Because the slicing of the data is no longer in Pig's control, we can't support this feature
generically for every loader. If a particular loader needs this functionality, it will need
to make sure that the underlying InputFormat supports it. (Any !InputFormat based on !FileInputFormat
will support this through the mapred.min.split.size property: if this property is set to a value
greater than the size of any of the files to be loaded, then each file will be read as a single
split. This property can be provided on the pig command line as a Java -D property. Note
that it will apply to all jobs that are run as part of that script.)
  
  We will have a different approach for streaming optimization if that functionality is necessary.
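
To illustrate why setting mapred.min.split.size above the largest file size keeps each file in one split, here is a minimal sketch of the split-size arithmetic. It is a simplified model of what FileInputFormat is understood to do, not the Hadoop code itself: the function names, the 64 MB block size, and the 200 MB file size are assumptions chosen for illustration.

```python
# Simplified model of FileInputFormat split sizing (illustrative, not Hadoop code).
# Assumption: split size = max(min_size, min(max_size, block_size)), and the
# last split may run up to 10% over the split size before being cut again.
SPLIT_SLOP = 1.1


def compute_split_size(min_size, max_size, block_size):
    """Return the split size chosen for the given min/max/block sizes."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size, split_size):
    """Return the byte length of each split produced for one file."""
    lengths = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    if remaining > 0:
        lengths.append(remaining)
    return lengths


MB = 1024 * 1024
block_size = 64 * MB      # assumed HDFS block size
max_size = 2**63 - 1      # assumed "no upper limit" on split size
file_size = 200 * MB      # hypothetical input file

# Default minimum split size: the file is cut along block boundaries.
default_splits = split_lengths(
    file_size, compute_split_size(1, max_size, block_size))

# Minimum split size larger than the file: the whole file is one split.
whole_file = split_lengths(
    file_size, compute_split_size(1024 * MB, max_size, block_size))

print(len(default_splits), len(whole_file))
```

On the pig command line this corresponds to passing the property as -Dmapred.min.split.size=<bytes> with a value larger than the biggest input file; as noted above, the setting then applies to every job the script launches.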
  
