hadoop-common-dev mailing list archives

From "Bryan A. Pendleton" ...@geekdom.net>
Subject Re: [jira] Commented: (HADOOP-451) Add a Split interface
Date Mon, 14 Aug 2006 23:45:55 GMT
Maybe I'm reading the code wrong.

It looks to me like the FileSplit used by a MapTaskRunner comes from the
FileSplit instantiated in MapTask.java, which deserializes it from a stream.
At the very least, if MapTasks stay Writable, that code is wrong. My
impression is that the Task which ends up being executed goes through RPC
this way. So, when a MapTask is assigned and readFields() is called on the
other end, it will be FileSplit.readFields() that runs, with no awareness of
the actual type of Split that was sent.

Having walked through that process, and assuming my reading is correct,
won't it be necessary to fix MapTask so that it re-creates the right type of
Split on deserialization?
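
As a rough illustration of the kind of fix being asked about, here is a minimal, self-contained sketch in which the writer records the concrete Split class name ahead of the split's own fields, and the reader uses that name to instantiate the right class before calling readFields(). Everything in it is an assumption made for illustration: the Writable interface is a local stand-in for Hadoop's, Split is simplified (the proposed getLocations(FileSystem) is left out), and FileSplit, RowRangeSplit, and SplitSerializer are hypothetical names, not the actual MapTask code or the eventual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in for Hadoop's Writable so the sketch compiles on its own.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Simplified Split: getLocations(FileSystem) from the proposal is omitted.
interface Split extends Writable {
  long getCost();
}

// Hypothetical file-based split (not the real Hadoop FileSplit).
class FileSplit implements Split {
  String path = "";
  long start;
  long length;
  FileSplit() {}  // no-arg constructor needed for reflective creation
  FileSplit(String path, long start, long length) {
    this.path = path; this.start = start; this.length = length;
  }
  public long getCost() { return length; }
  public void write(DataOutput out) throws IOException {
    out.writeUTF(path); out.writeLong(start); out.writeLong(length);
  }
  public void readFields(DataInput in) throws IOException {
    path = in.readUTF(); start = in.readLong(); length = in.readLong();
  }
}

// Hypothetical non-file split, to show a type FileSplit.readFields() could not handle.
class RowRangeSplit implements Split {
  long firstRow;
  long lastRow;
  RowRangeSplit() {}
  RowRangeSplit(long firstRow, long lastRow) {
    this.firstRow = firstRow; this.lastRow = lastRow;
  }
  public long getCost() { return lastRow - firstRow; }
  public void write(DataOutput out) throws IOException {
    out.writeLong(firstRow); out.writeLong(lastRow);
  }
  public void readFields(DataInput in) throws IOException {
    firstRow = in.readLong(); lastRow = in.readLong();
  }
}

// Hypothetical helper: a task would call these instead of hard-coding FileSplit.
final class SplitSerializer {
  private SplitSerializer() {}

  // Write the concrete class name first, then the split's own fields.
  static void writeSplit(DataOutput out, Split split) throws IOException {
    out.writeUTF(split.getClass().getName());
    split.write(out);
  }

  // Read the class name, instantiate that class, then let it read its fields.
  static Split readSplit(DataInput in) throws IOException {
    String className = in.readUTF();
    try {
      Split split = Class.forName(className)
          .asSubclass(Split.class)
          .getDeclaredConstructor()
          .newInstance();
      split.readFields(in);
      return split;
    } catch (ReflectiveOperationException e) {
      throw new IOException("Cannot instantiate split class " + className, e);
    }
  }
}
```

With this shape, a RowRangeSplit written on one side comes back as a RowRangeSplit on the other, whereas a reader that always did `new FileSplit().readFields(in)` would misinterpret the bytes. The cost is one class-name string per serialized split.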

On 8/14/06, Doug Cutting (JIRA) <jira@apache.org> wrote:
>
>     [
> http://issues.apache.org/jira/browse/HADOOP-451?page=comments#action_12427953]
>
> Doug Cutting commented on HADOOP-451:
> -------------------------------------
>
> > You will probably need to add a "setSplitClass" and "getSplitClass" to JobConf
>
> The InputFormat is a Split factory, no?  Who else would need to create
> Splits?
>
> Splits are passed in RPCs, and the RPC mechanism supports
> polymorphism.  Splits are not written to files, so I see no reason to
> declare a fixed class per job.  Am I missing something?
>
> > Add a Split interface
> > ---------------------
> >
> >                 Key: HADOOP-451
> >                 URL: http://issues.apache.org/jira/browse/HADOOP-451
> >             Project: Hadoop
> >          Issue Type: Improvement
> >          Components: mapred
> >            Reporter: Doug Cutting
> >             Fix For: 0.6.0
> >
> >
> > The InputFormat interface has a method:
> > FileSplit[] getSplits();
> > This should change to:
> > Split[] getSplits();
> > The Split interface would look like:
> > public interface Split extends Writable {
> >   /** Returns a list of hosts that contain this split.
> >        This is only used to optimize task placement, so this may be empty. */
> >   String[] getLocations(FileSystem fs);
> >   /** The relative, estimated cost of operating on this.  Typically the size of the data in the split.
> >        Used to prioritize tasks in a job (high-cost tasks are run first).  */
> >    long getCost();
> > }
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>


-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp
