giraph-dev mailing list archives

From "Nitay Joffe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-483) InputSplit needs to be Writable
Date Sat, 19 Jan 2013 06:10:12 GMT
Nitay Joffe created GIRAPH-483:
----------------------------------

             Summary: InputSplit needs to be Writable
                 Key: GIRAPH-483
                 URL: https://issues.apache.org/jira/browse/GIRAPH-483
             Project: Giraph
          Issue Type: Improvement
            Reporter: Nitay Joffe
            Priority: Minor


While working on Hive I/O recently, I found this out the hard way...
We use InputSplit in Giraph to make things work easily with Hadoop. However, our usage
of the interface is not actually consistent. Specifically, in InputSplitsCallable#getInputSplit
we have the following:

  ((Writable) inputSplit).readFields(inputStream);

This means our InputSplit has to be Writable. If it's not (as mine wasn't initially when implementing
a new input format), the cast fails at runtime and things break badly. As a simple start, we should
at least put an instanceof check around that cast and throw an informative error message.
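
A minimal sketch of such a guarded cast, for illustration only. The Writable interface below is
a stand-in for org.apache.hadoop.io.Writable so the example compiles without Hadoop, and the
SplitReader class, the readSplit method, and the message wording are hypothetical names, not
existing Giraph code:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical helper class; not part of Giraph.
final class SplitReader {
  private SplitReader() { }

  /**
   * Deserialize the split's fields from the stream, failing with a clear
   * message (instead of a bare ClassCastException) if it is not Writable.
   */
  static void readSplit(Object inputSplit, DataInput inputStream)
      throws IOException {
    if (!(inputSplit instanceof Writable)) {
      String className =
          inputSplit == null ? "null" : inputSplit.getClass().getName();
      throw new IllegalStateException("readSplit: InputSplit class " +
          className + " must implement Writable to be used with Giraph");
    }
    ((Writable) inputSplit).readFields(inputStream);
  }
}
```

Which exception type to throw is a design choice; IllegalStateException is used here only as
an example.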

Furthermore, looking deeper into it, I noticed we never actually use the getLength() method
in InputSplit, just getLocations(). So really the "right" way to do this, IMO, is to have
our own GiraphInputSplit interface, which extends Writable and has the getLocations() method.
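
One possible shape for that interface, as a rough sketch rather than a settled design. As
assumptions: Writable is again a stand-in for org.apache.hadoop.io.Writable, the getLocations()
signature is simplified, and HostListSplit with its roundTrip helper is a hypothetical
implementation added only to show the serialization round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Sketch of the proposed interface: serializable, locations only, no getLength().
interface GiraphInputSplit extends Writable {
  String[] getLocations() throws IOException;
}

// Hypothetical implementation: a split described only by its preferred hosts.
final class HostListSplit implements GiraphInputSplit {
  private String[] hosts;

  HostListSplit(String... hosts) {
    this.hosts = hosts.clone();
  }

  @Override
  public String[] getLocations() {
    return hosts.clone();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Length-prefixed list of host names.
    out.writeInt(hosts.length);
    for (String host : hosts) {
      out.writeUTF(host);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int count = in.readInt();
    hosts = new String[count];
    for (int i = 0; i < count; i++) {
      hosts[i] = in.readUTF();
    }
  }

  /** Serialize a split to bytes and read it back into a fresh instance. */
  static HostListSplit roundTrip(HostListSplit split) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      split.write(new DataOutputStream(bytes));
      HostListSplit copy = new HostListSplit();
      copy.readFields(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With an interface like this, the cast in getInputSplit would no longer be needed, since every
split would be Writable by construction.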

Doing this is tricky, though, as it will likely break existing I/O formats, so it will require
some care...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
