pig-dev mailing list archives

From "Yan Zhou (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1535) Combined input splits need to consider rack-locality for the underlying splits of rack info.
Date Wed, 04 Aug 2010 17:51:16 GMT
Combined input splits need to consider rack-locality for the underlying splits of rack info.
--------------------------------------------------------------------------------------------

                 Key: PIG-1535
                 URL: https://issues.apache.org/jira/browse/PIG-1535
             Project: Pig
          Issue Type: Improvement
            Reporter: Yan Zhou


PIG-1518 will add support for combining multiple small splits into larger but fewer splits.
In doing so, the node-locality of the underlying generic input splits is consulted to maximize
data node-locality for the "big" splits. Rack-locality info is not used because
generic input splits do not currently carry it. MAPREDUCE-1698 has been filed to address
the lack of rack info in InputSplit. On the other hand, for many other types of input splits
the rack info is available; FileSplit is one example. Howl's future input splits will also
contain rack-locality info.

In summary, until MAPREDUCE-1698 is resolved (if ever), small splits of certain specific types
could still be combined with awareness of rack-locality, probably using the same algorithm as
CombineFileInputFormat or a similar one.
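
To make the idea concrete, here is a minimal sketch of such a combining pass. It is an
illustration only, not the PIG-1518 code and not the actual CombineFileInputFormat
implementation. It assumes each small split exposes its length, its hosts and its racks;
the names SmallSplit, racks, max_size and combine are hypothetical. Splits are packed
node-locally first, then rack-locally, and whatever remains is packed without regard
to locality.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class SmallSplit:
    """Hypothetical stand-in for an input split that knows its racks."""
    name: str
    length: int
    hosts: Tuple[str, ...]   # nodes holding a replica
    racks: Tuple[str, ...]   # racks holding a replica (assumed to be available)


def _pack(groups: Dict[str, List[SmallSplit]], max_size: int,
          used: Set[str]) -> List[List[SmallSplit]]:
    """Greedily pack unused splits that share a key (a node or a rack).

    A batch is emitted only once it reaches max_size; under-filled batches
    are dropped so their splits stay available for the next, coarser pass.
    """
    combined = []
    for key in sorted(groups):
        batch, size = [], 0
        for split in groups[key]:
            if split.name in used:
                continue
            batch.append(split)
            size += split.length
            if size >= max_size:
                combined.append(batch)
                used.update(s.name for s in batch)
                batch, size = [], 0
    return combined


def combine(splits: List[SmallSplit], max_size: int) -> List[List[SmallSplit]]:
    """Combine small splits: node-local first, then rack-local, then the rest."""
    by_node, by_rack = defaultdict(list), defaultdict(list)
    for split in splits:
        for host in set(split.hosts):
            by_node[host].append(split)
        for rack in set(split.racks):
            by_rack[rack].append(split)

    used: Set[str] = set()
    result = _pack(by_node, max_size, used)    # pass 1: same node
    result += _pack(by_rack, max_size, used)   # pass 2: same rack

    # Pass 3: pack whatever is left, ignoring locality.
    batch, size = [], 0
    for split in splits:
        if split.name in used:
            continue
        batch.append(split)
        size += split.length
        if size >= max_size:
            result.append(batch)
            batch, size = [], 0
    if batch:
        result.append(batch)
    return result
```

With a node-only pass, two small splits that sit on different nodes of the same rack
would fall through to the locality-blind leftover step; the rack pass groups them
together before that happens.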

But that would mean non-trivial extra work on top of PIG-1518 and may be out of reach for 0.8,
hence this separate JIRA.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

