ignite-issues mailing list archives

From "Ivan Veselovsky (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (IGNITE-4097) Spilled map-reduce: map side.
Date Thu, 20 Oct 2016 12:55:58 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589204#comment-15589204
] 

Ivan Veselovsky edited comment on IGNITE-4097 at 10/20/16 12:55 PM:
--------------------------------------------------------------------

Reasons to drop the off-heap collections in favor of on-heap collections:
1) To compare objects with RawComparator we need two byte[] arrays, but fetching them from
off-heap memory is expensive. With an on-heap collection the byte[] arrays are always at
hand.
2) The SkipList data structure is O(N) in the worst case; it may make sense to use a collection
with guaranteed O(log N) worst-case efficiency.
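
A minimal sketch of both points, not the actual Ignite code. RawBytesComparator is a hypothetical
local stand-in that only mirrors the shape of Hadoop's RawComparator.compare(byte[], int, int,
byte[], int, int), and OnHeapSortedBuffer is an invented name. java.util.TreeMap is a red-black
tree, so put and get are guaranteed O(log N). Note that a plain map overwrites the value on a
duplicate key, which a real map-output buffer would have to handle.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in with the same method shape as Hadoop's RawComparator. */
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

/** Hypothetical on-heap sorted buffer: keys stay as byte[] on the heap, ready for raw comparison. */
final class OnHeapSortedBuffer {
    /** Unsigned lexicographic comparison of two byte ranges. */
    static final RawBytesComparator LEXICOGRAPHIC = (b1, s1, l1, b2, s2, l2) -> {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y)
                return x - y;
        }
        return l1 - l2; // the shorter range sorts first when it is a prefix of the other
    };

    private final TreeMap<byte[], byte[]> map;

    OnHeapSortedBuffer(RawBytesComparator cmp) {
        // Adapt the raw comparator to whole arrays; no copy from off-heap memory is needed.
        Comparator<byte[]> c = (a, b) -> cmp.compare(a, 0, a.length, b, 0, b.length);
        this.map = new TreeMap<>(c);
    }

    void put(byte[] key, byte[] val) {
        map.put(key, val); // O(log N) in the worst case
    }

    byte[] firstKey() {
        return map.firstKey();
    }

    byte[] lastKey() {
        return map.lastKey();
    }

    int size() {
        return map.size();
    }

    Iterable<Map.Entry<byte[], byte[]>> sortedEntries() {
        return map.entrySet();
    }
}
```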

A simple Map-like collection has a simpler interface, essentially just put(K,V). Currently {code}HadoopMultimap{code}
has a complex part of its interface related to reading serialized data (inner interfaces Adder, Key,
Value).
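
To illustrate what such a simpler interface could look like, here is a sketch under assumptions:
SimpleSortedOutput and TreeSortedOutput are invented names and are not part of Ignite or of
HadoopMultimap. The only write operation is put(K,V); values with the same key are grouped, and
groups are read back in key order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Hypothetical minimal interface: add a pair, then visit the groups in key order. */
interface SimpleSortedOutput<K, V> {
    void put(K key, V val);

    void forEachGroup(BiConsumer<K, List<V>> visitor);
}

/** Hypothetical on-heap implementation backed by a TreeMap of value lists. */
final class TreeSortedOutput<K extends Comparable<K>, V> implements SimpleSortedOutput<K, V> {
    private final TreeMap<K, List<V>> groups = new TreeMap<>();

    @Override
    public void put(K key, V val) {
        // Create the value list on first use, then append.
        groups.computeIfAbsent(key, k -> new ArrayList<>()).add(val);
    }

    @Override
    public void forEachGroup(BiConsumer<K, List<V>> visitor) {
        // TreeMap iterates in ascending key order.
        groups.forEach((k, vs) -> visitor.accept(k, Collections.unmodifiableList(vs)));
    }
}
```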
 


was (Author: iveselovskiy):
To use RawComparator to compare objects we need two byte[] arrays, but it is expensive to
fetch them from off-heap memory. So the question arises whether we should re-implement the sorting
collection as an on-heap solution to use RawComparator effectively.

> Spilled map-reduce: map side.
> -----------------------------
>
>                 Key: IGNITE-4097
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4097
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: hadoop
>    Affects Versions: 1.6
>            Reporter: Ivan Veselovsky
>            Assignee: Ivan Veselovsky
>             Fix For: 1.9
>
>
> Implement spilled output on Map side of map-reduce.
> In general, the algorithm should follow the one used in Hadoop. The differences on the Map side are that
> 1) we use a sorting collection (Hadoop sorts a range of map outputs explicitly);
> 2) we store the map output in plain local files rather than through FileSystem.
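
The two differences above can be sketched roughly as follows. This is only an illustration under
assumptions: SpillSketch, its methods and the record format (a count followed by key/value pairs)
are invented here and do not reflect Hadoop's or Ignite's actual spill format. Keys and values are
plain strings to keep the example short.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical spill helper: writes an already sorted collection to a local temp file. */
final class SpillSketch {
    /** Writes all entries in key order to a new local file and returns its path. */
    static Path spill(SortedMap<String, String> sorted) {
        try {
            Path file = Files.createTempFile("spill-", ".bin"); // local file, no FileSystem API
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                out.writeInt(sorted.size());
                // The collection is already sorted, so no explicit sort step is needed.
                for (Map.Entry<String, String> e : sorted.entrySet()) {
                    out.writeUTF(e.getKey());
                    out.writeUTF(e.getValue());
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back only the keys of a spill file, in the order they were written. */
    static List<String> readKeys(Path file) {
        List<String> keys = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                keys.add(in.readUTF());
                in.readUTF(); // skip the value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return keys;
    }
}
```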



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
