giraph-dev mailing list archives

From "Avery Ching" <avery.ch...@gmail.com>
Subject Re: Review Request 13909: GIRAPH-752: Better support for supernodes
Date Tue, 03 Sep 2013 17:07:39 GMT


> On Aug. 30, 2013, 11:33 p.m., Avery Ching wrote:
> > giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java, line 45
> > <https://reviews.apache.org/r/13909/diff/1/?file=346572#file346572line45>
> >
> >     Should this be bigger than 32 MB?  If we are hitting the 2 GB barrier, then
> >     we will have 64 buffers just to get to 2 GB.  Maybe 64 MB?  Would this help
> >     reduce the overhead?
> 
> Maja Kabiljo wrote:
>     I don't believe that having so few buffers relative to their size can add any
>     visible overhead. I think the overhead comes from having to do the checks all the
>     time. With one application that uses a lot of memory, I tried 256 MB chunks and it
>     crashed, while 32 MB ran fine.

Sounds good.


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13909/#review25811
-----------------------------------------------------------


On Sept. 2, 2013, 6:03 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13909/
> -----------------------------------------------------------
> 
> (Updated Sept. 2, 2013, 6:03 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Bugs: GIRAPH-752
>     https://issues.apache.org/jira/browse/GIRAPH-752
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> We've seen before that we crash when a vertex receives a lot of messages and we
> don't use a combiner. That is because the total size of the serialized messages for
> that vertex is bigger than the allowed size of an array.
> We should implement an OutputStream that can handle data of arbitrary size, and add
> an option to use that kind of stream for messages.
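
To illustrate the idea in the description, here is a minimal sketch of an output stream that spreads its data over fixed-size chunks, so that no single array has to hold everything. It is not the BigDataOutput class from the patch: the class name ChunkedOutputStream and its methods size(), chunkCount() and byteAt() are hypothetical, and the chunk size is a constructor parameter chosen for illustration. With the 32 MB chunks discussed in the review, about 64 chunks would be needed to hold 2 GB.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the class from the patch: an OutputStream that
 * stores data in a list of fixed-size chunks instead of one growing array,
 * so the total size is not limited by the maximum length of a Java array.
 */
class ChunkedOutputStream extends OutputStream {
  private final int chunkSize;
  private final List<byte[]> chunks = new ArrayList<>();
  private byte[] current;
  private int posInCurrent;

  ChunkedOutputStream(int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    this.chunkSize = chunkSize;
    startNewChunk();
  }

  /** Allocate a fresh chunk and make it the current write target. */
  private void startNewChunk() {
    current = new byte[chunkSize];
    chunks.add(current);
    posInCurrent = 0;
  }

  @Override
  public void write(int b) {
    // This boundary check runs on every write.
    if (posInCurrent == chunkSize) {
      startNewChunk();
    }
    current[posInCurrent++] = (byte) b;
  }

  @Override
  public void write(byte[] src, int off, int len) {
    if (off < 0 || len < 0 || len > src.length - off) {
      throw new IndexOutOfBoundsException();
    }
    // Copy in pieces, each piece fitting in what is left of the current chunk.
    while (len > 0) {
      if (posInCurrent == chunkSize) {
        startNewChunk();
      }
      int n = Math.min(len, chunkSize - posInCurrent);
      System.arraycopy(src, off, current, posInCurrent, n);
      posInCurrent += n;
      off += n;
      len -= n;
    }
  }

  /** Total number of bytes written; a long, since it may exceed 2 GB. */
  long size() {
    return (long) (chunks.size() - 1) * chunkSize + posInCurrent;
  }

  /** Number of chunks allocated so far. */
  int chunkCount() {
    return chunks.size();
  }

  /** Read back the byte at a logical position across all chunks. */
  byte byteAt(long index) {
    if (index < 0 || index >= size()) {
      throw new IndexOutOfBoundsException();
    }
    return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
  }
}
```

In this sketch the cost of the approach is the boundary check on each write and the index arithmetic on each read, rather than the number of chunks.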
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 6518da6
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/MessagesIterable.java a466a8d
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/PartitionDiskBackedMessageStore.java 7b3e548
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/out_of_core/SequentialFileMessageStore.java 64031c3
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/IntByteArrayMessageStore.java 597e7af
>   giraph-core/src/main/java/org/apache/giraph/comm/messages/primitives/LongByteArrayMessageStore.java 3fe6356
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 604729a
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 2506c21
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java cf2c187
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java 76ed789
>   giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java 56cc01c
>   giraph-core/src/main/java/org/apache/giraph/utils/Factory.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java e3992ed
>   giraph-core/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java b6151c5
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/BigDataOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/DataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/ExtendedDataInputOutput.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/utils/io/package-info.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/13909/diff/
> 
> 
> Testing
> -------
> 
> Ran a job which fails with the original code and when the new option is not used, and
> verified that it works properly when the option is used.
> Also compared the performance with and without the change: it is the same when the
> option is off, and when the option is turned on it seems to add about 5% overhead.
> mvn clean verify
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>

