Mailing-List: contact dev-help@giraph.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@giraph.apache.org
MIME-Version: 1.0
In-Reply-To: <CC3FF247.18D6%majakabiljo@fb.com>
References: <501A3317.1040503@gmail.com> <CC3FF247.18D6%majakabiljo@fb.com>
From: Gianmarco De Francisci Morales <gdfm@apache.org>
Date: Fri, 3 Aug 2012 13:03:36 +0200
Message-ID: 
 <CAGD7CYUkF0PA2N1Ni0Fn1UPPsHQbK-eHzv=sOM-uUWX1V8Z=Dg@mail.gmail.com>
Subject: Re: Review Request: Out-of-core messages
To: dev@giraph.apache.org
Content-Type: multipart/alternative; boundary=f46d0408913137832f04c65a7d63

--f46d0408913137832f04c65a7d63
Content-Type: text/plain; charset=ISO-8859-1

Hi,

> > Are you saying that out-of-core is faster than hitting memory boundaries
> > (i.e. GC)?  It is a bit tough to imagine that out-of-core beats in-core
> > =).
>
> That's the only explanation I could think of. Honestly, it sounds wrong to
> me too, but those are the results I keep getting. If someone has a better
> one, I'd love to hear it :-)


I am not surprised.
Streaming sequentially from disk can be faster than random access to main
memory [1].
Add the GC overhead on top of that, and you have a plausible explanation
for your results.

[1] The Pathologies of Big Data, http://queue.acm.org/detail.cfm?id=1563874
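
To make the access-pattern point concrete, here is a rough sketch (not
the benchmark from [1], and not Giraph code). It only compares sequential
and random access order over the same in-memory array, which is one half
of the argument; it does not touch the disk or the GC. All names and the
array size are hypothetical, and timings will vary by machine and runtime.

```python
import random
import time
from array import array


def make_data(n):
    """Return n 64-bit integers stored contiguously (values 0..n-1)."""
    return array("q", range(n))


def shuffled_order(n, seed=0):
    """Return the indices 0..n-1 in a reproducible pseudo-random order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def sum_in_order(data, order):
    """Sum data[i] for each i in order; the order decides the access pattern."""
    total = 0
    for i in order:
        total += data[i]
    return total


def timed(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical size: should be well above the CPU cache size.
    n = 2_000_000
    data = make_data(n)
    seq_sum, seq_t = timed(sum_in_order, data, range(n))
    rnd_sum, rnd_t = timed(sum_in_order, data, shuffled_order(n))
    # Same work in both runs; only the access order differs.
    assert seq_sum == rnd_sum
    print(f"sequential: {seq_t:.3f}s  random: {rnd_t:.3f}s")
```

Both runs do the same arithmetic, so any gap between the two timings comes
from the access order. In an interpreted language much of the run time is
interpreter overhead, so treat the numbers as indicative only.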

Cheers,
--
Gianmarco

--f46d0408913137832f04c65a7d63--