From: Gianmarco De Francisci Morales
To: dev@giraph.apache.org
Date: Fri, 3 Aug 2012 13:03:36 +0200
Subject: Re: Review Request: Out-of-core messages

Hi,

>> Are you saying that out-of-core is faster than hitting memory boundaries
>> (i.e. GC)? It is a bit tough to imagine that out-of-core beats in-core
>> =).
>
> That's the only explanation I could think of, honestly it sounds wrong to
> me too. But those are the results I keep getting. If someone has a better
> one I'd love to hear it :-)

I am not surprised. Streaming sequentially from disk is faster than reading randomly from memory [1]. Add the GC overhead on top of that, and you have an explanation for your results.

[1] A. Jacobs, "The Pathologies of Big Data", ACM Queue, 2009. http://queue.acm.org/detail.cfm?id=1563874

Cheers,
--
Gianmarco
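The access-pattern argument above can be illustrated with a small micro-benchmark. This is a hypothetical sketch, not code from Giraph or from the cited article: the class and method names are made up for illustration, and it only compares sequential versus random access within memory (not disk versus memory). It sums the same array twice, once in index order and once through a shuffled index permutation. Absolute timings depend heavily on hardware, array size, and JIT warm-up, so treat any numbers it prints as indicative only; a harness such as JMH is the usual choice for serious measurements.

```java
import java.util.Random;

/**
 * Hypothetical micro-benchmark: sums the same array in index order and in a
 * shuffled order. Timings are indicative only (no JIT warm-up, single run).
 */
public class AccessPatternDemo {

    /** Sums values in index order (sequential, prefetch-friendly). */
    static long sumSequential(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Sums the same values, visiting data[] in the order given by perm. */
    static long sumRandom(int[] data, int[] perm) {
        long sum = 0;
        for (int i = 0; i < perm.length; i++) {
            sum += data[perm[i]]; // perm is read sequentially, data randomly
        }
        return sum;
    }

    /** Builds a random permutation of 0..n-1 with a Fisher-Yates shuffle. */
    static int[] shuffledIndices(int n, long seed) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i;
        }
        Random rnd = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }
        return perm;
    }

    public static void main(String[] args) {
        int n = 1 << 23; // ~8M ints (32 MB), larger than many CPU caches
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i % 1000;
        }
        int[] perm = shuffledIndices(n, 42L);

        long t0 = System.nanoTime();
        long seq = sumSequential(data);
        long t1 = System.nanoTime();
        long rnd = sumRandom(data, perm);
        long t2 = System.nanoTime();

        // Both loops read exactly the same values, so the sums must match.
        System.out.printf("sequential: %d ms, random: %d ms, sums equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq == rnd);
    }
}
```

Both loops do the same arithmetic on the same values, so any timing gap comes from the memory access order. The random loop also streams through perm[], which adds some sequential traffic, but the scattered reads of data[] are what typically dominate once the array no longer fits in cache.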