From: Alok Kumbhare
To: user@giraph.apache.org
Date: Tue, 3 Sep 2013 12:30:36 -0700
Subject: Benchmarking Giraph

Hi,
We are looking into some Giraph benchmarks to compare against a similar programming model and framework we are working on.

As a start, we are planning to benchmark the following algorithms on data sets with more than a billion edges.

1. Single Source Shortest Path from a given source
2. PageRank
3. Connected Components

We have a small cluster of 16 nodes (8 cores and 16 GB each) to run the benchmarks. Given that, we have a few questions to help us get the best out of Giraph.

1. Which version of Giraph should we use, 1.0 or trunk, to take advantage of the optimizations (memory optimization/caching, multi-threading, etc.) mentioned here: https://www.facebook.com/notes/facebook-engineering/scaling-apache-giraph-to-a-trillion-edges/10151617006153920

2. Are the samples present in the Giraph distribution for the above algorithms a good place to start? How can we take advantage of different optimizations, including aggregators/combiners, for these algorithms?

3. Is there a document I can look at to understand the best practices for implementing optimized vertex-centric code using the latest features, along with deployment guidelines to maximize utilization?

Looking forward to your help.

Thanks,
Alok Kumbhare