From: Bruce Durling
Date: Wed, 5 Sep 2012 17:13:10 +0100
Subject: Re: access patterns investigation to dynamically toggle the replication factor in a hadoop cluster
To: user@hadoop.apache.org

I find this interesting. If this isn't the place to pursue it then I'd be
interested in subscribing to that mailing list. :-D

cheers,
Bruce

On Wed, Sep 5, 2012 at 5:11 PM, George Kousiouris wrote:
>
> Hi all,
>
> As part of the research for an ongoing project, we are interested in
> investigating the ability to predict data access patterns on a Hadoop
> cluster. The purpose is to study the file access patterns (in a time series
> manner), so that proactive manipulation of data may be achieved. This for
> example may involve the increase/decrease of the replication factor in an
> Apache Hadoop cluster (and accordingly HDFS) to deal with an upcoming
> predicted increase/decrease of data accesses.
>
> So we would like your advice on some issues:
> 1) is this the correct mailing list? :)
> 2) would a changed replication factor translate to a better performance of a
> MR job (either by experience you may have or if you have in mind a
> report/paper etc. that has studied this)
> 3) do you find this interesting in general and something we should pursue?
> 4) are you aware of any related work on the topic we could use as a starting
> point?
>
> Thanks for your help,
> George
>

--
@otfrom | CTO & co-founder @MastodonC | mastodonc.com