Date: Tue, 25 Jan 2011 21:50:03 -0600
From: Abhishek Verma
To: general@hadoop.apache.org
Subject: Hadoop job logs for research

Hi fellow Hadoop users and developers,

I am a third-year PhD student at the University of Illinois, working on improving workload management and scheduling in Hadoop. I have tested some of my ideas on synthetic workloads, GridMix, hadoop-examples and a few of my own applications.

I am looking for real workloads that are run in industry. Specifically, I am interested in the job logs (stored by default on the JobTracker) of real workloads. If people are concerned about the confidentiality of their applications, I would like to point out that these logs contain very little information about the processed data or the application itself. Anonymizing the job names (and their submission times, etc.) would not be too much of a problem.

I would love to collaborate with folks from industry in understanding these workloads. I sincerely hope that the research I am conducting will benefit everybody.

Thanks a lot.
-Abhishek.