From: Bejoy KS <bejoy.hadoop@gmail.com>
To: common-user@hadoop.apache.org
Date: Sun, 2 Oct 2011 00:49:33 +0530
Subject: Re: incremental loads into hadoop

Sam

Have a look at Flume if you need to load incremental data into HDFS. If the source data lives in a JDBC-compliant database, you can instead use Sqoop to pull it into HDFS or Hive incrementally.

For big-data aggregation and analytics, Hadoop is definitely a good choice: you can use MapReduce directly, or optimized tools built on top of MapReduce such as Hive and Pig, which serve that purpose very well.

So, in short, for your two steps:

1. Load into Hadoop/HDFS - use Flume or Sqoop depending on your source (rough sketches of both below).
2. Process within Hadoop/HDFS - use Hive or Pig. These tools are well optimised, so go for custom MapReduce only if you find they don't fit some complex processing.
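For the continuous-stream case, a Flume agent can tail the log your service writes and land the events in HDFS in time-bucketed directories. A rough sketch follows; the agent name, log path, and HDFS locations are all placeholders, and the exact configuration style varies between Flume releases (this uses the agent/properties format), so treat it as a starting point only:

    # write a minimal agent config; every name and path below is a placeholder
    cat > agent1.conf <<'EOF'
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # tail the log that the OLTP-facing service appends to
    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/app/events.log
    agent1.sources.src1.channels = ch1

    # buffer events in memory between source and sink
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000

    # roll files into HDFS, one directory per day
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.channel = ch1
    agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/data/incoming/%Y-%m-%d
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.rollInterval = 300
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
    EOF

    # start the agent
    flume-ng agent --conf conf --conf-file agent1.conf --name agent1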
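If the data sits in the OLTP database itself, Sqoop's incremental mode imports only the rows added since the previous run. A sketch, assuming a hypothetical transactions table with a monotonically increasing txn_id column (the connect string, credentials, and names are placeholders):

    sqoop import \
      --connect jdbc:mysql://dbhost/appdb \
      --username etl_user -P \
      --table transactions \
      --target-dir /data/transactions \
      --incremental append \
      --check-column txn_id \
      --last-value 1234567    # only rows with txn_id above this are imported

Sqoop reports the new last value at the end of each run, which you feed into the next one; if existing rows also get updated in place, use --incremental lastmodified with a timestamp column instead.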
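For the processing step, you can point an external Hive table at the directory the loads land in and run the aggregations as MapReduce jobs, completely off the OLTP store. A sketch with a made-up schema, matching the comma-delimited files a Sqoop import like the one above produces by default:

    # define the table and run a sample aggregation; the schema is hypothetical
    cat > daily_agg.hql <<'EOF'
    CREATE EXTERNAL TABLE IF NOT EXISTS transactions (
      txn_id     BIGINT,
      account_id STRING,
      amount     DOUBLE,
      txn_ts     STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/transactions';

    -- the kind of aggregation you would offload from the OLTP store
    SELECT account_id, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY account_id;
    EOF

    hive -f daily_agg.hql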
There may be other tools as well to get the source data into HDFS; let us leave that open for others to comment.

Hope it helps.

Thanks and Regards
Bejoy.K.S

On Sat, Oct 1, 2011 at 4:32 AM, Sam Seigal wrote:

> Hi,
>
> I am relatively new to Hadoop and was wondering how to do incremental
> loads into HDFS.
>
> I have a continuous stream of data flowing into a service which is
> writing to an OLTP store. Due to the high volume of data, we cannot do
> aggregations on the OLTP store, since this starts affecting the write
> performance.
>
> We would like to offload this processing into a Hadoop cluster, mainly
> for doing aggregations/analytics.
>
> The question is how can this continuous stream of data be
> incrementally loaded and processed into Hadoop?
>
> Thank you,
>
> Sam