Date: Sat, 3 Sep 2011 21:38:49 -0700
Subject: Re: Hadoop real time
From: Jacques
To: dev@hbase.apache.org, Aditya Kumar

It is hard to reply to an article that you don't actually reference, but I'll do my best. Also, you don't define real-time, so I'll take it to mean something that comes back within 1-2 seconds (e.g. an end user on a web site is waiting for the info).

>>Can you please tell me why Hadoop is said not to be used for Real time processing of data?

There are two different parts to the core Hadoop project. By themselves, both are focused more on batch processing than on real-time workflows.

1. HDFS is a distributed file system that is good at safely managing a large quantity of very large files. Generally speaking, HDFS is a write-once file system. You can't modify the middle of a file after it is written. You also can't append to the end of a file without a special version of Hadoop, and you can't tail a file directly as it is being written. As such, it would be hard to use it directly to create a real-time workflow.

2. MapReduce is a distributed computing framework. It is used to process those large files held on HDFS. Because of the design of MapReduce, jobs usually take at least 10 seconds and typically much longer. This also means you're looking at batch processing large quantities of data in some non-real-time period.

HBase is a separate sub-project from the Hadoop project proper. It is built specifically to handle real-time loads. You can insert a row and get it back immediately.

>I was thinking we can replace the DB with Hadoop...I do not see any issue?

HBase can replace many of the functions of existing databases, but it should be used primarily when you need the massive scale it can provide. Compared to a traditional RDBMS (MySQL, PostgreSQL, etc.), you have to give up things like transactions and SQL when you move to HBase. The schema design is very different, and generally your application must be built with this in mind.

You should probably spend some time with the HBase book (http://hbase.apache.org/book.html) and look at your current applications to determine what kinds of things you would need to do. Many people actually use HBase in parallel with a traditional RDBMS, leveraging the strengths of each.

Good luck!
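
To make the schema-design point a little more concrete, here is a toy in-memory sketch in Python. It is not the HBase client API: `ToyTable`, its `put`/`get`/`scan` methods, and the `user#date` row keys are all hypothetical names invented for illustration. It only mimics two ideas from the reply above: a row can be read back as soon as it is written, and, since there is no SQL, queries are typically served by choosing row keys so that related rows sort next to each other (HBase keeps rows ordered by row key).

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory stand-in for an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self):
        self._rows = {}  # row key -> {"family:qualifier": value}
        self._keys = []  # row keys kept in sorted order

    def put(self, row_key, column, value):
        """Write one cell; the row is visible to readers immediately."""
        if row_key not in self._rows:
            self._rows[row_key] = {}
            insort(self._keys, row_key)
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return a copy of the row's cells, or an empty dict if the row is absent."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, prefix):
        """Yield (row_key, cells) for every row key starting with prefix, in key order."""
        start = bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, dict(self._rows[key])


table = ToyTable()
# Assumed key design: "<user>#<date>", so one user's events are adjacent.
table.put("user42#2011-09-03", "event:type", "login")
table.put("user42#2011-09-04", "event:type", "purchase")
table.put("user7#2011-09-04", "event:type", "login")

print(table.get("user42#2011-09-04"))
print([key for key, _ in table.scan("user42#")])
```

The `scan("user42#")` call stands in for what would be a `WHERE user = ...` clause in an RDBMS. With this kind of store, the access pattern has to be decided when the row key is designed, which is why the application generally has to be built with the schema in mind.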