From: Apache Wiki
To: Apache Wiki
Date: Fri, 08 Jul 2011 19:40:29 -0000
Message-ID: <20110708194029.20393.2673@eos.apache.org>
Subject: [Hadoop Wiki] Update of "Hbase/PoweredBy" by GeorgeStathis
Reply-To: common-dev@hadoop.apache.org
Mailing-List: contact common-commits-help@hadoop.apache.org; run by ezmlm

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hbase/PoweredBy" page has been changed by GeorgeStathis:
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=71&rev2=72

Comment:
Added Traackr

  [[http://www.tokenizer.org|Shopping Engine at Tokenizer]] is a web crawler; it uses HBase to store URLs and Outlinks (!AnchorText + LinkedURL): more than a billion. It was initially designed as a Nutch-Hadoop extension, then (due to a very specific 'shopping' scenario) moved to SOLR + MySQL (InnoDB) (tens of thousands of queries per second), and now to HBase. HBase is significantly faster due to: no need for huge transaction logs, a column-oriented design that exactly matches 'lazy' business logic, data compression, and !MapReduce support. The number of mutable 'indexes' (a term from RDBMS) is significantly reduced because each 'row::column' structure is physically sorted by 'row'. The MySQL InnoDB engine is the best DB choice for highly concurrent updates. However, the need to flush a block of data to the hard drive even if only a few bytes changed is an obvious bottleneck. HBase greatly helps: the 'delete-insert', 'mutable primary key', and 'natural primary key' patterns, which are not so popular in modern DBMSs, become a big advantage with HBase.
  
+ [[http://traackr.com/|Traackr]] uses HBase to store and serve online influencer data in real-time. We use MapReduce to frequently re-score our entire data set as we keep updating influencer metrics on a daily basis.
+ 
  [[http://trendmicro.com/|Trend Micro]] uses HBase as a foundation for cloud scale storage for a variety of applications. We have been developing with HBase since version 0.1 and production since version 0.20.0.
  
  [[http://www.twitter.com|Twitter]] runs HBase across its entire Hadoop cluster.
  HBase provides a distributed, read/write backup of all MySQL tables in Twitter's production backend, allowing engineers to run MapReduce jobs over the data while maintaining the ability to apply periodic row updates (something that is more difficult to do with vanilla HDFS). A number of applications including people search rely on HBase internally for data generation. Additionally, the operations team uses HBase as a time-series database for cluster-wide monitoring/performance data.