From: Apache Wiki
To: Apache Wiki
Date: Mon, 02 Jul 2012 20:36:11 -0000
Subject: [Hadoop Wiki] Update of "Hbase/PoweredBy" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hbase/PoweredBy" page has been changed by stack: http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=3Ddiff&rev1=3D75&rev2= =3D76 = [[http://www.openplaces.org|Openplaces]] is a search engine for travel th= at uses HBase to store terabytes of web pages and travel-related entity rec= ords (countries, cities, hotels, etc.). We have dozens of MapReduce jobs th= at crunch data on a daily basis. We use a 20-node cluster for development,= a 40-node cluster for offline production processing and an EC2 cluster for= the live web site. = - [[http://www.powerset.com/|Powerset (a Microsoft company)]] uses HBase to= store raw documents. We have a ~110 node hadoop cluster running DFS, mapr= educe, and hbase. In our wikipedia hbase table, we have one row for each w= ikipedia page (~2.5M pages and climbing). We use this as input to our inde= xing jobs, which are run in hadoop mapreduce. Uploading the entire wikiped= ia dump to our cluster takes a couple hours. Scanning the table inside map= reduce is very fast -- the latency is in the noise compared to everything e= lse we do. + [[http://www.pnl.gov|Pacific Northwest National Laboratory]] - Hadoop and= HBase (Cloudera distribution) are being used within PNNL's Computational B= iology & Bioinformatics Group for a systems biology data warehouse project = that integrates high throughput proteomics and transcriptomics data sets co= ming from instruments in the Environmental Molecular Sciences Laboratory, = a US Department of Energy national user facility located at PNNL. The data = sets are being merged and annotated with other public genomics information = in the data warehouse environment, with Hadoop analysis programs operating = on the annotated data in the HBase tables. This work is hosted by olympus, = a large PNNL institutional computing cluster (http://www.pnl.gov/news/relea= se.aspx?id=3D908) , with the HBase tables being stored in olympus's Lustre = file system. 
  [[http://www.readpath.com/|ReadPath]] uses HBase to store several hundred million RSS items and dictionary data for its RSS newsreader. ReadPath is currently running on an 8-node cluster.