From: Ted Yu
To: "user@hbase.apache.org"
Subject: Re: HBase Schema for IPTC News ML G2
Date: Mon, 3 Mar 2014 02:18:51 -0800

When version is in its own column family, you can utilize essential column family support.
See https://issues.apache.org/jira/browse/HBASE-5416

Cheers

On Mar 2, 2014, at 11:31 PM, Jigar Shah wrote:

> I am working in the news processing industry. Our current system processes more
> than a million articles per week, provides this data to users in real time, and
> additionally provides search capabilities via Lucene.
>
> We convert all news to the standard IPTC NewsML G2 format before providing it
> to users (in real time or via search).
>
> We have a requirement for a component that answers analytical queries on
> news data. I plan to load all of this data into HBase and then run MapReduce
> jobs to compute the analytical queries. Moreover, the current system is built
> on PostgreSQL and stores only 3 months of data; anything more than this is big
> data, as it doesn't fit on one server.
>
> But I am a bit confused about designing a schema for it.
>
> Every news article has:
>
> *"messageID" as guid*, a unique id for the news message.
> *"version" as int*, incremented if a newer version of the same news message is published.
> There are other fields like location, channels, title, content, source, etc.
>
> The current database primary key is a composite of (messageID & version).
>
> I thought that I should use "messageID" as the "rowKey" in HBase, "version"
> as the "columnFamily", and all columns will be fields of the news (like location,
> channels, title, body, sentTimstamp, ...).
>
> Is keeping "version" as the "columnFamily" a good idea?
>
> In reality, a "single message may have thousands of versions".
>
> Or is there any other solution when we have a composite primary key in the database?
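Column families are fixed in the table schema, so one family per version does not scale to "thousands of versions". A common alternative for a composite primary key, sketched here as an assumption rather than as advice from this thread, is to fold both parts into the row key: messageID, a 0x00 separator, then (Integer.MAX_VALUE - version) as a 4-byte big-endian int. HBase orders row keys by unsigned lexicographic byte comparison, so this layout keeps all versions of one message adjacent and puts the newest version first. The sketch assumes GUID strings that never contain NUL, non-negative int versions, and a hypothetical helper class `NewsRowKey`. It uses only `java.nio` instead of HBase's `Bytes` utility so that it stays self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical row-key helper (not an HBase API):
// row key = UTF-8 messageID + 0x00 + (Integer.MAX_VALUE - version), big-endian.
final class NewsRowKey {
    private static final byte SEPARATOR = 0x00; // assumes messageIDs contain no NUL

    // Prefix shared by every version of one message: messageID bytes + separator.
    static byte[] versionPrefix(String messageId) {
        byte[] id = messageId.getBytes(StandardCharsets.UTF_8);
        byte[] prefix = Arrays.copyOf(id, id.length + 1);
        prefix[id.length] = SEPARATOR;
        return prefix;
    }

    // Full row key. Inverting the version makes newer versions sort first
    // under unsigned lexicographic byte order, the order HBase uses for rows.
    static byte[] rowKey(String messageId, int version) {
        if (version < 0) {
            throw new IllegalArgumentException("version must be non-negative");
        }
        byte[] prefix = versionPrefix(messageId);
        return ByteBuffer.allocate(prefix.length + Integer.BYTES)
                .put(prefix)
                .putInt(Integer.MAX_VALUE - version) // ByteBuffer is big-endian by default
                .array();
    }

    // Recovers the original version number from the last four bytes of a key.
    static int versionOf(byte[] rowKey) {
        int inverted = ByteBuffer.wrap(rowKey, rowKey.length - Integer.BYTES, Integer.BYTES).getInt();
        return Integer.MAX_VALUE - inverted;
    }

    // Unsigned lexicographic comparison, matching HBase's row-key ordering.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // True if the key starts with the given message's prefix.
    static boolean belongsTo(byte[] rowKey, String messageId) {
        byte[] prefix = versionPrefix(messageId);
        return rowKey.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(rowKey, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] v1 = rowKey("msg-42", 1);
        byte[] v7 = rowKey("msg-42", 7);
        byte[] other = rowKey("msg-43", 1);
        System.out.println(compare(v7, v1) < 0);     // true: newer version sorts first
        System.out.println(compare(v1, other) < 0);  // true: versions of msg-42 stay together
        System.out.println(versionOf(v7));           // 7
        System.out.println(belongsTo(v7, "msg-42")); // true
    }
}
```

With this layout, a scan that starts at `versionPrefix(messageId)` and stops once `belongsTo` fails would return every version of one message, newest first, so the latest version is simply the first row. The article fields (location, channels, title, body, ...) then become columns in a single column family instead of one family per version.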