From: Claudio Martella
Date: Fri, 10 Dec 2010 12:52:10 +0100
To: user@hbase.apache.org
Subject: Re: Determine in which row a column exists
Message-ID: <4D02146A.3070601@tis.bz.it>

What about a thin table? rowkey: productid, column name: clusterid?

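To make the thin-table idea concrete, here is a minimal in-memory sketch in plain Java. It is not HBase client code: the class name ThinTableSketch and its methods are hypothetical, and the two maps only stand in for the existing 'clusters' table and an assumed second table keyed by product id. It contrasts visiting every cluster row (what the column scan has to do) with a single keyed lookup.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory stand-in for the two table layouts; no HBase calls are made.
class ThinTableSketch {
    // Existing layout: row key = cluster id, column qualifiers = product ids.
    private final Map<String, NavigableSet<String>> clusters = new TreeMap<>();
    // Assumed thin table: row key = product id, column qualifiers = cluster ids.
    private final Map<String, NavigableSet<String>> productToClusters = new TreeMap<>();

    // Record one membership in both layouts so they stay consistent.
    void addMembership(String clusterId, String productId) {
        clusters.computeIfAbsent(clusterId, k -> new TreeSet<>()).add(productId);
        productToClusters.computeIfAbsent(productId, k -> new TreeSet<>()).add(clusterId);
    }

    // Scan-style lookup: every cluster row is visited and checked for the column.
    NavigableSet<String> clustersOfByScan(String productId) {
        NavigableSet<String> hits = new TreeSet<>();
        for (Map.Entry<String, NavigableSet<String>> row : clusters.entrySet()) {
            if (row.getValue().contains(productId)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Thin-table lookup: one read by row key, no pass over the cluster rows.
    NavigableSet<String> clustersOfByGet(String productId) {
        NavigableSet<String> row = productToClusters.get(productId);
        return row == null ? new TreeSet<String>() : new TreeSet<String>(row);
    }
}
```

In a real deployment the second map would be a separate table written alongside the first, so the cost moves from read time to write time; whether that trade is worthwhile depends on how often memberships change.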
On 12/10/10 10:52 AM, Gökhan Çapan wrote:
> Hi,
>
> We have the output of a clustering algorithm in an hbase table which has the
> following structure:
>
> {NAME => 'clusters', FAMILIES => [{NAME => 'products', COMPRESSION => 'NONE',
> VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> row ids are cluster ids.
> Columns in products column family are the id of the products.
>
> an example row is:
> 1-1000936175-1879240683-185 column=products:21840054, timestamp=1291817353183, value=\x00\x00\x00\x01
> 1-1000936175-1879240683-185 column=products:23194179, timestamp=1291817353183, value=\x00\x00\x00\x01
> 1-1000936175-1879240683-185 column=products:23585765, timestamp=1291817353183, value=\x00\x00\x00\x01
> 1-1000936175-1879240683-185 column=products:24544087, timestamp=1291817353183, value=\x00\x00\x00\x01
>
> When we want to determine which clusters a product belongs to, we perform a
> scan over the table using column,
>
> e.g.
>
> Scan s = new Scan();
> s.addColumn(Bytes.toBytes("products"), Bytes.toBytes("24659517"));
> ResultScanner scanner = table.getScanner(s);
>
> I am not sure this is the best way, it is slow, could you suggest a faster
> way to determine such rows?
> Is there a secondary index implementation that we can add to a column family
> after adding data to table?
>

-- 
Claudio Martella
Digital Technologies Unit
Research & Development - Analyst

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax +39 0471 068 129
claudio.martella@tis.bz.it http://www.tis.bz.it