Subject: Re: incremental counters and a global String->Long Dictionary
From: Stack
To: user@hbase.apache.org
Date: Mon, 29 Nov 2010 10:35:12 -0800

You might try
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)

St.Ack

On Mon, Nov 29, 2010 at 10:03 AM, Claudio Martella wrote:
> Hi Lars,
>
> thanks for your answer. Yes, I read Percolator's paper, but I'd like to
> get my problem solved with an existing software solution, and I like HBase.
> The ephemeral node is, I think, the last solution I proposed, the one I
> called ZKsafe_insert(). Or?
>
> On 11/29/10 6:35 PM, Lars George wrote:
>> Hi Claudio,
>>
>> Did you have a look at Google's Percolator paper? I think a mechanism
>> like this may work. Another option often used to implement distributed
>> transactions is ZooKeeper: you could create an ephemeral node on the
>> new word, and the host that succeeds in doing so adds the word and then
>> releases the lock. Or some such.
>>
>> Lars
>>
>> On Nov 29, 2010, at 16:12, Claudio Martella wrote:
>>
>>> Hello list,
>>>
>>> I'm kind of new to HBase, so I'll post this email with a request for
>>> comment.
>>> Very briefly, I do a lot of text processing with MapReduce, so it's very
>>> useful for me to convert strings to longs, so I can make my computations
>>> faster.
>>>
>>> My corpus keeps on growing and I want this String->Long mapping to be
>>> persistent and dynamic (I want to add new mappings when I find new words).
>>> At the moment I'm tackling the problem this way (pseudo-code):
>>>
>>> longvalue = convert(word) # gets from hbase
>>> if longvalue == -1:
>>>     longvalue = insert(word) # puts in hbase
>>>
>>> longvalue now contains the new mapped value. This approach requires a
>>> global counter that saves the latest mapped long and increments at every
>>> insert. I can easily do this two ways: a special row in hbase, "_counter",
>>> that I increment through incrementColumnValue, or a sequential
>>> non-ephemeral znode in ZooKeeper whose version I use as my counter. The
>>> first one is of course faster. So the solution would be:
>>>
>>> insert(word):
>>>     longvalue = hbase.incrementColumnValue("_counter", "v")
>>>     hbase.put(word, longvalue)
>>>     return longvalue
>>>
>>> The problem is that between the time I realize there's no mapping for my
>>> word and the time I insert the new longvalue, somebody else might have
>>> done the same for me, so I have a corrupted dictionary.
>>>
>>> One possible solution would be to acquire a lock on the "_counter" row,
>>> recheck for the presence of the mapping and then insert my new value:
>>>
>>> safe_insert(word):
>>>     lock("_counter")
>>>     longvalue = convert(word)
>>>     if longvalue == -1: # nobody inserted the mapping in the meantime
>>>         longvalue = insert(word)
>>>     unlock("_counter")
>>>     return longvalue
>>>
>>> This way the counter row, with its lock, would behave as a global lock.
>>> This would solve my problems but would create a bottleneck (although
>>> with time my inserts tend to get very rare as the dictionary grows). A
>>> solution to this problem would be to have locks in ZooKeeper based on words.
>>>
>>> ZKsafe_insert(word):
>>>     ZKlock("/words/" + word)
>>>     longvalue = convert(word)
>>>     if longvalue == -1: # nobody inserted the mapping in the meantime
>>>         longvalue = insert(word)
>>>     ZKunlock("/words/" + word)
>>>     return longvalue
>>>
>>> This of course would allow me to have more fine-grained locks and better
>>> scalability, but I'd rely on a system with higher latency (ZK).
>>>
>>> Does anybody have a better solution with hbase? I guess using
>>> hbase_transational would also be a possibility, but again, what about
>>> speed and the actual issues with the package (like recovering in the
>>> face of hregion failure)?
>>>
>>>
>>> Thank you,
>>>
>>> Claudio
>>>
>>> --
>>> Claudio Martella
>>> Digital Technologies
>>> Unit Research & Development - Analyst
>>>
>>> TIS innovation park
>>> Via Siemens 19 | Siemensstr. 19
>>> 39100 Bolzano | 39100 Bozen
>>> Tel. +39 0471 068 123
>>> Fax  +39 0471 068 129
>>> claudio.martella@tis.bz.it http://www.tis.bz.it
>>>
>>> Short information regarding use of personal data. According to
>>> Section 13 of Italian Legislative Decree no.
>>> 196 of 30 June 2003, we inform you that we process your personal data
>>> in order to fulfil contractual and fiscal obligations and also to send
>>> you information regarding our services and events. Your personal data
>>> are processed with and without electronic means and by respecting data
>>> subjects' rights, fundamental freedoms and dignity, particularly with
>>> regard to confidentiality, personal identity and the right to personal
>>> data protection. At any time and without formalities you can write an
>>> e-mail to privacy@tis.bz.it in order to object the processing of your
>>> personal data for the purpose of sending advertising materials and also
>>> to exercise the right to access personal data and other rights referred
>>> to in Section 7 of Decree 196/2003. The data controller is TIS Techno
>>> Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the
>>> complete information on the web site www.tis.bz.it.
>>>
>>>
>
>
> --
> Claudio Martella
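
A minimal sketch of how the checkAndPut suggestion at the top of this thread might close the race described in the original question. It is not HBase client code: InMemoryTable is a hypothetical in-memory stand-in, and its get, increment and check_and_put methods are assumptions that only imitate HBase's get, incrementColumnValue and checkAndPut. In particular, it assumes that passing no expected value means "write only if the cell does not exist yet", which is how the HTable javadoc describes a null value; please verify this against your HBase version. The function name safe_insert is borrowed from the pseudo-code above.

```python
import threading


class InMemoryTable:
    """Hypothetical stand-in for an HBase table holding one value per row."""

    def __init__(self):
        self._rows = {}
        # Models the server applying each single-row operation atomically.
        self._lock = threading.Lock()

    def get(self, row):
        """Return the stored value, or None if the row does not exist."""
        with self._lock:
            return self._rows.get(row)

    def increment(self, row, amount=1):
        """Atomic counter, playing the role of incrementColumnValue."""
        with self._lock:
            self._rows[row] = self._rows.get(row, 0) + amount
            return self._rows[row]

    def check_and_put(self, row, expected, value):
        """Write value only if the row currently holds expected.

        expected=None means "the row must not exist yet" (assumed to
        mirror checkAndPut with a null value).
        """
        with self._lock:
            if self._rows.get(row) != expected:
                return False
            self._rows[row] = value
            return True


def safe_insert(table, word):
    """Return the long mapped to word, creating the mapping if needed."""
    existing = table.get(word)
    if existing is not None:
        return existing
    # Reserve a fresh id, then try to claim the word with it.
    candidate = table.increment("_counter")
    if table.check_and_put(word, None, candidate):
        return candidate
    # Another client claimed the word first: use its value. The reserved
    # candidate is never reused, so the id sequence may contain gaps.
    return table.get(word)


if __name__ == "__main__":
    table = InMemoryTable()
    first = safe_insert(table, "hbase")
    second = safe_insert(table, "hbase")
    print(first == second)  # the second call returns the existing mapping
```

With this shape no lock is held on the "_counter" row or in ZooKeeper; the cost is that a client losing the race wastes one counter value, so the ids are unique but not necessarily dense.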