Subject: Re: Value handling ideas
To: dev@directory.apache.org
From: Radovan Semancik
Date: Wed, 14 Oct 2015 21:26:53 +0200
Message-ID: <561EAC7D.9070404@evolveum.com>
In-Reply-To: <561C010F.4090401@gmail.com>
References: <561C010F.4090401@gmail.com>

Hi,

Sounds good. Just note that the "user provided" value is in fact the one
that is decoded from the protocol when a value is received in an LDAP
message. In that case the "user provided" value is always binary, even
for string values. If you remember our conversation some time ago, the
re-coding of string values by the current implementation caused quite a
nasty problem when the receiver cannot reliably distinguish string and
binary values. Therefore I really like the idea of storing the original
value and recoding on demand. If that is done well, the originally
received binary values can be retrieved even from the StringValue class,
and that will improve the usability of the API.
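
A minimal sketch of that idea, to make the point concrete. The class name
`LazyValue` and its methods are hypothetical and not part of the current
API; the sketch only assumes that the value arrives as a byte[] and that
strings are UTF-8. It keeps the received bytes untouched, decodes to a
String only on first use, and shows why decoding and re-encoding is not
safe for values that turn out to be binary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical value holder: keeps the bytes as received and decodes lazily. */
final class LazyValue {
    private final byte[] original; // bytes exactly as decoded from the LDAP message
    private String decoded;        // cached UTF-8 decoding, computed on first use

    LazyValue(byte[] received) {
        this.original = received.clone();
    }

    /** Always returns the bytes as received, never a re-encoded copy. */
    byte[] getBytes() {
        return original.clone();
    }

    /** Decodes on demand; the JDK decoder replaces malformed input with U+FFFD. */
    String getString() {
        if (decoded == null) {
            decoded = new String(original, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    /** True when decoding and re-encoding would reproduce the received bytes. */
    boolean survivesRecoding() {
        return Arrays.equals(original, getString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        LazyValue text = new LazyValue("L\u00e9o".getBytes(StandardCharsets.UTF_8));
        LazyValue blob = new LazyValue(new byte[] { (byte) 0xFF, (byte) 0xFE, 0x00 });
        System.out.println("text survives recoding: " + text.survivesRecoding());
        System.out.println("blob survives recoding: " + blob.survivesRecoding());
    }
}
```

With this shape, a caller that guessed "string" for a value that is really
binary can still get the untouched bytes back through getBytes().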
-- 
Radovan Semancik
Software Architect
evolveum.com

On 10/12/2015 08:50 PM, Emmanuel Lécharny wrote:
> Thoughts about value handling in the API and Server
> ---------------------------------------------------
>
> We currently manage a quite complex hierarchy of classes to handle
> attribute values:
>
> (Value)
>    o
>    |
>    +--[[AbstractValue]]
>          ^
>          |
>          +-- [StringValue | T : String]
>          |
>          +-- [BinaryValue | T : byte[]]
>
> Every Value holds a wrappedValue (aka the User Provided value, or
> UPValue) and a normalizedValue. This second aspect is absolutely
> mandatory, because we always return the UPValue back to the user, and
> we always compare values using the normalized value (well, we can
> discuss that too).
>
> DNs and Filters use a String representation of values that is a bit
> specific. Typically, some chars get escaped in both cases (but not in
> the same way).
>
> That is quite complex...
>
> We can probably handle those values in a different way. First of all,
> binary values aren't modified by the normalization process, so we could
> most certainly save some space by not keeping both a UPValue and a
> NormValue for such values. Second, everything in LDAP uses UTF-8, and
> we can easily convert UTF-8 to Java's internal Unicode representation
> (UTF-16 chars). So we have a trivial UTF-8 <--> Unicode conversion
> that could be used if needed.
>
> Last but not least, every value is written either as a byte[] (binary
> values) or as a UTF-8 String, which is also a byte[]. Knowing that we
> will send the values back to the client, converting them from String to
> UTF-8, we can assume that in most cases we are doing two conversions
> (from UTF-8 byte[] to String, and then from String back to UTF-8
> byte[]), mostly wasting a lot of CPU...
>
> Another idea would be to simply keep the byte[] hidden inside the value
> and convert it to a String only when needed.
> We need to convert the values when we normalize them (which happens
> when we want to compare a value to another one) or run a compare. We
> also need to run every value through the PrepareString methods (and
> PrepSASL for the userPassword) before saving them to disk.
>
>
> At this point, I can foresee some huge simplification in both the API
> and the server by switching to a simpler data structure, and a
> potential speedup (avoiding useless conversions).
>
> I'd like you to review what I just wrote and tell me if I'm off base,
> or if you feel, like me, that we can get a better server by changing
> those data structures.
>
> Thanks!
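
The rule in the quoted proposal (always return the user-provided value,
compare on a normalized form that is computed only when a comparison
happens) could be sketched roughly as below. Everything here is an
assumption for illustration: `ComparableValue` is a hypothetical class,
and the normalization (trim, collapse whitespace, lower-case) is only a
simple stand-in for the real PrepareString processing, not an
implementation of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical value: stores the user-provided bytes, normalizes only to compare. */
final class ComparableValue {
    private final byte[] upBytes; // user-provided value, returned to clients unchanged
    private String normalized;    // computed on the first comparison, then cached

    ComparableValue(byte[] upBytes) {
        this.upBytes = upBytes.clone();
    }

    /** What goes back to the client: the bytes as provided, with no conversion. */
    byte[] getUpBytes() {
        return upBytes.clone();
    }

    /** Stand-in normalization (NOT the real PrepareString): trim, collapse, lower-case. */
    private String normalized() {
        if (normalized == null) {
            String s = new String(upBytes, StandardCharsets.UTF_8);
            normalized = s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        }
        return normalized;
    }

    /** Comparison always goes through the normalized form. */
    boolean matches(ComparableValue other) {
        return normalized().equals(other.normalized());
    }

    public static void main(String[] args) {
        ComparableValue a = new ComparableValue("  John   SMITH ".getBytes(StandardCharsets.UTF_8));
        ComparableValue b = new ComparableValue("john smith".getBytes(StandardCharsets.UTF_8));
        System.out.println("a matches b: " + a.matches(b));
    }
}
```

A value that is only stored and sent back never pays for the byte[] to
String conversion in this sketch; only compared values do.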