directory-api mailing list archives

From Radovan Semancik <radovan.seman...@evolveum.com>
Subject Re: Binary values and humanRedable flag
Date Mon, 10 Aug 2015 15:25:50 GMT
On 08/10/2015 03:10 PM, Emmanuel Lécharny wrote:
> Le 10/08/15 13:33, Radovan Semancik a écrit :
>> On 08/10/2015 12:42 PM, Emmanuel Lécharny wrote:
>>> There is no flag that says an Attribute is H-R or not. The
>>> information is provided in RFC 2252, section 4.3.2
>>> <https://tools.ietf.org/html/rfc2252#section-4.3.2>
>> Hmm, I saw code for parsing "X-NOT-HUMAN-READABLE" so I thought
>> that it might be caused by this. Thanks for clarification. Anyway, the
>> strange thing is that the syntax 1.3.6.1.4.1.1466.115.121.1.28 appears
>> to be human readable.
> Which it is not:
>
> version: 1
> dn: m-oid=1.3.6.1.4.1.1466.115.121.1.28,ou=syntaxes,cn=system,ou=schema
> objectclass: top
> objectclass: metaTop
> objectclass: metaSyntax
> m-oid: 1.3.6.1.4.1.1466.115.121.1.28
> m-description: JPEG
> m-obsolete: FALSE
> x-not-human-readable: TRUE
> entrycsn: 20100111202214.878000Z#000000#000#000000
> creatorsname: uid=admin,ou=system
> createtimestamp: 20100111145217Z

Depends on the server. OpenLDAP defines the syntax like this:

ldapSyntaxes: ( 1.3.6.1.4.1.1466.115.121.1.28 DESC 'JPEG'
    X-NOT-HUMAN-READABLE 'TRUE' )

But OpenDJ like this:

ldapSyntaxes: ( 1.3.6.1.4.1.1466.115.121.1.28 DESC 'JPEG' )

This is probably the difference. (And thanks for pointing that out. I 
completely forgot that syntax declaration is also part of the schema.)

I believe that the API works with ApacheDS :-) ... but my goal is to 
make it work with other LDAP servers as well. And the detection of H/R 
is clearly wrong with OpenDJ. So I'm trying to figure out what's going 
on. Now it looks like the OpenDJ declaration of the syntax is 
correct. I would expect that if no X-NOT-HUMAN-READABLE clause is 
present, then the H/R flag is set according to the RFC. But it is 
not. The API seems to assume "true" as the default for the H/R flag. Is 
this a bug in the API?
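
The expected fallback can be sketched roughly as follows. This is only an 
illustration, not the API's actual code: the class HumanReadableDefault, 
the method isHumanReadable and the NOT_HR_BY_DEFAULT table are 
hypothetical names, and the table holds only the JPEG syntax discussed 
here. A real table would carry every syntax that RFC 2252 section 4.3.2 
marks as not human readable.

```java
import java.util.Set;

/** Hypothetical helper; not part of the Apache Directory LDAP API. */
final class HumanReadableDefault {

    // Illustrative subset only: the JPEG syntax from this thread.
    // A complete table would be filled in from RFC 2252 section 4.3.2.
    static final Set<String> NOT_HR_BY_DEFAULT =
            Set.of("1.3.6.1.4.1.1466.115.121.1.28");

    /**
     * @param syntaxOid         OID of the LDAP syntax
     * @param xNotHumanReadable value of the X-NOT-HUMAN-READABLE clause,
     *                          or null when the server did not send one
     */
    static boolean isHumanReadable(String syntaxOid, String xNotHumanReadable) {
        if (xNotHumanReadable != null) {
            // An explicit clause from the server always wins.
            return !"TRUE".equalsIgnoreCase(xNotHumanReadable.trim());
        }
        // No clause: fall back to the per-syntax default instead of
        // assuming "human readable" for everything.
        return !NOT_HR_BY_DEFAULT.contains(syntaxOid);
    }
}
```

With this rule the OpenDJ declaration (no clause) and the OpenLDAP one 
(explicit 'TRUE') would both yield "not human readable" for the JPEG 
syntax.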

One more datapoint. This is the same test program run on eDirectory. 
Same problem:

jpegPhoto AttributeType = attributetype ( 0.9.2342.19200300.100.1.60 
NAME 'jpegPhoto'
     SYNTAX 1.3.6.1.4.1.1466.115.121.1.40
     USAGE userApplications )
jpegPhoto syntax = ldapsyntax ( 1.3.6.1.4.1.1466.115.121.1.40
     X-NOT-HUMAN-READABLE 'false' )
jpegPhoto syntax H/R = true

eDirectory syntax definition:

ldapSyntaxes: ( 1.3.6.1.4.1.1466.115.121.1.28 X-NDS_SYNTAX '9' )

> X-NOT-HUMAN-READABLE 'false', which means it's human readable. But I
> guess OpenDJ does *not* set the X-NOT-HUMAN-READABLE flag, while
> openLDAP does.

Yes, that really seems to be the case.

> I expect the server or the client to *know* magically that this
> attribute is H/R when connected to OpenDJ, right ? (irony)

No magic needed here (although some magic might come in very handy with 
some LDAP servers :-) ) ... I just expect that when no 
X-NOT-HUMAN-READABLE is present then the default from the RFC is used. 
Isn't that a reasonable expectation?

> Yes, that's true. The rationale is that we do a best effort to inject
> values correctly, converting them on the fly.
>
> Note that this H-R flag itself is stupid. It was added 12 years ago as a
> way to follow the RFC, but as a matter of fact, the Syntax itself
> already drives the type of data we can store in an Attribute. I made it
> even more complex by trying to use Generics. Now, we have those
> StringValue and BinaryValue all over the code.
>
> Ideally, we should not have to care about what we store, and always
> consider the stored values as byte[]. OTOH, it's not convenient when we
> want to manipulate values as String, as converting them over and over
> from byte[] to Strings is costly (especially in the server). But I do
> think we went way too far here. This conversion should be done internally
> once, and that's it. It would save us a hell of a lot of time, and would
> make the API more comfortable to use.
>> I tend to agree. Always storing the value as binary seems to be good
>> idea.
> Depends. From the performance POV, this is killing the server. Most of
> the AT are H/R, and require some checks (comparison, normalization, etc)
> during the processing of every request. Having only the binary value is
> forcing the server to do the conversion back and forth multiple times.
> We faced this issue and when we switched to StringValue and BinaryValue,
> the performance boost was huge (100%).
>
> Ideally, we should have 2 methods :
> - getBinaryValue()
> - getStringValue()
>
> because we always know which type we are dealing with. But that's the
> point: in the server, for operations involving many attributes, that
> would require a check on the Syntax every time we want to manipulate a
> value, which is a bit of a PITA, especially when we don't care about
> this type. Having a Value<?> wrapper helps a lot here...

I understand. And storing converted string values is not really a 
problem, as long as the binary value is the primary one. The current 
StringValue implementation has it the other way around, and this causes 
problems. E.g. I have a binary value of 2e254d883270c44cd7ae2e254d883270. 
The '88' and 'C4' bytes are not valid UTF-8 here, so if the value is 
converted to a string, it will contain those strange inverted question 
mark characters. And when converted back to binary it becomes 
2e254defbfbd3270efbfbd4cd7ae2e254defbfbd3270 ... so both the '88' and 
'C4' are translated to 'efbfbd' and the data are ruined.
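
The corruption can be reproduced outside the API with plain JDK calls. The 
following is a minimal sketch: the class Utf8RoundTrip and its helper 
methods are hypothetical names used only for this illustration, and the 
hex value is the one quoted above. It decodes the bytes as UTF-8 and 
encodes the resulting string again, which is presumably what happens when 
a binary value passes through a string-first holder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class; not part of the Apache Directory LDAP API. */
final class Utf8RoundTrip {

    // Parses a lowercase or uppercase hex string into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Formats bytes as a lowercase hex string.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // bytes -> String -> bytes. Malformed UTF-8 input is replaced with
    // U+FFFD while decoding, so the original bytes cannot be recovered.
    static byte[] roundTrip(byte[] original) {
        String asString = new String(original, StandardCharsets.UTF_8);
        return asString.getBytes(StandardCharsets.UTF_8);
    }
}
```

Calling Utf8RoundTrip.toHex(Utf8RoundTrip.roundTrip(Utf8RoundTrip.fromHex(
"2e254d883270c44cd7ae2e254d883270"))) gives the 22-byte result quoted 
above instead of the original 16 bytes.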

If the StringValue was implemented the other way around then it may be 
less harmful. I.e. storing the binary value as the primary one and 
converting that to a string. Storing that string in the StringValue 
object is OK (as long as it is properly invalidated when the bytes 
change, but that should not be a problem). As far as I understand, the 
StringValue is storing both values even now. So this is only a matter of 
changing the implementation and always storing the binary value as 
primary - both in BinaryValue and StringValue.
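
A bytes-first holder along these lines could look like the sketch below. 
It is not the API's StringValue: BytesFirstValue and its methods are 
hypothetical names, and UTF-8 is assumed as the string encoding. The byte 
array is the only authoritative state, and the string is a cache that is 
dropped whenever the bytes change.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical value holder; not the API's StringValue or BinaryValue. */
final class BytesFirstValue {

    private byte[] bytes;        // primary representation, never derived
    private String cachedString; // derived lazily from bytes, may be null

    BytesFirstValue(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // Always returns exactly what was stored, valid UTF-8 or not.
    byte[] getBytes() {
        return bytes.clone();
    }

    // Decodes once and caches; invalid sequences show up as U+FFFD in the
    // string, but the stored bytes are left untouched.
    String getString() {
        if (cachedString == null) {
            cachedString = new String(bytes, StandardCharsets.UTF_8);
        }
        return cachedString;
    }

    // Replacing the bytes invalidates the cached string.
    void setBytes(byte[] newBytes) {
        this.bytes = newBytes.clone();
        this.cachedString = null;
    }
}
```

Even if such a value is wrongly treated as a string, getBytes() still 
returns the original data, so a wrong H/R guess costs a useless 
conversion rather than the data itself.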

> I'm really willing to find a better solution, I have worked a full
> quarter on this issue (bin/string values) and I haven't been able to come
> up with something that hides the inconsistency and complexity of LDAP in
> this area, sadly... Maybe it's time for a rehearsal...

I think that the code is not bad enough to require a complete redesign. 
The interfaces seem to be OK as far as I can tell now. So maybe only 
some internal refactoring is needed. That can be done in an evolutionary 
fashion. What about just starting with storing the binary value as the 
primary one? Then even if the attribute type is detected incorrectly, 
no data is really lost and the client can still 
safely use value.getBytes() regardless of whether it is BinaryValue or 
StringValue.

BTW, now I have been able to work around the jpegPhoto problem by not 
setting the attributeType into the Modification. And I have been able to 
work around the wrong detection of GUID and the ruined data by using a custom 
BinaryAttributeDetector. So I'm OK now. But anyway, I believe that the 
root causes of these issues should be fixed in the API.

-- 
Radovan Semancik
Software Architect
evolveum.com

