directory-dev mailing list archives

From Alex Karasulu
Subject Re: [filter] interpreting presence versus substring with whitespace
Date Tue, 09 Nov 2004 16:34:35 GMT

Funny, I was following all the discussions around stringprep and it
occurred to me that it might have something to do with this matter.
Seems like I have some homework to do.  I'm going to reread your response
here and those trails a few more times.

Much appreciated,
Alex Karasulu

Steven Legg wrote:

> Hi Alex,
> Alex Karasulu wrote:
>> Hello,
>> I have some questions regarding the interpretation of LDAP search 
>> filters specifically differentiating between presence and substring 
>> items when whitespace is present.  According to the ABNF describing 
>> these rules in [FILTERS], and some additional rules in [MODELS],
>>      ...
>>      present        = attr EQUALS ASTERISK
>>      substring      = attr EQUALS [initial] any [final]
>>      initial        = assertionvalue
>>      any            = ASTERISK *(assertionvalue ASTERISK)
>>      final          = assertionvalue
>>      attr           = attributedescription
>>      ...,
>> the presence of whitespace is considered significant in the 
>> assertionvalue.  Please correct me if I'm wrong but this means that 
>> the following filter expressions are interpreted differently:
>> (for simplicity I'm equating whitespace to be a single space 
>> character, %x20)
>> 1. (ou=*)
>>    - there is no whitespace at all
>>    - interpreted as a presence filter
>>    - matches all entries containing the ou attribute
>> 2. (ou= *)
>>    - there is whitespace before the ASTERISK after the EQUALS
>>    - interpreted as a substring filter
>>    - the space is interpreted as the [initial]
>>    - matches all values of ou starting with a space, %x20
> The exact matching behaviour depends on the attribute type. Typically
> though, it will be equivalent to caseIgnoreSubstringsMatch. Assuming that
> is the case then the current ldapbis specifications would invoke
> stringprep on each candidate attribute value and each substring of the
> assertion. The result will be that no attribute value will have (for
> matching purposes) a leading space. The initial substring will get
> reduced to empty which then becomes a single space. After that it is a
> code point comparison. Since no attribute value has a leading space, none
> are matched, and the result is empty. This isn't the intuitive result
> either. The same occurs in the other examples for much the same reasons.
> Treating the whitespace as insignificant (unless escaped) in the string
> representation of the filter partly helps as it makes all your examples
> equivalent to a present match, but there would still be a problem with
> cases where the whitespace is explicitly escaped. Stringprep will still
> cause (ou=\20*) to match nothing.
> It seems to me that stringprep should allow a result string to be empty,
> rather than replacing it by a single space. If that were the case then an
> initial substring of " " would be reduced to an empty string, which would
> trivially match every value, giving the same effect as a presence match.
> Similarly, an any substring that reduces to an empty string is trivially
> satisfied and so is effectively ignored. In fact, this change to
> stringprep would make escaping of whitespace in the string
> representation of filters largely moot.
> Regards,
> Steven
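The behaviour Steven describes can be illustrated with a rough sketch. The function below is a toy stand-in for the space handling described in the message, not real stringprep: it only trims and collapses U+0020, ignores case folding, and the names `prep`, `initial_matches`, and `allow_empty` are hypothetical. With `allow_empty=False` an all-space string becomes a single space (the behaviour described for the current drafts); with `allow_empty=True` it becomes empty (the suggested change).

```python
import re

def prep(s, allow_empty=False):
    """Toy model of the space handling described above (NOT real stringprep)."""
    # Drop leading/trailing spaces and collapse interior runs to one space.
    out = re.sub(r" +", " ", s.strip(" "))
    # An empty result becomes a single space unless empties are allowed.
    if out == "" and not allow_empty:
        return " "
    return out

def initial_matches(value, initial, allow_empty=False):
    """Compare the prepared [initial] substring against the prepared value."""
    return prep(value, allow_empty).startswith(prep(initial, allow_empty))

# Hypothetical ou values, one of them stored with a leading space.
values = ["Sales", " Sales", "Research and Development"]

# (ou= *) with an empty result replaced by a single space: nothing matches,
# because no prepared value starts with a space.
current = [v for v in values if initial_matches(v, " ")]

# Same filter with empty results allowed: the empty [initial] matches
# every value, which has the same effect as a presence match.
proposed = [v for v in values if initial_matches(v, " ", allow_empty=True)]
```

Under these assumptions `current` ends up empty and `proposed` contains every value, which mirrors the two outcomes discussed in the reply.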
>> 3. (ou=* )
>>    - there is whitespace after the ASTERISK before the RPAREN
>>    - interpreted as a substring filter
>>    - the space is interpreted as the [final]
>>    - matches all values of ou ending with a space, %x20
>> 4. (ou= * )
>>    - there is whitespace before the ASTERISK and after the ASTERISK
>>    - interpreted as a substring filter
>>    - the first and last spaces are interpreted as the [initial] and
>>      [final] values respectively
>>    - matches all values of ou starting and ending with a space, %x20
>> 5. there's another class where two or more ASTERISKS sandwich
>>    whitespace: (ou=* *)
>>    - although other forms would be a bit nonsensical this one may be
>>      valid and would match all entries with ou values starting or
>>      ending with a space, %x20
>> Are these correct interpretations according to the ABNF and is the 
>> matching behavior correct?
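To make the literal reading of the ABNF concrete, here is a minimal sketch that splits the value of a simple filter item on ASTERISK and treats every space as significant. It is an illustration only: `classify_strict` is a hypothetical helper, it handles a single `(attr=value)` item, and it does not decode escapes such as `\2a`.

```python
def classify_strict(item):
    """Classify '(attr=value)' as presence or substring, spaces significant."""
    if not (item.startswith("(") and item.endswith(")")):
        raise ValueError("expected a parenthesized filter item")
    attr, sep, value = item[1:-1].partition("=")
    if not sep or "*" not in value:
        raise ValueError("not a presence or substring item")
    # present = attr EQUALS ASTERISK, with nothing else in the value
    if value == "*":
        return {"kind": "present", "attr": attr}
    # substring = attr EQUALS [initial] any [final]
    parts = value.split("*")
    return {
        "kind": "substring",
        "attr": attr,
        "initial": parts[0] or None,   # text before the first ASTERISK
        "any": parts[1:-1],            # text between ASTERISKs
        "final": parts[-1] or None,    # text after the last ASTERISK
    }

for f in ["(ou=*)", "(ou= *)", "(ou=* )", "(ou= * )", "(ou=* *)"]:
    print(f, classify_strict(f))
```

In this sketch only `(ou=*)` comes out as a presence item. Forms 2 to 4 carry the space as `initial` and/or `final`, and form 5 carries it as a single `any` component.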
>> Now I'd like to open for discussion whether or not these 
>> interpretations are intuitively correct.  As an end user issuing 
>> search filters to a directory I've come to expect the directory to be 
>> extra forgiving when it comes to things like whitespace.  Users have 
>> gotten this feeling regarding whitespace forgiveness from the way 
>> distinguished names are normalized by the directory.  It's intuitive 
>> for the user to presume some of this forgiving nature extends to 
>> filters which can match on attributes with the DN syntax.  So looking 
>> at the examples above I can see how a user may think that all these 
>> filters are in fact equal to one another.  The user is not thinking, 
>> "=* is a distinct atomic operator token to a parser and is 
>> inseparable where a space makes it no longer a presence filter."
>> The user thinks, "Well, I'm matching for anything."  What if they just
>> like to put spaces around parentheses in their filter expressions?  
>> This space forgiving nature is "turned on" for matching normal 
>> equality expressions on attributes like ou and is especially 
>> forgiving if distinguishedNameMatch is in effect for respective 
>> attributes.
>> So would you agree that there is some mismatch between the hard ABNF 
>> interpretation and the mental interpolation of users writing 
>> filters?  IMO all whitespace should be escaped if
>> significant.  Otherwise whitespace should be trimmed from the edges 
>> of attributevalues.  Also whitespace within the interior of the value 
>> should be reduced to a single space to preserve tokenization order 
>> while matching.  With regard to substring items the 'any' pieces 
>> between two ASTERISKS  that are purely composed of whitespace should 
>> be discarded and the ASTERISKS consolidated into one.
>> This makes life tougher on those that really want to match based on 
>> whitespace.  However they can just escape out the whitespace in their 
>> filters like so:
>> 1. (ou=*)
>> 2. (ou=\20*)
>> 3. (ou=*\20)
>> 4. (ou=\20*\20)
>> 5. (ou=*\20*)
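The proposal above (unescaped whitespace is insignificant, escaped whitespace is kept) could be sketched roughly as follows. This is an assumption-laden illustration rather than anything taken from the drafts: `parse_lenient` and `_clean` are hypothetical names, only a single `(attr=value)` item is handled, and `\XX` escapes are decoded naively one byte at a time.

```python
import re

def _clean(piece):
    """Trim and collapse unescaped spaces, then decode \\XX escapes."""
    trimmed = re.sub(r" +", " ", piece.strip(" "))
    return re.sub(r"\\([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), trimmed)

def parse_lenient(item):
    """Parse '(attr=value)' treating unescaped whitespace as insignificant."""
    attr, _, value = item.strip(" ")[1:-1].partition("=")
    attr = attr.strip(" ")
    pieces = [_clean(p) for p in value.split("*")]
    if len(pieces) == 1:                      # no ASTERISK: plain equality
        return ("equality", attr, pieces[0])
    initial, *middle, final = pieces
    # Whitespace-only 'any' pieces are discarded, merging their ASTERISKs.
    any_parts = [p for p in middle if p]
    if not initial and not any_parts and not final:
        return ("present", attr)
    return ("substring", attr, initial or None, any_parts, final or None)

# The unescaped forms all collapse to a presence match.
for f in ["(ou=*)", "(ou= *)", "(ou=* )", "(ou= * )", "(ou=* *)"]:
    print(f, parse_lenient(f))

# The escaped forms keep their significant space.
for f in [r"(ou=\20*)", r"(ou=*\20)", r"(ou=\20*\20)", r"(ou=*\20*)"]:
    print(f, parse_lenient(f))
```

Trimming happens before escape decoding, so a space written as `\20` survives while a bare space at the edge of a piece does not.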
>> Comments? Thoughts?
>> Thanks,
>> Alex
>>  [Filters]     Smith, M. (editor), LDAPbis WG, "LDAP: String
>>                Representation of Search Filters",
>>                draft-ietf-ldapbis-filter-xx.txt, a work in progress.
>>  [Models]      Zeilenga, K. (editor), "LDAP: Directory Information
>>                Models", draft-ietf-ldapbis-models-xx.txt, a work in
>>                progress.
