directory-dev mailing list archives

From Alex Karasulu <aok...@bellsouth.net>
Subject Re: [filter] interpreting presence versus substring with whitespace
Date Sat, 13 Nov 2004 19:04:16 GMT
Gentlemen,

Please excuse my absence after posting this.  At the moment I'm at 
ApacheCon and working on things there so my responses will lag 
somewhat.  Rest assured I am following the conversation.  Thanks so much 
for your input on this matter.

Regards,
Alex Karasulu

Steven Legg wrote:

>
> Kurt,
>
> Kurt D. Zeilenga wrote:
>
>> At 08:59 PM 11/8/2004, Steven Legg wrote:
>>
>>
>>> Hi Alex,
>>>
>>> Alex Karasulu wrote:
>>>
>>>> Hello,
>>>> I have some questions regarding the interpretation of LDAP search 
>>>> filters specifically differentiating between presence and substring 
>>>> items when whitespace is present.  According to the ABNF describing 
>>>> these rules in [Filters], and some additional rules in [Models],
>>>>    ...
>>>>    present        = attr EQUALS ASTERISK
>>>>    substring      = attr EQUALS [initial] any [final]
>>>>    initial        = assertionvalue
>>>>    any            = ASTERISK *(assertionvalue ASTERISK)
>>>>    final          = assertionvalue
>>>>    attr           = attributedescription
>>>>    ...,
>>>> the presence of whitespace is considered significant in the 
>>>> assertionvalue.
>>>
>>
>>
>> This wording is, I think, causing your problem.
>>
>> Any and all whitespace is part of some assertionvalue.
>> Whether or not it's significant to the evaluation of the
>> filter depends on the rule involved.
>>
>>
>>>> Please correct me if I'm wrong but this means that the following 
>>>> filter expressions are interpreted differently:
>>>> (for simplicity I'm equating whitespace to be a single space 
>>>> character, %x20)
>>>> 1. (ou=*)
>>>>  - there is no whitespace at all
>>>>  - interpreted as a presence filter
>>>>  - matches all entries containing the ou attribute
>>>> 2. (ou= *)
>>>>  - there is whitespace before the ASTERISK after the EQUALS
>>>>  - interpreted as a substring filter
>>>>  - the space is interpreted as the [initial]
>>>>  - matches all values of ou starting with a space, %x20
>>>
>>>
>>> The exact matching behaviour depends on the attribute type. 
>>> Typically though,
>>> it will be equivalent to caseIgnoreSubstringsMatch. Assuming that is 
>>> the case
>>> then the current ldapbis specifications would invoke stringprep on each
>>> candidate attribute value and each substring of the assertion.
>>
>>
>>
>> I argue that the behavior described by X.521 is the same
>> as prescribed by [Syntaxes][LDAPprep].
>>
>>
>>> The result will
>>> be that no attribute value will have (for matching purposes) a 
>>> leading space.
>>> The initial substring will get reduced to empty which then becomes a
>>> single space. After that it is a code point comparison. Since no 
>>> attribute
>>> value has a leading space, none are matched, and the result is empty.
>>
>>
>>
>> IMO, that's what X.521 says should happen.
>>
>>
>>> This isn't the intuitive result either. The same occurs in the other 
>>> examples for much the same reasons.
>>
>>
>>
>> Yes.
>>
>>
>>> Treating the whitespace as insignificant (unless escaped) in the string
>>> representation of the filter partly helps as it makes all your examples
>>> equivalent to a present match,
>>
>>
>>
>> I don't understand this statement.
>>         (ou= *) and (ou=\20*) are two encodings of the same filter
>>         (substrings assertion for the initial string " "),
>>         neither of which is equivalent to a present match.
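To make the point about the two encodings concrete, here is a minimal Python sketch of how a simple filter item could be classified under the ABNF quoted above. The names `decode` and `parse_item` are hypothetical and not taken from any LDAP library, and the escape handling is simplified: each \XX escape is treated as one character, which is only adequate for ASCII.

```python
import re

def decode(value: str) -> str:
    """Replace each \\XX hex escape with the character it encodes (ASCII only)."""
    return re.sub(r"\\([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), value)

def parse_item(item: str):
    """Classify a simple '(attr=value)' item as present, equality or substring."""
    attr, _, value = item[1:-1].partition("=")
    if value == "*":
        # "=*" with nothing else is the presence production
        return ("present", attr)
    # A literal asterisk must be escaped as \2a, so every raw '*' is a wildcard
    parts = [decode(p) for p in value.split("*")]
    if len(parts) == 1:
        return ("equality", attr, parts[0])
    # (kind, attr, initial, any-list, final); "" means the part is absent
    return ("substring", attr, parts[0], parts[1:-1], parts[-1])

print(parse_item("(ou=*)"))
print(parse_item("(ou= *)"))
print(parse_item("(ou=\\20*)"))
```

Under these assumptions the last two calls produce the same substring assertion with initial " ", and only the first is a presence item.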
>
>
> Alex was postulating an alternative solution where unescaped 
> whitespace in the string
> representation of the filter is insignificant and would be stripped in 
> converting
> the string representation into an LDAP search filter in protocol. If 
> that were so
> then (ou= *) and (ou=*) would be carried in LDAP as presence matches, and
> (ou=\20*) would be carried as a substrings match with initial 
> substring " ".
>
>>
>>
>>> but there would still be a problem with
>>> cases where the whitespace is explicitly escaped. Stringprep will still
>>> cause (ou=\20*) to match nothing.
>>>
>>> It seems to me that stringprep should allow a result string to be 
>>> empty,
>>> rather than replacing it by a single space. If that were the case 
>>> then an
>>> initial substring of " " would be reduced to an empty string, which 
>>> would
>>> trivially match every value, giving the same effect as a presence 
>>> match.
>>
>>
>>
>> That's not consistent with the behavior described in X.521.
>
>
> It may be consistent with the behaviour described in RFC 2252, which 
> omitted
> to say that a string of all spaces is replaced with a single space.
>
> I personally think that replacement step is unwise. It leads to odd
> results like the following: an initial substring of " " matches a
> value of " ", but doesn't match a value of " foo" even though " " is
> clearly a prefix of " foo".
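The odd result described here can be reproduced with a toy model. The sketch below is not stringprep or LDAPprep; it assumes only the two behaviours mentioned in this thread (leading and trailing spaces are dropped for matching, and a string that reduces to empty is replaced by a single space). The names `prepare` and `initial_matches` and the `empty_to_space` switch are hypothetical.

```python
def prepare(s: str, empty_to_space: bool = True) -> str:
    """Toy stand-in for the space handling discussed in this thread."""
    stripped = s.strip(" ")
    if stripped == "" and empty_to_space:
        # the replacement step under discussion: empty becomes one space
        return " "
    return stripped

def initial_matches(initial: str, value: str,
                    empty_to_space: bool = True) -> bool:
    """Compare the prepared initial substring against the prepared value."""
    return prepare(value, empty_to_space).startswith(
        prepare(initial, empty_to_space))

# With the replacement step: " " matches " " but not " foo"
print(initial_matches(" ", " "), initial_matches(" ", " foo"))

# Without it: the initial substring is empty and matches every value
print(initial_matches(" ", " foo", empty_to_space=False))
```

With the replacement step switched off, an initial substring of " " behaves like a presence match in this model, which is the alternative suggested earlier in the thread.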
>
>>
>>
>>> Similarly, an any substring that reduces to an empty string is 
>>> trivially
>>> satisfied and so is effectively ignored. In fact, this change to 
>>> stringprep
>>> would make escaping of whitespace in the string representation of 
>>> filters
>>> largely moot.
>>
>>
>>
>> I don't understand your last point here.
>
>
> This was again in reference to Alex's alternative solution. If 
> stringprep/LDAPprep
> didn't replace an empty string with a single space then the effects 
> would be
> much the same as Alex's solution with respect to string filters.
>
> Regards,
> Steven
>
>> Regardless of how the matching is performed, escaping whitespace in
>> the string representation of the filter produces the same filter
>> wire-encoding as if the whitespace were not escaped in the
>> string representation, and hence matches in the exact same
>> manner.  That is, the escaping has always been moot.  Introduction
>> of LDAPprep doesn't change that.
>>
>>
>>
>>> Regards,
>>> Steven
>>>
>>>
>>>> 3. (ou=* )
>>>>  - there is whitespace after the ASTERISK before the RPAREN
>>>>  - interpreted as a substring filter
>>>>  - the space is interpreted as the [final]
>>>>  - matches all values of ou ending with a space, %x20
>>>> 4. (ou= * )
>>>>  - there is whitespace before the ASTERISK and after the ASTERISK
>>>>  - interpreted as a substring filter
>>>>  - the first and last spaces are interpreted as the [initial] and 
>>>> [final] values respectively
>>>>  - matches all values of ou starting and ending with a space, %x20
>>>> 5. there's another class where two or more ASTERISKS sandwich 
>>>> whitespace: (ou=* *)
>>>>  - although other forms would be a bit nonsensical this one may be 
>>>> valid and would match all entries with ou values starting or 
>>>> ending with a space, %x20
>>>> Are these correct interpretations according to the ABNF and is the 
>>>> matching behavior correct?
>>>> Now I'd like to open for discussion whether or not these 
>>>> interpretations are intuitively correct.  As an end user issuing 
>>>> search filters to a directory I've come to expect the directory to 
>>>> be extra forgiving when it comes to things like whitespace.  Users 
>>>> have gotten this feeling regarding whitespace forgiveness from the 
>>>> way distinguished names are normalized by the directory.  It's 
>>>> intuitive for the user to presume some of this forgiving nature 
>>>> extends to filters which can match on attributes with the DN 
>>>> syntax.  So looking at the examples above I can see how a user may 
>>>> think that all these filters are in fact equal to one another.  The 
>>>> user is not thinking, "=* is a distinct atomic operator token to a 
>>>> parser and is inseparable where a space makes it no longer a 
>>>> presence filter."  The user thinks, "Well, I'm matching for 
>>>> anything."  What if they just like to put spaces around 
>>>> parentheses in their filter expressions?  This space forgiving 
>>>> nature is "turned on" for matching normal equality expressions on 
>>>> attributes like ou and is especially forgiving if 
>>>> distinguishedNameMatch is in effect for respective attributes.
>>>> So would you agree that there is some mismatch between the hard 
>>>> ABNF interpretation and the mental interpolation of users writing 
>>>> filters?  IMO all whitespace should be escaped if 
>>>> significant.  
>>>
>>
>>
>> See above.  Escaping whitespace in the string representation has
>> no impact upon the wire encoding of the filter or its evaluation.
>>
>>>> Otherwise whitespace should be trimmed from the edges of 
>>>> attributevalues.  Also whitespace within the interior of the value 
>>>> should be reduced to a single space to preserve tokenization order 
>>>> while matching.  With regard to substring items the 'any' pieces 
>>>> between two ASTERISKS  that are purely composed of whitespace 
>>>> should be discarded and the ASTERISKS consolidated into one.
>>>> This makes life tougher on those that really want to match based on 
>>>> whitespace.  However they can just escape out the whitespace in 
>>>> their filters like so:
>>>> 1. (ou=*)
>>>> 2. (ou=\20*)
>>>> 3. (ou=*\20)
>>>> 4. (ou=\20*\20)
>>>> 5. (ou=*\20*)
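For illustration only, the proposed trimming rules could be sketched in Python as below. This is one possible reading of the proposal and not behaviour defined by [Filters]. The helper name `normalize_value` is hypothetical, it works on the raw text after the equals sign, and it treats only %x20 as whitespace, as in the examples above.

```python
import re

def normalize_value(raw: str) -> str:
    """Apply the proposed rules to the raw (still escaped) text after '='."""
    pieces = raw.split("*")
    # trim unescaped spaces at the edges, collapse interior runs to one space
    cleaned = [re.sub(r" +", " ", p.strip(" ")) for p in pieces]
    if len(cleaned) == 1:
        return cleaned[0]          # no asterisk: plain assertion value
    initial, middle, final = cleaned[0], cleaned[1:-1], cleaned[-1]
    # discard 'any' pieces that were whitespace only, merging their asterisks
    middle = [p for p in middle if p]
    return "*".join([initial, *middle, final])

for raw in [" *", "* ", " * ", "* *", "\\20*", "*\\20*"]:
    print(repr(raw), "->", repr(normalize_value(raw)))
```

In this sketch the four unescaped forms all reduce to a bare asterisk, so they would be read as presence items, while the escaped forms keep their \20 and remain substring items.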
>>>> Comments? Thoughts?
>>>> Thanks,
>>>> Alex
>>>> [Filters]     Smith, M. (editor), LDAPbis WG, "LDAP: String
>>>>              Representation of Search Filters",
>>>>              draft-ietf-ldapbis-filter-xx.txt, a work in progress.
>>>> [Models]      Zeilenga, K. (editor), "LDAP: Directory Information 
>>>> Models",
>>>>              draft-ietf-ldapbis-models-xx.txt, a work in progress.
>>>
>>
>>
>>
>

