directory-dev mailing list archives

From Alex Karasulu
Subject [filter] interpreting presence versus substring with whitespace
Date Mon, 08 Nov 2004 17:07:35 GMT

I have some questions regarding the interpretation of LDAP search 
filters, specifically differentiating between presence and substring 
items when whitespace is present.  According to the ABNF describing 
these rules in [Filters], and some additional rules in [Models],

      present        = attr EQUALS ASTERISK
      substring      = attr EQUALS [initial] any [final]
      initial        = assertionvalue
      any            = ASTERISK *(assertionvalue ASTERISK)
      final          = assertionvalue
      attr           = attributedescription

the presence of whitespace is considered significant in the 
assertionvalue.  Please correct me if I'm wrong but this means that the 
following filter expressions are interpreted differently:

(For simplicity I'm equating whitespace with a single space character.)

1. (ou=*)
    - there is no whitespace at all
    - interpreted as a presence filter
    - matches all entries containing the ou attribute
2. (ou= *)
    - there is whitespace before the ASTERISK after the EQUALS
    - interpreted as a substring filter
    - the space is interpreted as the [initial]
    - matches all values of ou starting with a space, %x20
3. (ou=* )
    - there is whitespace after the ASTERISK before the RPAREN
    - interpreted as a substring filter
    - the space is interpreted as the [final]
    - matches all values of ou ending with a space, %x20
4. (ou= * )
    - there is whitespace before the ASTERISK and after the ASTERISK
    - interpreted as a substring filter
    - the first and last spaces are interpreted as the [initial] and 
      [final] values respectively
    - matches all values of ou starting and ending with a space, %x20
5. There's another class where two or more ASTERISKS sandwich 
whitespace: (ou=* *)
    - although other forms would be a bit nonsensical, this one may be 
      valid; the space becomes an 'any' component and would match 
      all entries with ou values containing a space, %x20
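
The strict reading of the ABNF quoted above can be sketched in a few lines 
of Python.  This is only an illustration, not code from any directory 
server: classify() is a hypothetical helper that takes the raw text 
between EQUALS and RPAREN, treats only a literal '*' as ASTERISK, and 
leaves escapes such as \20 undecoded.  It also assumes that empty pieces 
between adjacent ASTERISKS carry no meaning and drops them.

```python
def classify(value):
    """Classify the text after EQUALS under a strict reading of the ABNF.

    Hypothetical helper for illustration; whitespace is never trimmed,
    so every space ends up in some assertionvalue.
    """
    if value == "*":
        # attr EQUALS ASTERISK and nothing else: a presence item
        return ("present",)
    if "*" not in value:
        # no ASTERISK at all: a plain equality assertion
        return ("equality", value)
    parts = value.split("*")
    initial = parts[0] or None      # text before the first ASTERISK
    final = parts[-1] or None       # text after the last ASTERISK
    # text between ASTERISKS; empty pieces are dropped (assumption)
    any_parts = [p for p in parts[1:-1] if p]
    return ("substring", initial, any_parts, final)


for v in ["*", " *", "* ", " * ", "* *"]:
    print(repr(v), "->", classify(v))
```

Under these assumptions only case 1 is classified as a presence item.  
Cases 2 through 4 carry the space as [initial] and/or [final], and 
case 5 carries it as a single 'any' component.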

Are these correct interpretations according to the ABNF and is the 
matching behavior correct?

Now I'd like to open for discussion whether or not these interpretations 
are intuitively correct.  As an end user issuing search filters to a 
directory I've come to expect the directory to be extra forgiving when 
it comes to things like whitespace.  Users have gotten this feeling 
regarding whitespace forgiveness from the way distinguished names are 
normalized by the directory.  It's intuitive for the user to presume 
some of this forgiving nature extends to filters which can match on 
attributes with the DN syntax.  So looking at the examples above I can 
see how a user may think that all these filters are in fact equal to one 
another.  The user is not thinking, "=* is a distinct atomic operator 
token to a parser and is inseparable, so a space makes it no longer a 
presence filter."  The user thinks, "Well, I'm matching for anything."  
What if they just like to put spaces around parentheses in their filter 
expressions?  This space forgiving nature is "turned on" for matching 
normal equality expressions on attributes like ou and is especially 
forgiving if distinguishedNameMatch is in effect for respective attributes.

So would you agree that there is some mismatch between the strict ABNF 
interpretation and the mental interpretation of users writing filters?  
IMO all whitespace should be escaped if significant.  Otherwise 
whitespace should be trimmed from the edges of attribute values.  Also, 
whitespace within the interior of the value should be reduced to a 
single space to preserve tokenization order while matching.  With regard 
to substring items, the 'any' pieces between two ASTERISKS that are 
composed purely of whitespace should be discarded and the ASTERISKS 
consolidated into one.

This makes life tougher on those that really want to match based on 
whitespace.  However they can just escape out the whitespace in their 
filters like so:

1. (ou=*)
2. (ou=\20*)
3. (ou=*\20)
4. (ou=\20*\20)
5. (ou=*\20*)
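
The forgiving treatment proposed above could look roughly like the 
following Python sketch.  normalize_value() is a hypothetical name, and 
the details are assumptions on my part: whitespace means a single space 
character, trimming applies to the outer edges of the whole value after 
EQUALS, and escaped whitespace (\20) is left alone simply because it 
contains no literal space.

```python
import re


def normalize_value(value):
    """Apply the proposed whitespace-forgiving rules to a filter value.

    Hypothetical helper; `value` is the raw text between EQUALS and RPAREN.
    """
    # Trim unescaped spaces from the outer edges of the value.
    value = value.strip(" ")
    # Reduce interior runs of spaces to a single space.
    value = re.sub(r" {2,}", " ", value)
    if "*" not in value:
        return value
    pieces = value.split("*")
    # Discard pieces between ASTERISKS that are empty or only spaces,
    # which also consolidates the neighbouring ASTERISKS into one.
    middle = [p for p in pieces[1:-1] if p.strip(" ")]
    return "*".join([pieces[0]] + middle + [pieces[-1]])


for v in ["*", " *", "* ", " * ", "* *", r"\20*\20", r"*\20*"]:
    print(repr(v), "->", repr(normalize_value(v)))
```

With these rules the five unescaped forms from the first list all reduce 
to a bare ASTERISK, while the escaped forms pass through unchanged and 
keep their whitespace significant.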

Comments? Thoughts?


  [Filters]     Smith, M. (editor), LDAPbis WG, "LDAP: String
                Representation of Search Filters",
                draft-ietf-ldapbis-filter-xx.txt, a work in progress.

  [Models]      Zeilenga, K. (editor), "LDAP: Directory Information Models",
                draft-ietf-ldapbis-models-xx.txt, a work in progress.
