couchdb-dev mailing list archives

From Patrick Antivackis <patrick.antivac...@gmail.com>
Subject Re: View keys case-insensitive?
Date Thu, 09 Apr 2009 23:18:30 GMT
Paul,

2009/4/10 Paul Davis <paul.joseph.davis@gmail.com>

> I've tried various combinations of UC_CASE_LEVEL, UC_CASE_FIRST, and
> UC_WEIGHT.
>

This is really not enough. With these attributes you only tell the collator
that a <<< A or A <<< a (the third element). Whether it is upper case, lower
case, with a tilde, with an accent or whatever, it is still an A: all of
these are just variations of A.

If you look at :
http://www.unicode.org/Public/UCA/latest/allkeys.txt

and search for :

0061  ; [.1141.0020.0002.0061] # LATIN SMALL LETTER A


you will see a lot of definitions for A, but all of them have the same first
element, 1141. They are all the same letter; the rest are just variations.
Compared with each other they have an order, but compared with any other
letter they all behave like an A.
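
To make the levels concrete, here is a toy sketch in Ruby. It is not ICU and
not the real table: only the weights for "a" come from the allkeys.txt entry
above, and the other weights are assumptions chosen for illustration. It
builds a sort key the way the algorithm describes (all primary weights
first, then all secondary, then all tertiary), which is enough to see why
"A" sorts before "aa".

```ruby
# Toy model of multi-level collation. NOT the real UCA/ICU data:
# only the weights for "a" are taken from the allkeys.txt line quoted above,
# the remaining weights are illustrative assumptions.
WEIGHTS = {
  "a" => [0x1141, 0x20, 0x02],
  "A" => [0x1141, 0x20, 0x08], # same primary as "a", different tertiary
  "b" => [0x1157, 0x20, 0x02],
  "B" => [0x1157, 0x20, 0x08]
}.freeze

# Sort key = every primary weight, a 0 separator, every secondary weight,
# a 0 separator, every tertiary weight, a 0 separator.
def sort_key(str)
  levels = str.chars.map { |ch| WEIGHTS.fetch(ch) }.transpose
  levels.flat_map { |level| level + [0] }
end

words = %w[aa A b a B Ab]
puts words.sort_by { |w| sort_key(w) }.inspect
# prints ["a", "A", "aa", "Ab", "b", "B"]
```

The primary level of "A" is just [1141], while "aa" gives [1141, 1141]. The
comparison is already decided at the primary level (the shorter one wins), so
the tertiary weight that separates a from A is never looked at.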

So if you want to change the order of the primary elements, you need to use
custom tailoring:
http://userguide.icu-project.org/collation/customization

You need rules that say things like:
a < A (primary order)
So to simulate ASCII-like behaviour, where every letter has its own primary
weight, you could try something like: a < b < c < d < ... < A < B < C ...,
which almost amounts to retyping the ASCII table.

To be honest, I have not tried it, but it should work.
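
As a rough sketch of what that means (again in Ruby, and not tested against
ICU), the snippet below generates such a rule string and mimics the resulting
primary order with a toy comparison. The names ORDER, RULES and tailored_key
are made up for this example. In the ICU rule syntax "&" sets the starting
point and "<" marks a primary difference.

```ruby
# Lowercase letters first, then uppercase, each with its own primary weight.
ORDER = ("a".."z").to_a + ("A".."Z").to_a

# Candidate tailoring rule: "&a < b < c < ... < z < A < B < ... < Z".
# Assumption: this string would be handed to ICU as custom rules (untested).
RULES = "&" + ORDER.join(" < ")

# Toy stand-in for the tailored collator: the primary weight of a letter
# is simply its position in ORDER. This is not ICU itself.
PRIMARY = ORDER.each_with_index.to_h

def tailored_key(str)
  str.chars.map { |ch| PRIMARY.fetch(ch) }
end

puts RULES
puts %w[A aa a B b].sort_by { |w| tailored_key(w) }.inspect
# second line prints ["a", "aa", "b", "A", "B"]
```

With case promoted to a primary difference, "aa" is decided against "A" on
the first letter, which gives the a < aa < A order asked for.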

> Also, I still don't see anything in this damned collation algorithm
> that explains how A < aa.  And this doesn't fall into the big/biggest
> comparison. The similar case would be Big < biggest. But I don't see
> anything in the damn collation algorithm document that talks about
> ignoring anything after primary weight in the case that one string is
> a prefix. In the various examples that I see I can't find anything
> that would contradict that expectation.
>
> Paul Davis
>
> For reference, the algorithm reference I'm using is this one:
> http://unicode.org/reports/tr10/
>
> I feel like printing the entire thing just so I can have a book burning.
>
> On Thu, Apr 9, 2009 at 6:39 PM, Patrick Antivackis
> <patrick.antivackis@gmail.com> wrote:
> > By the way, what customization did you try to send to ICU ?
> >
> > 2009/4/10 Paul Davis <paul.joseph.davis@gmail.com>
> >
> >> Patrick,
> >>
> >> I'm not asking for this relationship:
> >>
> >> a < b < A < B
> >>
> >> Merely:
> >>
> >> a < aa < A
> >>
> >> The thing is that even when I try and specify explicitly that 'A'
> >> should come after 'a' I still can't get the expected "a < aa < A"
> >> behavior. In a nutshell, "Why the hell does the second 'a' alter the
> >> comparison?"
> >>
> >> HTH,
> >> Paul Davis
> >>
> >> On Thu, Apr 9, 2009 at 5:45 PM, Patrick Antivackis
> >> <patrick.antivackis@gmail.com> wrote:
> >> > It's quite normal as far as ICU is concerned.
> >> > ICU is about language not about ASCII code.
> >> > In ICU, case is the third element looked for comparison (same level than
> >> > circled letter in Nordic languages for example), so not very important.
> >> > So when you sort words together, a or A is still an a, so they are sorted
> >> > nearby. In ICU you can specify if you prefer a before A or A before a, but
> >> > not simply a before b before c.... before A before B before C.
> >> >
> >> > To have such behavior (like ASCII) you need to custom ICU in specifying the
> >> > collation you want almost letter by letter.
> >> > It is great for you, but what about Japanese users or Arabic users ??
> >> >
> >> > So this is definitely the right behaviour of ICU sorting (collation).
> >> >
> >> >
> >> > 2009/4/9 Brian Candler <B.Candler@pobox.com>
> >> >
> >> >> > I've spent entirely too long on this now and I still can't for the
> >> >> > life of me figure out why A < aa.
> >> >>
> >> >> Time for an experimental, black-box approach:
> >> >>
> >> >> ----
> >> >> require 'rubygems'
> >> >> require 'restclient'
> >> >> require 'json'
> >> >>
> >> >> DB="http://127.0.0.1:5984/collator"
> >> >>
> >> >> RestClient.delete DB rescue nil
> >> >> RestClient.put "#{DB}",""
> >> >>
> >> >> (32..126).each do |c|
> >> >>  RestClient.put "#{DB}/#{c.to_s(16)}", {"x"=>c.chr}.to_json
> >> >> end
> >> >>
> >> >> RestClient.put "#{DB}/_design/test", <<EOS
> >> >> {
> >> >>  "views":{
> >> >>    "one":{
> >> >>      "map":"function (doc) { emit(doc.x,null); }"
> >> >>    }
> >> >>  }
> >> >> }
> >> >> EOS
> >> >>
> >> >> puts RestClient.get("#{DB}/_design/test/_view/one")
> >> >> ----
> >> >>
> >> >> This shows the collation sequence to be as follows.
> >> >>
> >> >> {"total_rows":95,"offset":0,"rows":[
> >> >> {"id":"20","key":" ","value":null},
> >> >> {"id":"60","key":"`","value":null},
> >> >> {"id":"5e","key":"^","value":null},
> >> >> {"id":"5f","key":"_","value":null},
> >> >> {"id":"2d","key":"-","value":null},
> >> >> {"id":"2c","key":",","value":null},
> >> >> {"id":"3b","key":";","value":null},
> >> >> {"id":"3a","key":":","value":null},
> >> >> {"id":"21","key":"!","value":null},
> >> >> {"id":"3f","key":"?","value":null},
> >> >> {"id":"2e","key":".","value":null},
> >> >> {"id":"27","key":"'","value":null},
> >> >> {"id":"22","key":"\"","value":null},
> >> >> {"id":"28","key":"(","value":null},
> >> >> {"id":"29","key":")","value":null},
> >> >> {"id":"5b","key":"[","value":null},
> >> >> {"id":"5d","key":"]","value":null},
> >> >> {"id":"7b","key":"{","value":null},
> >> >> {"id":"7d","key":"}","value":null},
> >> >> {"id":"40","key":"@","value":null},
> >> >> {"id":"2a","key":"*","value":null},
> >> >> {"id":"2f","key":"/","value":null},
> >> >> {"id":"5c","key":"\\","value":null},
> >> >> {"id":"26","key":"&","value":null},
> >> >> {"id":"23","key":"#","value":null},
> >> >> {"id":"25","key":"%","value":null},
> >> >> {"id":"2b","key":"+","value":null},
> >> >> {"id":"3c","key":"<","value":null},
> >> >> {"id":"3d","key":"=","value":null},
> >> >> {"id":"3e","key":">","value":null},
> >> >> {"id":"7c","key":"|","value":null},
> >> >> {"id":"7e","key":"~","value":null},
> >> >> {"id":"24","key":"$","value":null},
> >> >> {"id":"30","key":"0","value":null},
> >> >> {"id":"31","key":"1","value":null},
> >> >> {"id":"32","key":"2","value":null},
> >> >> {"id":"33","key":"3","value":null},
> >> >> {"id":"34","key":"4","value":null},
> >> >> {"id":"35","key":"5","value":null},
> >> >> {"id":"36","key":"6","value":null},
> >> >> {"id":"37","key":"7","value":null},
> >> >> {"id":"38","key":"8","value":null},
> >> >> {"id":"39","key":"9","value":null},
> >> >> {"id":"61","key":"a","value":null},
> >> >> {"id":"41","key":"A","value":null},
> >> >> {"id":"62","key":"b","value":null},
> >> >> {"id":"42","key":"B","value":null},
> >> >> {"id":"63","key":"c","value":null},
> >> >> {"id":"43","key":"C","value":null},
> >> >> {"id":"64","key":"d","value":null},
> >> >> {"id":"44","key":"D","value":null},
> >> >> {"id":"65","key":"e","value":null},
> >> >> {"id":"45","key":"E","value":null},
> >> >> {"id":"66","key":"f","value":null},
> >> >> {"id":"46","key":"F","value":null},
> >> >> {"id":"67","key":"g","value":null},
> >> >> {"id":"47","key":"G","value":null},
> >> >> {"id":"68","key":"h","value":null},
> >> >> {"id":"48","key":"H","value":null},
> >> >> {"id":"69","key":"i","value":null},
> >> >> {"id":"49","key":"I","value":null},
> >> >> {"id":"6a","key":"j","value":null},
> >> >> {"id":"4a","key":"J","value":null},
> >> >> {"id":"6b","key":"k","value":null},
> >> >> {"id":"4b","key":"K","value":null},
> >> >> {"id":"6c","key":"l","value":null},
> >> >> {"id":"4c","key":"L","value":null},
> >> >> {"id":"6d","key":"m","value":null},
> >> >> {"id":"4d","key":"M","value":null},
> >> >> {"id":"6e","key":"n","value":null},
> >> >> {"id":"4e","key":"N","value":null},
> >> >> {"id":"6f","key":"o","value":null},
> >> >> {"id":"4f","key":"O","value":null},
> >> >> {"id":"70","key":"p","value":null},
> >> >> {"id":"50","key":"P","value":null},
> >> >> {"id":"71","key":"q","value":null},
> >> >> {"id":"51","key":"Q","value":null},
> >> >> {"id":"72","key":"r","value":null},
> >> >> {"id":"52","key":"R","value":null},
> >> >> {"id":"73","key":"s","value":null},
> >> >> {"id":"53","key":"S","value":null},
> >> >> {"id":"74","key":"t","value":null},
> >> >> {"id":"54","key":"T","value":null},
> >> >> {"id":"75","key":"u","value":null},
> >> >> {"id":"55","key":"U","value":null},
> >> >> {"id":"76","key":"v","value":null},
> >> >> {"id":"56","key":"V","value":null},
> >> >> {"id":"77","key":"w","value":null},
> >> >> {"id":"57","key":"W","value":null},
> >> >> {"id":"78","key":"x","value":null},
> >> >> {"id":"58","key":"X","value":null},
> >> >> {"id":"79","key":"y","value":null},
> >> >> {"id":"59","key":"Y","value":null},
> >> >> {"id":"7a","key":"z","value":null},
> >> >> {"id":"5a","key":"Z","value":null}
> >> >> ]}
> >> >>
> >> >> I've never seen this sequence before. It's not even EBCDIC :-)
> >> >>
> >> >> Adding aa into the pot gives:
> >> >>
> >> >> ...
> >> >> {"id":"61","key":"a","value":null},
> >> >> {"id":"41","key":"A","value":null},
> >> >> {"id":"X","key":"aa","value":null},
> >> >> ...
> >> >>
> >> >> As you say, that is most bizarre.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Brian.
> >> >>
> >> >
> >>
> >
>
