couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: View keys case-insensitive?
Date Thu, 09 Apr 2009 21:39:50 GMT
On Thu, Apr 9, 2009 at 5:08 PM, Brian Candler <B.Candler@pobox.com> wrote:
>> I've spent entirely too long on this now and I still can't for the
>> life of me figure out why A < aa.
>
> Time for an experimental, black-box approach:
>
> ----
> require 'rubygems'
> require 'restclient'
> require 'json'
>
> DB="http://127.0.0.1:5984/collator"
>
> RestClient.delete DB rescue nil
> RestClient.put "#{DB}",""
>
> (32..126).each do |c|
>  RestClient.put "#{DB}/#{c.to_s(16)}", {"x"=>c.chr}.to_json
> end
>
> RestClient.put "#{DB}/_design/test", <<EOS
> {
>  "views":{
>    "one":{
>      "map":"function (doc) { emit(doc.x,null); }"
>    }
>  }
> }
> EOS
>
> puts RestClient.get("#{DB}/_design/test/_view/one")
> ----
>
> This shows the collation sequence to be as follows.
>
> {"total_rows":95,"offset":0,"rows":[
> {"id":"20","key":" ","value":null},
> {"id":"60","key":"`","value":null},
> {"id":"5e","key":"^","value":null},
> {"id":"5f","key":"_","value":null},
> {"id":"2d","key":"-","value":null},
> {"id":"2c","key":",","value":null},
> {"id":"3b","key":";","value":null},
> {"id":"3a","key":":","value":null},
> {"id":"21","key":"!","value":null},
> {"id":"3f","key":"?","value":null},
> {"id":"2e","key":".","value":null},
> {"id":"27","key":"'","value":null},
> {"id":"22","key":"\"","value":null},
> {"id":"28","key":"(","value":null},
> {"id":"29","key":")","value":null},
> {"id":"5b","key":"[","value":null},
> {"id":"5d","key":"]","value":null},
> {"id":"7b","key":"{","value":null},
> {"id":"7d","key":"}","value":null},
> {"id":"40","key":"@","value":null},
> {"id":"2a","key":"*","value":null},
> {"id":"2f","key":"/","value":null},
> {"id":"5c","key":"\\","value":null},
> {"id":"26","key":"&","value":null},
> {"id":"23","key":"#","value":null},
> {"id":"25","key":"%","value":null},
> {"id":"2b","key":"+","value":null},
> {"id":"3c","key":"<","value":null},
> {"id":"3d","key":"=","value":null},
> {"id":"3e","key":">","value":null},
> {"id":"7c","key":"|","value":null},
> {"id":"7e","key":"~","value":null},
> {"id":"24","key":"$","value":null},
> {"id":"30","key":"0","value":null},
> {"id":"31","key":"1","value":null},
> {"id":"32","key":"2","value":null},
> {"id":"33","key":"3","value":null},
> {"id":"34","key":"4","value":null},
> {"id":"35","key":"5","value":null},
> {"id":"36","key":"6","value":null},
> {"id":"37","key":"7","value":null},
> {"id":"38","key":"8","value":null},
> {"id":"39","key":"9","value":null},
> {"id":"61","key":"a","value":null},
> {"id":"41","key":"A","value":null},
> {"id":"62","key":"b","value":null},
> {"id":"42","key":"B","value":null},
> {"id":"63","key":"c","value":null},
> {"id":"43","key":"C","value":null},
> {"id":"64","key":"d","value":null},
> {"id":"44","key":"D","value":null},
> {"id":"65","key":"e","value":null},
> {"id":"45","key":"E","value":null},
> {"id":"66","key":"f","value":null},
> {"id":"46","key":"F","value":null},
> {"id":"67","key":"g","value":null},
> {"id":"47","key":"G","value":null},
> {"id":"68","key":"h","value":null},
> {"id":"48","key":"H","value":null},
> {"id":"69","key":"i","value":null},
> {"id":"49","key":"I","value":null},
> {"id":"6a","key":"j","value":null},
> {"id":"4a","key":"J","value":null},
> {"id":"6b","key":"k","value":null},
> {"id":"4b","key":"K","value":null},
> {"id":"6c","key":"l","value":null},
> {"id":"4c","key":"L","value":null},
> {"id":"6d","key":"m","value":null},
> {"id":"4d","key":"M","value":null},
> {"id":"6e","key":"n","value":null},
> {"id":"4e","key":"N","value":null},
> {"id":"6f","key":"o","value":null},
> {"id":"4f","key":"O","value":null},
> {"id":"70","key":"p","value":null},
> {"id":"50","key":"P","value":null},
> {"id":"71","key":"q","value":null},
> {"id":"51","key":"Q","value":null},
> {"id":"72","key":"r","value":null},
> {"id":"52","key":"R","value":null},
> {"id":"73","key":"s","value":null},
> {"id":"53","key":"S","value":null},
> {"id":"74","key":"t","value":null},
> {"id":"54","key":"T","value":null},
> {"id":"75","key":"u","value":null},
> {"id":"55","key":"U","value":null},
> {"id":"76","key":"v","value":null},
> {"id":"56","key":"V","value":null},
> {"id":"77","key":"w","value":null},
> {"id":"57","key":"W","value":null},
> {"id":"78","key":"x","value":null},
> {"id":"58","key":"X","value":null},
> {"id":"79","key":"y","value":null},
> {"id":"59","key":"Y","value":null},
> {"id":"7a","key":"z","value":null},
> {"id":"5a","key":"Z","value":null}
> ]}
>
> I've never seen this sequence before. It's not even EBCDIC :-)
>

From my reading earlier, I'm pretty sure that you have to set a
couple of specific options to get EBCDIC behavior, which would explain
that.

> Adding aa into the pot gives:
>
> ...
> {"id":"61","key":"a","value":null},
> {"id":"41","key":"A","value":null},
> {"id":"X","key":"aa","value":null},
> ...

This is the only thing that I see as wrong. According to what I read
in the ICU collation specification, the rest is all correct. If I guess
right, å should come after a and A, and Å should come after A, and I hope
for consistency's sake it comes before aa. Through my reading, though, I
never managed to quite pin down how A and aa would be expected to
sort.

All the algorithm basically does is set up an array of numbers that you
could do a normal strcmp on to get the answer.

The strengths were listed as:

Primary: basically means the letter. a, A, å, Å all have the same
primary strength
Secondary: takes care of accents. a < å
Tertiary: takes care of case (a < A) and, I think, the extra variants
like when the letter is in a circle.
Quaternary: takes care of some Japanese and other foreign things
Identical: tie breaker that just differentiates based on code point.

Each level gets a weight, and then you just iterate the list of
weights to calculate the collation. ICU has an API for what it calls
sort keys, but either I was printing them wrong or they can change
for the same string between successive calls, so they didn't illuminate
anything quickly enough for me to figure this out.
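
To make the "array of numbers" idea concrete, here is a minimal Ruby
sketch of a level-by-level sort key. Everything in it is an assumption
for illustration: the weight table, the separator value and the names
TOY_WEIGHTS, toy_sort_key and toy_sort are made up, and this is not how
ICU actually builds its sort keys. It only covers the first three
strengths plus the code point tie breaker.

```ruby
# Toy per-character weights: [primary, secondary, tertiary].
# These numbers are invented for illustration; they are NOT ICU's tables.
TOY_WEIGHTS = {
  "a" => [1, 1, 1],
  "A" => [1, 1, 2],   # same letter, same accent, different case
  "å" => [1, 2, 1],   # same letter, different accent
  "Å" => [1, 2, 2],
  "b" => [2, 1, 1],
  "B" => [2, 1, 2],
}.freeze

# Marks the end of each level; lower than every real weight above.
LEVEL_SEPARATOR = 0

# Build one flat array: all primary weights, a separator, all secondary
# weights, a separator, all tertiary weights, a separator, and finally
# the code points as the "identical" tie breaker.
def toy_sort_key(str)
  chars = str.each_char.to_a
  key = []
  3.times do |level|
    key.concat(chars.map { |ch| TOY_WEIGHTS.fetch(ch)[level] })
    key << LEVEL_SEPARATOR
  end
  key.concat(str.codepoints)
end

# Array#<=> compares element by element, much like strcmp on the keys.
def toy_sort(strings)
  strings.sort_by { |s| toy_sort_key(s) }
end
```

With this toy table, toy_sort(%w[b aa Å A å a]) gives a, A, å, Å, aa, b.
The primary weights of the whole string are compared before accents or
case are looked at, so in this simplified model every single-letter
variant of "a" sorts before "aa". That is only what the sketch does; it
says nothing definite about ICU's real behavior.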

HTH,
Paul Davis

>
> As you say, that is most bizarre.
>
> Cheers,
>
> Brian.
>
