From: Patrick Antivackis
To: dev@couchdb.apache.org
Date: Fri, 10 Apr 2009 01:18:30 +0200
Subject: Re: View keys case-insensitive?

Paul,

2009/4/10 Paul Davis

> I've tried various combinations of UC_CASE_LEVEL, UC_CASE_FIRST, and
> UC_WEIGHT.
>

This is really not enough. Doing this you only try to say to the collation
that a<<

> Also, I still don't see anything in this damned collation algorithm
> that explains how A < aa. And this doesn't fall into the big/biggest
> comparison. The similar case would be Big < biggest. But I don't see
> anything in the damn collation algorithm document that talks about
> ignoring anything after primary weight in the case that one string is
> a prefix. In the various examples that I see I can't find anything
> that would contradict that expectation.
>
> Paul Davis
>
> For reference, the algorithm reference I'm using is this one:
> http://unicode.org/reports/tr10/
>
> I feel like printing the entire thing just so I can have a book burning.
>
> On Thu, Apr 9, 2009 at 6:39 PM, Patrick Antivackis wrote:
> > By the way, what customization did you try to send to ICU?
> >
> > 2009/4/10 Paul Davis
> >
> >> Patrick,
> >>
> >> I'm not asking for this relationship:
> >>
> >> a < b < A < B
> >>
> >> Merely:
> >>
> >> a < aa < A
> >>
> >> The thing is that even when I try and specify explicitly that 'A'
> >> should come after 'a' I still can't get the expected "a < aa < A"
> >> behavior. In a nutshell, "Why the hell does the second 'a' alter the
> >> comparison?"
> >>
> >> HTH,
> >> Paul Davis
> >>
> >> On Thu, Apr 9, 2009 at 5:45 PM, Patrick Antivackis wrote:
> >> > It's quite normal as far as ICU is concerned.
> >> > ICU is about language, not about ASCII code.
> >> > In ICU, case is the third element looked at for comparison (same level
> >> > as circled letter in Nordic languages, for example), so not very
> >> > important. So when you sort words together, a or A is still an a, so
> >> > they are sorted nearby. In ICU you can specify if you prefer a before A
> >> > or A before a, but not simply a before b before c.... before A before B
> >> > before C.
> >> >
> >> > To have such behavior (like ASCII) you need to customize ICU by
> >> > specifying the collation you want almost letter by letter.
> >> > It is great for you, but what about Japanese users or Arabic users??
> >> >
> >> > So this is definitely the right behaviour of ICU sorting (collation).
> >> >
> >> > 2009/4/9 Brian Candler
> >> >
> >> >> > I've spent entirely too long on this now and I still can't for the
> >> >> > life of me figure out why A < aa.
> >> >>
> >> >> Time for an experimental, black-box approach:
> >> >>
> >> >> ----
> >> >> require 'rubygems'
> >> >> require 'restclient'
> >> >> require 'json'
> >> >>
> >> >> DB="http://127.0.0.1:5984/collator"
> >> >>
> >> >> # Start from an empty database
> >> >> RestClient.delete DB rescue nil
> >> >> RestClient.put "#{DB}",""
> >> >>
> >> >> # One document per printable ASCII character; the doc id is its hex code
> >> >> (32..126).each do |c|
> >> >>   RestClient.put "#{DB}/#{c.to_s(16)}", {"x"=>c.chr}.to_json
> >> >> end
> >> >>
> >> >> # A view that emits each character as a key, so CouchDB collates them
> >> >> RestClient.put "#{DB}/_design/test", <<EOS
> >> >> {
> >> >>   "views":{
> >> >>     "one":{
> >> >>       "map":"function (doc) { emit(doc.x,null); }"
> >> >>     }
> >> >>   }
> >> >> }
> >> >> EOS
> >> >>
> >> >> puts RestClient.get("#{DB}/_design/test/_view/one")
> >> >> ----
> >> >>
> >> >> This shows the collation sequence to be as follows.
> >> >>
> >> >> {"total_rows":95,"offset":0,"rows":[
> >> >> {"id":"20","key":" ","value":null},
> >> >> {"id":"60","key":"`","value":null},
> >> >> {"id":"5e","key":"^","value":null},
> >> >> {"id":"5f","key":"_","value":null},
> >> >> {"id":"2d","key":"-","value":null},
> >> >> {"id":"2c","key":",","value":null},
> >> >> {"id":"3b","key":";","value":null},
> >> >> {"id":"3a","key":":","value":null},
> >> >> {"id":"21","key":"!","value":null},
> >> >> {"id":"3f","key":"?","value":null},
> >> >> {"id":"2e","key":".","value":null},
> >> >> {"id":"27","key":"'","value":null},
> >> >> {"id":"22","key":"\"","value":null},
> >> >> {"id":"28","key":"(","value":null},
> >> >> {"id":"29","key":")","value":null},
> >> >> {"id":"5b","key":"[","value":null},
> >> >> {"id":"5d","key":"]","value":null},
> >> >> {"id":"7b","key":"{","value":null},
> >> >> {"id":"7d","key":"}","value":null},
> >> >> {"id":"40","key":"@","value":null},
> >> >> {"id":"2a","key":"*","value":null},
> >> >> {"id":"2f","key":"/","value":null},
> >> >> {"id":"5c","key":"\\","value":null},
> >> >> {"id":"26","key":"&","value":null},
> >> >> {"id":"23","key":"#","value":null},
> >> >> {"id":"25","key":"%","value":null},
> >> >> {"id":"2b","key":"+","value":null},
> >> >> {"id":"3c","key":"<","value":null},
> >> >> {"id":"3d","key":"=","value":null},
> >> >> {"id":"3e","key":">","value":null},
> >> >> {"id":"7c","key":"|","value":null},
> >> >> {"id":"7e","key":"~","value":null},
> >> >> {"id":"24","key":"$","value":null},
> >> >> {"id":"30","key":"0","value":null},
> >> >> {"id":"31","key":"1","value":null},
> >> >> {"id":"32","key":"2","value":null},
> >> >> {"id":"33","key":"3","value":null},
> >> >> {"id":"34","key":"4","value":null},
> >> >> {"id":"35","key":"5","value":null},
> >> >> {"id":"36","key":"6","value":null},
> >> >> {"id":"37","key":"7","value":null},
> >> >> {"id":"38","key":"8","value":null},
> >> >> {"id":"39","key":"9","value":null},
> >> >> {"id":"61","key":"a","value":null},
> >> >> {"id":"41","key":"A","value":null},
> >> >> {"id":"62","key":"b","value":null},
> >> >> {"id":"42","key":"B","value":null},
> >> >> {"id":"63","key":"c","value":null},
> >> >> {"id":"43","key":"C","value":null},
> >> >> {"id":"64","key":"d","value":null},
> >> >> {"id":"44","key":"D","value":null},
> >> >> {"id":"65","key":"e","value":null},
> >> >> {"id":"45","key":"E","value":null},
> >> >> {"id":"66","key":"f","value":null},
> >> >> {"id":"46","key":"F","value":null},
> >> >> {"id":"67","key":"g","value":null},
> >> >> {"id":"47","key":"G","value":null},
> >> >> {"id":"68","key":"h","value":null},
> >> >> {"id":"48","key":"H","value":null},
> >> >> {"id":"69","key":"i","value":null},
> >> >> {"id":"49","key":"I","value":null},
> >> >> {"id":"6a","key":"j","value":null},
> >> >> {"id":"4a","key":"J","value":null},
> >> >> {"id":"6b","key":"k","value":null},
> >> >> {"id":"4b","key":"K","value":null},
> >> >> {"id":"6c","key":"l","value":null},
> >> >> {"id":"4c","key":"L","value":null},
> >> >> {"id":"6d","key":"m","value":null},
> >> >> {"id":"4d","key":"M","value":null},
> >> >> {"id":"6e","key":"n","value":null},
> >> >> {"id":"4e","key":"N","value":null},
> >> >> {"id":"6f","key":"o","value":null},
> >> >> {"id":"4f","key":"O","value":null},
> >> >> {"id":"70","key":"p","value":null},
> >> >> {"id":"50","key":"P","value":null},
> >> >> {"id":"71","key":"q","value":null},
> >> >> {"id":"51","key":"Q","value":null},
> >> >> {"id":"72","key":"r","value":null},
> >> >> {"id":"52","key":"R","value":null},
> >> >> {"id":"73","key":"s","value":null},
> >> >> {"id":"53","key":"S","value":null},
> >> >> {"id":"74","key":"t","value":null},
> >> >> {"id":"54","key":"T","value":null},
> >> >> {"id":"75","key":"u","value":null},
> >> >> {"id":"55","key":"U","value":null},
> >> >> {"id":"76","key":"v","value":null},
> >> >> {"id":"56","key":"V","value":null},
> >> >> {"id":"77","key":"w","value":null},
> >> >> {"id":"57","key":"W","value":null},
> >> >> {"id":"78","key":"x","value":null},
> >> >> {"id":"58","key":"X","value":null},
> >> >> {"id":"79","key":"y","value":null},
> >> >> {"id":"59","key":"Y","value":null},
> >> >> {"id":"7a","key":"z","value":null},
> >> >> {"id":"5a","key":"Z","value":null}
> >> >> ]}
> >> >>
> >> >> I've never seen this sequence before. It's not even EBCDIC :-)
> >> >>
> >> >> Adding aa into the pot gives:
> >> >>
> >> >> ...
> >> >> {"id":"61","key":"a","value":null},
> >> >> {"id":"41","key":"A","value":null},
> >> >> {"id":"X","key":"aa","value":null},
> >> >> ...
> >> >>
> >> >> As you say, that is most bizarre.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Brian.
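A minimal Ruby sketch of the multi-level comparison Patrick describes may help with the A < aa puzzle. UTS #10 builds a sort key level by level (all primary weights of the string, then all secondary, then all tertiary), so a primary difference anywhere in the string, including one string simply running out of letters first, outranks any case difference. The sketch assumes a toy weighting (hypothetical helper `toy_sort_key`, ASCII letters only): the primary weight is the letter with case ignored, the tertiary weight is 0 for lowercase and 1 for uppercase, and the secondary (accent) level is left out. It is not ICU's API or its real weights.

```ruby
# Toy model of multi-level collation. toy_sort_key is a hypothetical helper,
# not part of ICU or CouchDB, and only handles ASCII letters.
#   primary weight:  the letter with case ignored
#   tertiary weight: 0 for lowercase, 1 for uppercase (lowercase first)
# Real UCA/ICU weights come from the Unicode collation tables and include a
# secondary (accent) level, which this sketch leaves out.
def toy_sort_key(str)
  primary  = str.chars.map { |ch| ch.downcase.ord }
  tertiary = str.chars.map { |ch| ch =~ /[A-Z]/ ? 1 : 0 }
  # Array#<=> compares the whole primary list first; the tertiary list is
  # consulted only when the primary lists are identical. A list that is a
  # proper prefix of another sorts first, so "A" ([97]) < "aa" ([97, 97]).
  [primary, tertiary]
end

words = %w[aa A a B b biggest Big]

# Byte-wise (ASCII) ordering: every uppercase letter precedes every lowercase one.
ascii_order = words.sort
# Multi-level ordering: case only breaks ties between otherwise equal strings.
toy_order = words.sort_by { |w| toy_sort_key(w) }

puts ascii_order.inspect # prints ["A", "B", "Big", "a", "aa", "b", "biggest"]
puts toy_order.inspect   # prints ["a", "A", "aa", "b", "B", "Big", "biggest"]
```

In this model "a" < "A" < "aa", which is the order Brian's view returned, and "Big" < "biggest" falls out of the same prefix rule. Within this scheme no case setting can produce "a < aa < A", since case is only looked at after the primary comparison; presumably it would take giving A its own primary weight, roughly the letter-by-letter customization Patrick mentions.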