From: Robert Muir
Date: Tue, 11 Sep 2012 11:23:10 -0400
Subject: Re: Collator-based facet sorting in Solr
To: dev@lucene.apache.org, te@statsbiblioteket.dk
In-Reply-To: <1347374597.2951.345.camel@te-prime>

Just one concern where things could act a little funky: today, for
example, if I set strength=primary, it folds Test and test to the same
unique term, but under this scheme you would have Test and test as two
separate terms, because the original value is appended to the key. This
could be undesirable in the typical case where you just want
case-insensitive facets, and we don't provide any way to preprocess the
text to avoid it.

Really, a lot of this is because factory-based analysis chains have no
way to specify the AttributeFactory. I guess if we really wanted to fix
this right, we would need to pass the AttributeFactory in to
TokenizerFactory's create() method.

But for now, from Solr, it would be a little hacky: someone is going to
have to fold the case client-side (or whatever) if they don't want these
problems.

On Tue, Sep 11, 2012 at 10:43 AM, Toke Eskildsen wrote:
> Claudio Ranieri and I briefly discussed collator-based sorting for
> facets in the thread "Problem with accented words sorting" on the
> solr-user mailing list. Here's the idea:
>
> Solr faceting supports sorting by either count or index order. Claudio
> and I both need the order to be collator-based. My understanding of the
> issue is that it is not currently possible.
>
> Collator-based document sorting in Solr uses CollationKeys as field
> values. This does not work with faceting on fields with multiple values
> as there is no mapping from the key to the human-readable value.
>
> ICU sort keys are always null (00) terminated and when two keys are
> compared, the comparison stops as soon as null is reached(?)
> http://userguide.icu-project.org/collation/architecture
>
> If we concatenate the keys with the original values:
> <00>
> we get an entity where the ordering is still correct upon comparison and
> where the original value can be extracted by using the offset from the
> last int (or maybe short, to spare 2 bytes) in the BytesRef.
>
> If the idea is sound, I'll open a JIRA issue. Unfortunately I do not
> have time right now for hacking on it.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>

--
lucidworks.com
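
For readers following the thread, here is a minimal sketch of the term layout proposed in the quoted message, and of the case-folding concern raised in the reply. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the exact layout (sort key, a 00 byte, the original UTF-8 bytes, then a 4-byte offset) is one reading of the proposal, and toySortKey() is only a stand-in that lowercases the value. A real implementation would take the key from an ICU collator, whose sort keys are assumed to contain no 00 byte before the terminator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class CollatedFacetTerm {

    // Hypothetical stand-in for a primary-strength sort key: lowercased UTF-8.
    // It contains no 00 byte as long as the value contains no U+0000.
    static byte[] toySortKey(String value) {
        return value.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    // Assumed layout: <sort key> 00 <original UTF-8 bytes> <int offset of original>.
    static byte[] pack(String value) {
        byte[] key = toySortKey(value);
        byte[] original = value.getBytes(StandardCharsets.UTF_8);
        int offset = key.length + 1; // original starts right after the 00 byte
        ByteBuffer buf = ByteBuffer.allocate(offset + original.length + Integer.BYTES);
        buf.put(key).put((byte) 0).put(original).putInt(offset);
        return buf.array();
    }

    // Reads the trailing int and slices the original value back out.
    static String unpack(byte[] term) {
        int offsetPos = term.length - Integer.BYTES;
        int offset = ByteBuffer.wrap(term, offsetPos, Integer.BYTES).getInt();
        return new String(term, offset, offsetPos - offset, StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic byte order, as used for comparing index terms.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sorts the packed terms by bytes only, then recovers the readable values.
    static List<String> sortedValues(List<String> values) {
        List<byte[]> terms = new ArrayList<>();
        for (String v : values) {
            terms.add(pack(v));
        }
        terms.sort(CollatedFacetTerm::compareUnsigned);
        List<String> out = new ArrayList<>();
        for (byte[] t : terms) {
            out.add(unpack(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // Key order wins over raw byte order; prints [A, a, B, b]
        System.out.println(sortedValues(Arrays.asList("b", "A", "a", "B")));
        // Same toy key, but two distinct packed terms; prints false
        System.out.println(Arrays.equals(pack("Test"), pack("test")));
    }
}
```

The second line of output is the behaviour the reply warns about: because the original value is part of the term, Test and test remain separate facet entries even though their keys are equal, so case folding would still have to happen before the term is built.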