From: "Federica Falini Data Management S.p.A"
To: java-dev@lucene.apache.org
Date: Wed, 08 Apr 2009 09:42:30 +0200
Subject: Re: Probelm sort on TermEnum

Hi Steve, in fact the list of terms returned is meant for users: each term is shown with a link that runs a search on that term and gives access to the matching documents.

Annales
cafe
Cafè
zucche

Thanks
Federica

Steven A Rowe wrote:
On 4/7/2009 at 1:19 PM, Michael McCandless wrote:
  
I think the new contrib/collation package may address this use case?
It converts each term to its CollationKey, outside of Lucene.
    

Since AFAIK CollationKey creation is a one-way process, CollationKeyFilter may not be useful for Federica.

Federica, what use do you make of the terms returned by reader.terms()?  I ask because the new CollationKeyFilter would produce terms that would not be suitable for human consumption, but might be useful for other purposes.

Steve
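To make the "one-way" point concrete, here is a small stdlib-only Java sketch using java.text.CollationKey (it is not the contrib/collation code itself). According to the CollationKey documentation, comparing the byte arrays returned by toByteArray() gives the same result as comparing the keys, so the bytes preserve the sort order; they are a sort key, though, not the characters of the term. Assuming a collation filter indexes an encoding of such bytes, that is why the indexed terms would not be readable. The class and method names are hypothetical, and the Italian locale is an assumption.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo: a collation key keeps the sort order but not the text.
public class CollationKeyDemo {

    // Compares two byte arrays lexicographically, treating bytes as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns the raw collation-key bytes for one term.
    public static byte[] keyBytes(Collator collator, String term) {
        CollationKey key = collator.getCollationKey(term);
        return key.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: Italian locale; decompose "è" into "e" plus an accent.
        Collator collator = Collator.getInstance(Locale.ITALIAN);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        byte[] plain = keyBytes(collator, "cafe");
        byte[] accented = keyBytes(collator, "Caf\u00E8");
        // The key bytes order the same way the collator does...
        System.out.println(compareBytes(plain, accented) < 0);
        System.out.println(collator.compare("cafe", "Caf\u00E8") < 0);
        // ...but what they hold is an opaque sort key, not the term's characters.
        System.out.println(Arrays.toString(plain));
    }
}
```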

  
On Tue, Apr 7, 2009 at 7:36 AM, Federica Falini Data Management S.p.A
<ffalini@datamanagement.it> wrote:
    
Good morning,
In Lucene 2.2 I modified Term.java and TermBuffer.java (see below) so that
term enumerations are sorted case-insensitively (when a field is not
tokenized):
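TermEnum terms = reader.terms(new Term("myFieldNotTokenized", ""));
try {
    // term() returns null if the enum is positioned past the last term
    while (terms.term() != null
            && "myFieldNotTokenized".equals(terms.term().field())) {
        System.out.println("     " + terms.term());
        if (!terms.next()) break;
    }
} finally {
    terms.close();
}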

For example, instead of getting this order from the TermEnum:

Annales
Cafè
Zucche
cafe

I need to get this one:

Annales
cafe
Cafè
Zucche

Now, in Lucene 2.4, I find this difficult because the "index" package has
changed a lot; could you give me some pointers on how to keep my sort order?
Thanks in advance
Federica
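One possible direction for Lucene 2.4 that avoids patching the index package: leave the index order alone, collect the terms of the untokenized field in the TermEnum loop above, and sort that list for display with a java.text.Collator. This is a minimal sketch under stated assumptions, not an established Lucene recipe: it assumes the field's terms fit in memory, the class TermDisplaySort and its method sortForDisplay are hypothetical, and the Italian locale is an assumption. It is stdlib-only, so a plain list stands in for the collected terms.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: orders terms that were already collected from a TermEnum.
public class TermDisplaySort {

    // Returns a new list sorted with a locale-sensitive Collator: base letters
    // decide the order first; accents, then case, only break remaining ties.
    public static List<String> sortForDisplay(List<String> terms, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        // Treat precomposed letters such as "è" as base letter plus accent.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        // TERTIARY keeps "cafe" and "Cafè" as distinct, consistently ordered entries.
        collator.setStrength(Collator.TERTIARY);
        List<String> sorted = new ArrayList<String>(terms);
        Collections.sort(sorted, collator);
        return sorted;
    }

    public static void main(String[] args) {
        // With Lucene these strings would come from terms.term().text() in the loop.
        List<String> terms = Arrays.asList("Annales", "Caf\u00E8", "Zucche", "cafe");
        for (String term : sortForDisplay(terms, Locale.ITALIAN)) {
            System.out.println(term);
        }
    }
}
```

With these inputs the sketch should produce Annales, cafe, Cafè, Zucche, the order asked for above. Setting the strength to Collator.PRIMARY instead would make "cafe" and "Cafè" compare as equal, which may or may not be wanted for a display list.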
      


--
Federica FALINI
Divisione Beni Culturali
Data Management S.p.A.
tel: +39.0544.503.886
fax: +39.0544.461697
e-mail: ffalini@datamanagement.it
web: http://www.datamanagement.it
48100 Ravenna (RA)
Via S.Cavina, n 7
Italy

This e-mail transmission may contain legally privileged and/or confidential information. Please do not read it if you are not the intended recipient(s). Any use, distribution, reproduction or disclosure by any other person is strictly prohibited. If you have received this e-mail in error, please notify the sender and destroy the original transmission and its attachments without reading or saving it in any manner.