From: Clemens Wyss
To: "java-user@lucene.apache.org"
Date: Thu, 21 Apr 2011 17:02:45 +0200
Subject: "Umlaute" getting lost

I keep my search terms in a dedicated RAMDirectory (the termIndex).
In there I place all the terms of my real index. When putting the terms into the
termIndex I can still see [using the debugger] the Umlaute (äöü). Unfortunately, when searching the
termIndex, the documents no longer contain these Umlaute.

Populating the termIndex:

termIndex = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig( Version.LUCENE_31, new TermAnalyzer( locale ) );
termIndexWriter = new IndexWriter( termIndex, config );
TermEnum tEnum = realIndexReader.terms();
while ( tEnum.next() )
{
  Term t = tEnum.term();
  String termText = t.text();
  Document termDocument = new Document();
  // the term text is both stored (Store.YES) and analyzed (Index.ANALYZED)
  Field field = new Field( FIELDNAME_TERM, termText, Field.Store.YES, Field.Index.ANALYZED );
  termDocument.add( field );
  // and add term into the index
  termIndexWriter.addDocument( termDocument );
}
termIndexWriter.commit();
termIndexWriter.optimize();
termIndexWriter.close();
termIndexReader = IndexReader.open( termIndex, true );

----------
Searching terms:

Query q = fuzzy
  ? new FuzzyQuery( new Term( FIELDNAME_TERM, termFilter.toLowerCase() ) )
  : new WildcardQuery( new Term( FIELDNAME_TERM, "*" + termFilter.toLowerCase() + "*" ) );
TopDocs topDocs = new IndexSearcher( getTermIndexReader() ).search( q, 100 );
for ( ScoreDoc hit : topDocs.scoreDocs )
{
  Document doc = getTermIndexReader().document( hit.doc );
  // read back the stored value of the term field
  String indexTerm = doc.get( FIELDNAME_TERM );
  if ( !returnValue.contains( indexTerm ) )
  {
    returnValue.add( indexTerm );
  }
}
----------

The TermAnalyzer is the same analyzer as the main index analyzer, with the
exception that a LowerCaseFilter is applied.
I have unit tests for my Umlaute which work as expected.
Unfortunately this is not the case when I debug my real app...

What could possibly cause the "loss"?
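
To compare what the strings really contain when they go into the termIndex and when they come back out (independently of how the debugger or console renders them), a small helper along the following lines might be useful. This is only a sketch: `UmlautCheck` and its methods are hypothetical names and not part of Lucene or of my app. It assumes that a "lost" Umlaut would show up as a replacement character, as a '?', or as a decomposed base letter plus combining mark. It does not cover an analyzer rewriting the characters.

```java
import java.text.Normalizer;

// Hypothetical diagnostic helper (assumption: not part of Lucene or the original app).
final class UmlautCheck {

    // Renders each code point as U+XXXX so that fonts or console encodings cannot hide the real content.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Heuristic: U+FFFD or '?' is the usual residue of a lossy byte/char conversion.
    // A literal '?' in the text would also trigger this, so treat it as a hint only.
    static boolean looksLossy(String s) {
        return s.indexOf('\uFFFD') >= 0 || s.indexOf('?') >= 0;
    }

    // True if the string is not in NFC, e.g. 'a' followed by U+0308 instead of U+00E4.
    static boolean isDecomposed(String s) {
        return !Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this sample independent of the source file encoding.
        String precomposed = "\u00E4\u00F6\u00FC";
        String decomposed = "a\u0308o\u0308u\u0308";
        System.out.println(codePoints(precomposed) + " decomposed=" + isDecomposed(precomposed));
        System.out.println(codePoints(decomposed) + " decomposed=" + isDecomposed(decomposed));
    }
}
```

The idea would be to log `UmlautCheck.codePoints( termText )` just before the Field is created and `UmlautCheck.codePoints( indexTerm )` right after `doc.get( FIELDNAME_TERM )`, and then compare the two outputs for the same term.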