lucene-java-user mailing list archives

From Lucene User <lucene.u...@googlemail.com>
Subject Re: Searching Special Characters
Date Wed, 16 Nov 2005 17:05:08 GMT
As we have a very large index, I'm interested in knowing what others
do, before I commit to doing the below.

If I do go down that route, I assume I use a StandardAnalyzer once again?

In a test, I did the following:

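import java.io.IOException;
import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class TestLuceneIndexCreateAndIndex extends TestCase {
    // JUnit 3 only runs public methods whose names start with "test".
    public void testIndex() throws IOException {
        String indexName = "c:\\lucene\\test";
        Analyzer analyzer = new StandardAnalyzer();
        // true = create a new index, overwriting any existing one
        IndexWriter writer = new IndexWriter(indexName, analyzer, true);
        Document d = new Document();
        // Field(name, value, store, index, tokenize)
        d.add(new Field("headline",
                "& > < ´ ¸ ˆ ¯ · ˜ ¨ Á á Â â Æ æ À à Å å Ã ã Ä ä Ç ç É é Ê ê È è "
                + "Ð ð Ë ë Í í Î î Ì ì Ï ï Ñ ñ Ó ó Ô ô Œ œ Ò ò Ø ø Õ õ Ö ö Š š ß Þ þ "
                + "Ú ú Û û Ù ù Ü ü Ý ý ÿ Ÿ",
                true, true, true));
        writer.addDocument(d);
        writer.close();
        IndexReader reader = IndexReader.open(indexName);
        assertTrue(reader.numDocs() > 0);
        reader.close();
    }
}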

Using Luke, I searched for headline:Ê, which correctly returned the
article.  However, when I searched for headline:& it returned nothing,
which I didn't expect.
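
For reference, here is a minimal sketch of the entity-replacement step
suggested in the quoted message below, run on the raw text before it is
handed to the analyzer. EntityDecoder and its tiny lookup table are
hypothetical names made up for illustration; they are not part of Lucene,
and a real table would need to cover the full HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Lucene class): decodes HTML entities before indexing. */
public class EntityDecoder {
    // Matches named (&eacute;), decimal (&#233;) and hex (&#xE9;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Deliberately tiny table, for illustration only.
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("Ecirc", "\u00CA");
        NAMED.put("eacute", "\u00E9");
    }

    /** Returns text with known entities replaced; unknown ones are left untouched. */
    public static String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16, m.group());
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10, m.group());
            } else {
                String named = NAMED.get(body);
                replacement = (named != null) ? named : m.group();
            }
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a numeric reference to a string, or returns the original text if invalid.
    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : original;
        } catch (NumberFormatException e) {
            return original; // value too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("caf&eacute; &amp; cr&#232;me"));
    }
}
```

The decoded string would then go into the Field as in the test above.
Note that decoding alone probably won't make headline:& match: as far as
I can tell, StandardAnalyzer discards a standalone "&" as punctuation, so
no term is indexed for it.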

Thanks

On 15/11/05, Daniel Noll <daniel@nuix.com.au> wrote:
> Mordo, Aviran (EXP N-NANNATEK) wrote:
>
> >You can use your own Analyzer to support special characters. Just
> >process the special characters in your analyzer
> >
> >
> That's one option.  The "correct" solution would be, since this is
> presumably HTML or XML, replacing entities with their actual string
> values before analysing the text.
>
> Daniel
>
> --
> Daniel Noll