lucene-java-user mailing list archives

From Thomas Krämer
Subject build a case insensitive index
Date Thu, 11 Dec 2003 21:00:56 GMT
Hello Lucene users,

I need a document-term matrix to initialize a neural network that I 
want to use to integrate user feedback into the retrieval process.

So far, I have been using a slightly modified version of the IndexHTML example class.

How can I create an index of all the terms in a collection without 
"term" and "Term" being indexed as two separate terms?

The example uses a StandardAnalyzer, and its documentation 
says:

Filters StandardTokenizer with StandardFilter, LowerCaseFilter and 

So why do I get duplicate entries for terms that differ only in upper- and lower-case spelling?
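
To make the expected result concrete, here is a minimal sketch in plain Java of a case-folded document-term matrix. It does not use the Lucene API: the class CaseFoldedTermMatrix, its method names, and the simple letter/digit tokenization are all hypothetical and only illustrate that lower-casing every token before counting puts "term" and "Term" in the same column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of Lucene.
final class CaseFoldedTermMatrix {

    // Lower-case the text, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Sorted list of distinct, case-folded terms across all documents.
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (String doc : docs) {
            terms.addAll(tokenize(doc));
        }
        return new ArrayList<>(terms);
    }

    // Rows are documents, columns are vocabulary terms, cells are term counts.
    static int[][] matrix(List<String> docs, List<String> vocab) {
        int[][] counts = new int[docs.size()][vocab.size()];
        for (int i = 0; i < docs.size(); i++) {
            for (String t : tokenize(docs.get(i))) {
                int col = vocab.indexOf(t);
                if (col >= 0) {
                    counts[i][col]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Term term", "Another TERM");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab); // prints [another, term]
        for (int[] row : matrix(docs, vocab)) {
            System.out.println(Arrays.toString(row)); // prints [0, 2] then [1, 1]
        }
    }
}
```

With this kind of folding, the vocabulary contains a single entry "term" regardless of how the word was capitalized in the documents, which is the behaviour I expected from the LowerCaseFilter.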


