lucene-java-user mailing list archives

From "Vinicius Carvalho" <viniciusccarva...@gmail.com>
Subject Question about indexing (BrazilianAnalyzer)
Date Tue, 03 Jun 2008 19:51:12 GMT
Hello there! I'm indexing documents using the BrazilianAnalyzer, and I've
noticed that many words are not being indexed. I store and index the entire
document so that I can show fragments in the search results (I don't know if
that's the best approach, especially for large docs; any ideas?). Using Luke
to check the index, I open the stored doc, and its contents contain 17
occurrences of the word "herança", for instance. But there's no term for
this word or its stemmed version, "heranc", so searching for this word would
not return this document.

I'm pretty sure I'm missing something in the indexing process:


        try {
            // Store the full text, tokenize it with the analyzer, and keep term vectors
            doc.add(new Field("contents", docText, Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
            IndexWriter writer = new IndexWriter("/java/lucene/portal/cms", new BrazilianAnalyzer()); // gotta improve this later
            writer.addDocument(doc);
            writer.close();
        }


So, why would these words (and others) not be indexed?
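
One possibility that may be worth ruling out, offered as a guess rather than a diagnosis: IndexWriter in Lucene releases of this period indexes only the first 10,000 terms of a field by default (IndexWriter.DEFAULT_MAX_FIELD_LENGTH), and setMaxFieldLength(int) raises that cap. If the docs are large, a word that first appears past that point would be stored but never indexed. The sketch below uses plain Java with no Lucene dependency to estimate where a word first appears. The class FirstOccurrence, its helper methods, and the whitespace tokenization are all hypothetical; the analyzer's real token positions will differ, since it drops stop words and splits on punctuation.

```java
import java.util.Locale;

class FirstOccurrence {
    // Assumption: mirrors the default field-length cap of IndexWriter in Lucene 2.x.
    static final int ASSUMED_LIMIT = 10_000;

    // Hypothetical helper: 0-based index of the first whitespace-delimited
    // token that contains the word (case-insensitive), or -1 if absent.
    static int firstTokenPosition(String text, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].toLowerCase(Locale.ROOT).contains(needle)) {
                return i;
            }
        }
        return -1;
    }

    // True when a token found at this position would fall outside the first `limit` tokens.
    static boolean beyondLimit(int position, int limit) {
        return position >= limit;
    }

    public static void main(String[] args) {
        // Stand-in text; in practice this would be the stored docText.
        String docText = "um dois tres heran\u00e7a quatro";
        int pos = firstTokenPosition(docText, "heran\u00e7a");
        System.out.println("first occurrence at token " + pos
                + ", beyond assumed limit: " + beyondLimit(pos, ASSUMED_LIMIT));
    }
}
```

If the first occurrence turns out to lie well beyond the assumed limit, calling writer.setMaxFieldLength(...) with a larger value before addDocument would be the thing to try; if it lies well inside, the cause is somewhere else.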

Regards
-- 
"In a world without fences and walls, who needs Gates and Windows?"
