lucene-java-user mailing list archives

From "Lucas F. A. Teixeira" <lucas.teixe...@accurate.com.br>
Subject Re: Question about indexing (BrazilianAnalyzer)
Date Wed, 04 Jun 2008 12:58:58 GMT
Are you using ISOLatin1AccentFilter?
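
For context, here is a minimal sketch of what accent folding does to a term such as "herança". It is only an illustration: it uses the JDK's java.text.Normalizer rather than Lucene's ISOLatin1AccentFilter, and the class and method names are hypothetical. The point is that a folded term and the original accented word are different strings, so a filter applied on only one side (indexing or searching) could make a lookup miss.

```java
import java.text.Normalizer;

// Hypothetical illustration class; not part of Lucene.
class AccentFoldingSketch {

    // Decompose accented characters (NFD), then drop the combining marks.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String original = "heran\u00e7a";   // "herança", escaped to avoid source-encoding issues
        String folded = fold(original);

        System.out.println(folded);                  // the folded form of the term
        System.out.println(folded.equals(original)); // false: the two forms differ
    }
}
```

If the terms Luke shows for the "contents" field look like the folded form, it may be worth searching for that form too, to see which side of the pipeline is changing the word.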

Regards,

Lucas Frare A. Teixeira
lucas.teixeira@accurate.com.br
Tel: +55 11 3660.1622 - R3018



Vinicius Carvalho wrote:
> Hello there! I'm indexing documents using the BrazilianAnalyzer, and I've
> noticed that many words are not being indexed. I store and index the entire
> doc (I'm doing this in order to present fragments in the results; I don't
> know if it's the best way, especially for large docs. Any ideas?). Using Luke
> to check the index, I open the stored doc, and its contents contain 17
> occurrences of the word "herança", for instance. But there's no term for
> this word or its stemmed version, "heranc", so searching for this word would
> not return a result for this document.
>
> I'm pretty sure I'm missing something in the indexing process:
>
>
> try {
>     doc.add(new Field("contents", docText, Field.Store.YES,
>             Field.Index.TOKENIZED, Field.TermVector.YES));
>     // gotta improve this later
>     IndexWriter writer = new IndexWriter("/java/lucene/portal/cms",
>             new BrazilianAnalyzer());
>     writer.addDocument(doc);
>     writer.close();
> }
>
>
> So, why would these words (and others) not be indexed?
>
> Regards
>   
