lucene-java-user mailing list archives

From Mike Snare <mikesn...@gmail.com>
Subject Re: Indexing terms only
Date Wed, 22 Dec 2004 16:49:23 GMT
I've never used the German analyzer, so I don't know what stop words
it defines or uses.  Someone else will have to answer that.  Sorry.
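
In case it helps, here is a toy sketch in plain Java (not Lucene code) of the
distinction discussed in the quoted messages below: analysis decides which
terms end up in the index, while storing decides whether the original text is
kept alongside them. Everything in it is an assumption made up for
illustration: the class TermsOnlySketch, the ToyField type, and the tiny
stop list, which is not GermanAnalyzer's real list. It also does no stemming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class TermsOnlySketch {
    // Hypothetical, deliberately tiny stop list (NOT GermanAnalyzer's list).
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("der", "die", "das", "und", "ein", "eine"));

    // "Analysis": lowercase, split on non-letters, drop stop words.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Term vector: each surviving term mapped to its frequency, sorted by term.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String term : analyze(text)) {
            vector.merge(term, 1, Integer::sum);
        }
        return vector;
    }

    // Toy field: the terms are always indexed; the raw text is kept only on request.
    static final class ToyField {
        final String storedValue;          // null when the field is not stored
        final Map<String, Integer> terms;  // what the "index" actually holds

        ToyField(String text, boolean store) {
            this.storedValue = store ? text : null;
            this.terms = termVector(text);
        }
    }

    public static void main(String[] args) {
        String text = "Der Hund und die Katze sehen das Haus, der Hund bellt";
        ToyField stored = new ToyField(text, true);
        ToyField unstored = new ToyField(text, false);

        // Both fields carry the same analyzed terms; only one keeps the raw text.
        System.out.println(stored.terms);
        System.out.println(stored.storedValue);
        System.out.println(unstored.storedValue);
    }
}
```

The point of the sketch is that the stop words are gone from the terms in both
cases; seeing the full original text in a field only means the field was also
stored, not that analysis was skipped.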

On Wed, 22 Dec 2004 17:45:17 +0100, DES <mail@2des.de> wrote:
> I actually use Field.Text(String,String) to add documents to my index. Maybe
> I do not understand how an analyzer works, but I thought that all German
> articles (der, die, das, etc.) would be filtered out. However, when I use Luke
> to view my index, the original text is stored in full in a field. What I
> need is a term vector that I can create from an indexed document field, so
> this field should contain terms only.
> 
> > Whether or not the text is stored in the index is a different concern
> > than how it is analyzed.  If you want the text to be indexed, and not
> > stored, then use the Field.UnStored(String, String) method
> > (Field.Text(String, String) also stores the text) or the
> > appropriate constructor when adding a field to the Document.  You'll
> > need to also store a reference to the actual file (URL, Path, etc) in
> > the document so it can be retrieved from the doc returned in the Hits
> > object.
> >
> > Or did I completely misunderstand the question?
> >
> > -Mike
> >
> > On Wed, 22 Dec 2004 17:23:24 +0100, DES <mail@2des.de> wrote:
> >> Hi,
> >>
> >> I need to index my text so that the index contains only tokenized, stemmed
> >> words without stop words etc. The text is German, so I tried to use
> >> GermanAnalyzer, but it stores the whole text, not terms. Please give me a tip
> >> on how to index terms only. Thanks!
> >>
> >> DES
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> 
>