lucene-java-user mailing list archives

From "DES" <m...@2des.de>
Subject Re: Indexing terms only
Date Wed, 22 Dec 2004 16:45:17 GMT
I actually use Field.Text(String,String) to add documents to my index. Maybe
I do not understand how an analyzer works, but I thought that all German
articles (der, die, das, etc.) would be filtered out. However, when I use Luke
to view my index, the complete original text is stored in the field. What
I need is a term vector that I can create from an indexed document field,
so this field should contain terms only.
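
The distinction at issue here is between a field's stored value (what Luke displays) and the terms the analyzer emits into the index. In Lucene 1.4, Field.Text(String, String) tokenizes, indexes, and also stores the original string, whereas Field.UnStored(String, String) tokenizes and indexes without storing it. The following is a minimal sketch in plain Java, not Lucene code: the class StoredVsIndexed, its analyze method, and the short stop list are hypothetical names chosen for illustration, and no stemming is done, unlike the real GermanAnalyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Lucene code: shows how a field can keep
// the original text (stored value) while the index holds filtered terms.
class StoredVsIndexed {
    // Small sample stop list (assumption); a real German stop list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("der", "die", "das", "und", "ein", "eine"));

    // Lowercase, split on non-letter characters, drop stop words.
    // Stemming is deliberately left out of this sketch.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String stored = "Der Hund und die Katze";
        // The stored value is returned verbatim; only the terms are analyzed.
        System.out.println("stored value : " + stored);
        System.out.println("indexed terms: " + analyze(stored));
    }
}
```

Running main prints the untouched stored string next to the filtered terms [hund, katze]. That is roughly the relationship between what Luke shows for a stored field and what is actually searchable, so seeing the full text in Luke does not by itself mean the stop words were indexed.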

> Whether or not the text is stored in the index is a different concern
> than how it is analyzed.  If you want the text to be indexed, and not
> stored, then use the Field.Text(String, String) method or the
> appropriate constructor when adding a field to the Document.  You'll
> need to also store a reference to the actual file (URL, Path, etc) in
> the document so it can be retrieved from the doc returned in the Hits
> object.
>
> Or did I completely misunderstand the question?
>
> -Mike
>
> On Wed, 22 Dec 2004 17:23:24 +0100, DES <mail@2des.de> wrote:
>> hi
>>
>> I need to index my text so that the index contains only tokenized, stemmed
>> words without stopwords etc. The text is German, so I tried to use
>> GermanAnalyzer, but it stores the whole text, not terms. Please give me a tip
>> on how to index terms only. Thanks!
>>
>> DES
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 



