jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Excluding words
Date Wed, 22 Oct 2008 12:06:38 GMT
Hi,

the parameter that lets you configure a custom analyzer is called
'analyzer'. its default value is
org.apache.lucene.analysis.standard.StandardAnalyzer. so you just have to write
your own analyzer implementation that supports stop words and then configure it
in the SearchIndex element of your workspace.xml files.
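
as a rough sketch, the entry could look like the one below.
com.example.search.MyStopWordAnalyzer is a hypothetical class name standing in
for your own analyzer; only the parameter name 'analyzer' is the real one. the
class has to be on the repository's classpath, and since it is referenced by
class name only, it is safest to assume it needs a public no-argument
constructor that sets up its own stop word list.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- hypothetical class name: replace with your own Analyzer implementation -->
  <param name="analyzer" value="com.example.search.MyStopWordAnalyzer"/>
</SearchIndex>
```

content that was indexed before the change was analyzed with the old analyzer,
so the stop words will most likely stay in the index until that content is
re-indexed.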

see also: http://wiki.apache.org/jackrabbit/Search

regards
 marcel

Julio Castillo wrote:
> Hi there,
> Unfortunately there was no response to my previous posting.
> 
> I am still looking for sample configuration specifications that would allow
> me to specify a lucene stop word analyzer.
> 
> Does anybody have a sample repository config file that references a
> stopwords.txt type file?
> 
> Thanks
> 
> ** julio
> 
> -----Original Message-----
> From: Julio Castillo [mailto:jcastillo@edgenuity.com] 
> Sent: Wednesday, October 15, 2008 9:30 AM
> To: 'users@jackrabbit.apache.org'
> Subject: RE: Excluding words
> 
> Thanks Ard,
> Let me see if I understood you, as the link doesn't exactly show how, but I
> will guess. Currently my repository.xml has the following entry:
> 
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>   <param name="path" value="${wsp.home}/index"/>
>   <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list truncated>.."/>
>   <param name="extractorPoolSize" value="2"/>
>   <param name="supportHighlighting" value="true"/>
> </SearchIndex>
> 
> I saw an example for synonyms, so I imagine it would look like this (I just
> need the actual correct parameter names)?
> 
>   <param name="stopWordAnalyzerClass" value="org.apache.lucene.analysis.StopAnalyzer"/>
>   <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>
> 
> Thanks
> 
> ** julio
> 
> -----Original Message-----
> From: Ard Schrijvers [mailto:a.schrijvers@onehippo.com]
> Sent: Wednesday, October 15, 2008 4:39 AM
> To: users@jackrabbit.apache.org
> Subject: RE: Excluding words
> 
> Hello Julio,
> 
> You can define your own lucene analyzer in Jackrabbit (even per property,
> see [1] at the bottom). If you just configure a lucene analyzer having a
> list of stopwords for example, where you create the list yourself, you are
> done.
> 
> Regards Ard
> 
> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
> 
>> Is there a way, perhaps on a per-node insertion basis, to exclude words 
>> from being indexed by Lucene?
>>
>> I have to load a large volume of documents. There are certain words 
>> that I want to exclude as they will be present in 99% of the 
>> documents, but I haven't found a way to access or restrict Lucene to 
>> prevent it from indexing such words.
>>
>> Any ideas?
>>
>> Julio Castillo
>> Edgenuity Inc.
>>
>>
> 
> 

