jackrabbit-users mailing list archives

From "Julio Castillo" <jcasti...@edgenuity.com>
Subject RE: Excluding words
Date Sat, 18 Oct 2008 19:56:35 GMT
Hi there,
Unfortunately there was no response to my previous posting.

I am still looking for a sample configuration that would let me specify a
Lucene stop word analyzer.

Does anybody have a sample repository config file that references a
stopwords.txt type file?

Thanks

** julio

-----Original Message-----
From: Julio Castillo [mailto:jcastillo@edgenuity.com] 
Sent: Wednesday, October 15, 2008 9:30 AM
To: 'users@jackrabbit.apache.org'
Subject: RE: Excluding words

Thanks Ard,
Let me see if I understood you. The link doesn't show exactly how, so I
will guess. Currently my repository.xml has the following entry:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses"
value="org.apache.jackrabbit.extractor.MsWordTextExtractor,...<list
truncated>.."/>
  <param name="extractorPoolSize" value="2"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>

I saw an example for synonyms, so I imagine it would look like this (I just
need the actual correct parameter names)?

  <param name="stopWordAnalyzerClass"
value="org.apache.lucene.analysis.StopAnalyzer"/>
  <param name="stopWordAnalyzerConfigPath" value="../stopwords.txt"/>

Thanks

** julio

-----Original Message-----
From: Ard Schrijvers [mailto:a.schrijvers@onehippo.com]
Sent: Wednesday, October 15, 2008 4:39 AM
To: users@jackrabbit.apache.org
Subject: RE: Excluding words

Hello Julio,

You can define your own Lucene analyzer in Jackrabbit (even per property,
see [1] at the bottom). If you configure a Lucene analyzer that uses a
stop word list you create yourself, you are done.
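
As a rough sketch of the global case (not taken from this thread): assuming
the SearchIndex element accepts an "analyzer" parameter naming an analyzer
class with a no-argument constructor, the entry might look like the
following. com.example.StopWordAnalyzer is a hypothetical class name standing
in for an analyzer you write yourself with your own stop word list built in.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumption: "analyzer" takes the fully qualified name of a Lucene Analyzer class. -->
  <!-- com.example.StopWordAnalyzer is hypothetical and must be on Jackrabbit's classpath. -->
  <param name="analyzer" value="com.example.StopWordAnalyzer"/>
</SearchIndex>
```

Content indexed before the analyzer was changed would most likely need to be
re-indexed for the new stop word list to apply to it.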

Regards Ard

[1] http://wiki.apache.org/jackrabbit/IndexingConfiguration

> 
> Is there a way, perhaps on a per-node insertion basis, to exclude words
> from being indexed by Lucene?
> 
> I have to load a large volume of documents. There are certain words 
> that I want to exclude as they will be present in 99% of the 
> documents, but I haven't found a way to access or restrict Lucene to 
> prevent it from indexing such words.
> 
> Any ideas?
> 
> Julio Castillo
> Edgenuity Inc.
> 
> 

