lucene-java-user mailing list archives

From "Brandon Jockman" <brand...@isogen.com>
Subject Re: many analyzers, same index.
Date Fri, 19 Oct 2001 16:18:03 GMT
The issue is that queries need a different set of features depending on the
type of contextual unit (the units we use to define Lucene documents).

For example, our XML and text documents need fuzzy matching and Porter
stemming, while for other documents (created and maintained from metadata on
objects) we definitely don't want either.

We want to keep everything in one index, so I believe we need to add,
maintain, and query each document set using separate IndexWriters and
IndexReaders, and to search with disparate analyzers, all on the same index.

Thanks again for the advice.

-Brandon Jockman
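
A rough sketch of why each document set has to be queried with the same analysis it was indexed with. This is plain Java, not the Lucene API: the class name, both methods, and the suffix-stripping rule are hypothetical stand-ins (the "stemmer" here is a toy, not the Porter algorithm), used only to show how indexed terms and query terms can diverge.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerMismatchSketch {

    // Hypothetical "exact" analysis: lowercase and split on whitespace.
    static List<String> exactTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Toy suffix stripping standing in for a real stemmer (NOT Porter).
    static List<String> stemmedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : exactTerms(text)) {
            if (token.length() > 5 && token.endsWith("ing")) {
                token = token.substring(0, token.length() - 3);
            } else if (token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            terms.add(token);
        }
        return terms;
    }

    // A query matches only if every query term is among the indexed terms.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Index a text document with the stemming analysis.
        Set<String> indexed = new HashSet<>(stemmedTerms("Searching XML documents"));

        // Query analyzed differently from the index: "searching" was never indexed.
        System.out.println(matches(indexed, exactTerms("searching")));

        // Query analyzed the same way as the index: "search" is found.
        System.out.println(matches(indexed, stemmedTerms("searching")));
    }
}
```

The design point is only that the choice of analysis has to travel with the document set: whatever is used at index time for a set should also be used for queries against that set.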

----- Original Message -----
From: "Doug Cutting" <DCutting@grandcentral.com>
To: "'Brandon Jockman'" <brandonj@isogen.com>;
<lucene-user@jakarta.apache.org>
Sent: Friday, October 19, 2001 10:45 AM
Subject: RE: many analyzers, same index.


> > From: Brandon Jockman [mailto:brandonj@isogen.com]
> >
> > Is there anything wrong with using multiple analyzers on the
> > same index, (given of course that I am keeping the set of
> > documents for each mutually exclusive)?
>
> This should work.  The primary risk is that they generate different terms
> than the analyzer you use to tokenize queries.  So long as you're aware of
> that, this shouldn't be a problem.
>
> > I am hoping to use a single Lucene index in my application,
> > but some documents need to be analyzed differently.
>
> Can you provide more details?  A common confusion is that folks try to do
> format conversion in analyzers, e.g., to have different analyzers for PDF
> files, for HTML files, etc.  The better approach is to implement converters
> that convert these formats to plain text, either a String or a Reader.  Then
> you can use the same analyzer for documents in different formats.
>
> Doug
>
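
The converter approach described in the quoted reply could look roughly like the sketch below. Everything here is an assumption for illustration: the TextConverter interface, the regex-based HTML stripping (a real application would use a proper HTML or PDF parser), and the analyze method, which stands in for whatever single analyzer the application uses. None of it is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ConverterSketch {

    // Hypothetical converter contract: any source format in, plain text out.
    interface TextConverter {
        String toPlainText(String raw);
    }

    // Plain text needs no conversion.
    static final TextConverter PLAIN = raw -> raw;

    // Toy HTML converter: replaces anything that looks like a tag with a space.
    // Not a real HTML parser; entities, scripts, and comments are not handled.
    static final TextConverter HTML = raw -> raw.replaceAll("<[^>]*>", " ");

    // Stand-in for the one analyzer shared by all formats:
    // lowercase and split on whitespace.
    static List<String> analyze(String plainText) {
        List<String> terms = new ArrayList<>();
        for (String token : plainText.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Format handling happens in the converter, before analysis.
        List<String> fromHtml = analyze(HTML.toPlainText("<p>Hello <b>World</b></p>"));
        List<String> fromText = analyze(PLAIN.toPlainText("Hello World"));

        // Both formats go through the same analysis and yield the same terms.
        System.out.println(fromHtml.equals(fromText));
    }
}
```

Keeping format handling out of the analyzer means a new format only needs a new converter, while indexing and querying continue to share one analysis.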

