lucene-dev mailing list archives

From Chuck Williams <ch...@manawiz.com>
Subject Re: Ferret's changes
Date Wed, 11 Oct 2006 09:31:37 GMT

David Balmain wrote on 10/10/2006 08:53 PM:
> On 10/11/06, Chuck Williams <chuck@manawiz.com> wrote:
>
> I personally would always store term vectors since I use a
> StandardTokenizer and Stemming. In this case highlighting matches in
> small documents is not trivial. Ferret's highlighter matches even
> sloppy phrase queries and phrases with gaps between the terms
> correctly. I couldn't do this without the use of term vectors.

I use stemming as well, but am not yet matching phrases like that. 
Perhaps term vectors will be useful to achieve this, although they come
at a high cost and it doesn't seem difficult or expensive to do the
matching directly on the text of small items.
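
To illustrate what matching directly on the text of a small item could look like, here is a minimal sketch in Python. It is not code from either Lucene or Ferret: `toy_stem` is a hypothetical stand-in for a real stemmer, `highlight_spans` is an illustrative name, and the sketch handles single terms only, not sloppy phrases.

```python
import re

def toy_stem(word: str) -> str:
    """Stand-in stemmer (an assumption): lowercase, then strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def highlight_spans(text, query_terms, stem=toy_stem):
    """Return (start, end) character offsets of tokens whose stem matches a query stem."""
    wanted = {stem(term) for term in query_terms}
    # Re-tokenize the stored text at highlight time instead of reading term vectors.
    return [m.span() for m in re.finditer(r"\w+", text) if stem(m.group()) in wanted]
```

For a small item the cost is one pass over the text per highlighted hit, which is the trade-off against storing term vectors at index time.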

>> I suppose it would be possible for the single conceptual field 'body' to
>> be represented with two physical fields 'smallBody' and 'largeBody'
>> where the former stores term vectors and the latter does not.
>
> If I really wanted to solve this problem I would use this solution. It
> is pretty easy to search multiple fields when I need to. Ferret's
> Query language even supports it:
>
>    smallBody|largeBody:"phrase to search for"

Couldn't agree more.  I have a number of extensions to Lucene's query
parser, including this for multiple fields:

{smallBody largeBody}:"phrase to search for"
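
As a rough illustration of the idea (not the actual parser extension, which is not shown in this message), a multi-field clause like the one above could be rewritten into an ordinary disjunction of single-field clauses before being handed to a standard query parser. The function name `expand_multi_field` and the regular expression are assumptions made for this sketch.

```python
import re

# Matches {field1 field2}:"quoted phrase" or {field1 field2}:term
_MULTI_FIELD = re.compile(r'\{([^{}]+)\}:("[^"]*"|\S+)')

def expand_multi_field(query: str) -> str:
    """Rewrite each {f1 f2}:clause into (f1:clause OR f2:clause)."""
    def _expand(match):
        fields = match.group(1).split()
        clause = match.group(2)
        # One single-field clause per listed field, joined as a disjunction.
        return "(" + " OR ".join(f"{field}:{clause}" for field in fields) + ")"
    return _MULTI_FIELD.sub(_expand, query)
```

Applied to the query above, this yields a disjunction over `smallBody` and `largeBody` with the same phrase in each clause; text outside the braces is passed through unchanged.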

>
> In the end, I think the benefits of my model far outweigh the costs.
> For me at least anyway.

Based on the performance figures so far, it seems they do!  I think
dynamic term vectors have a substantial benefit, but they can easily be
implemented in a model where all field indexing properties are fixed.

Chuck



