lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Subject Re: Extract terms not by reader, but by documents
Date Wed, 05 Sep 2007 02:04:23 GMT
Not sure if I am understanding what you are trying to do.  I think  
you are trying to find out which terms occurred in a particular  
document, correct?

I also am not sure about your first example.  My understanding of  
extractTerms is that it just gives you back the set of all terms that  
occur in the _query_, not necessarily those that matched in the  
document, although it has this effect for things like WildcardQuery  
and others that get expanded using TermEnum since they are expanded  
based on what is in the index.  I think this is best seen by the  
implementation of extractTerms() in in which it just  
adds the term from the query into the set.  Likewise for BooleanQuery  
which loops over the clauses and extracts the terms from each clause  
and adds them to the set.  Thus, if you had a boolean query of all  
term queries, you would get back the set of all the terms.

As for the problem it sounds like you are interested in, you could  
use SpanQuery functionality with some post processing analysis or try  
using Term Vectors and the new (unreleased) TermVectorMapper (TVM)  
functionality (or possibly a combination of both).  In this case, you  
will need to write your own implementation of the TVM that takes in  
the query so it knows what terms to identify. If you go the latter  
route, know that it is new functionality and probably doesn't have a  
whole lot of users yet, so there may still be issues with it.  See  
the nightly build or nightly javadocs for info on these.

The other question that might be helpful, is what custom highlighting  
are you doing that isn't covered by the contrib/highlighter?  Perhaps  
you have some suggestions that are generic enough to help improve  
it?  Just a thought.

Hope this helps,

On Sep 4, 2007, at 5:01 PM, Rafael Rossini wrote:

> Hi all,
>     In some custom highlighting, I often write a code like this:
>        Set<Term> matchedTerms = new HashSet<Term>();
>        query.rewrite(reader).extractTerms(matchedTerms);
>     With this code the Term Set gets populated by the matched query  
> in your
> whole index. Is it possible to this with a document instead of the  
> reader?
> Something like
> query.rewrite(documentId).extractTerms(matchedTerms) ?
> []s
>      Rossini

Grant Ingersoll

Lucene Helpful Hints:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message