lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Word co-occurrences counts
Date Thu, 23 Dec 2004 07:40:01 GMT
On Thursday 23 December 2004 07:50, Andrew.Cunningham@csiro.au wrote:
> Hi all,
> 
> I have a curious problem, and initial poking around with Lucene looks
> like it may only be able to half-handle the problem.
> 
> The problem requires two abilities:
> 
> 1. To be able to return the number of times the word appears in all
>    the documents (which it looks like Lucene can do through IndexReader)
> 2. To be able to return the number of word co-occurrences within
>    the document set (i.e. how many times does "computer" appear within
>    50 words of "dog")
> 
> Is the second point possible?

You can use the standard query parser with a proximity (sloppy phrase)
query like this:
"dog computer"~50
The distance computation for this query is not completely symmetric:
when computer occurs before dog, the allowed distance is 49, if I
recall correctly.
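
For comparison, here is a minimal plain-Java sketch of what a fully
symmetric "within N words" count could look like. It uses no Lucene
calls at all; the class name CooccurrenceSketch, the method countPairs
and the pre-tokenized String[] input are assumptions made only for
illustration, and it counts word pairs rather than matching documents.

```java
// Hypothetical helper, not part of Lucene: counts co-occurring word pairs
// in an already tokenized text, treating both word orders the same way.
final class CooccurrenceSketch {

    /**
     * Returns the number of position pairs (i, j) where tokens[i] equals a,
     * tokens[j] equals b, i != j, and |i - j| <= window.
     */
    static int countPairs(String[] tokens, String a, String b, int window) {
        int count = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(a)) {
                continue;
            }
            // Only look at positions within the window on either side of i.
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i && tokens[j].equals(b)) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

For the tokens of "the dog sat near the computer while another dog
slept", countPairs(tokens, "dog", "computer", 50) gives 2, and a window
of 3 gives 1, regardless of which of the two words is passed first.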

There is also SpanNearQuery for more general and flexible distance
queries, but the query parser does not support it, so you'll have to
construct these queries in your own program code.

If you have nonstandard retrieval requirements, e.g. you only need
the number of hits and no further information from the matching
documents, you may consider passing your own HitCollector to the
lower-level search methods.

Regards,
Paul Elschot

