lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Cunningham <cun...@csiro.au>
Subject Re: Word co-occurrences counts
Date Fri, 24 Dec 2004 00:17:26 GMT
"computer dog"~50 looks like what I'm after - now is there someway I can 
call this and pull
out the number of total occurances, not just the number of documents 
hits? (say if computer
and dog occur near each other several times in the same document).

Paul Elschot wrote:

>On Thursday 23 December 2004 07:50, Andrew.Cunningham@csiro.au wrote:
>  
>
>>Hi all,
>>
>>I have a curious problem, and initial poking around with Lucene looks
>>like it may only be able to half-handle the problem.
>>
>> 
>>
>>The problem requires two abilities:
>>
>>1.	To be able to return the number of times the word appears in all
>>the documents (which it looks like lucene can do through IndexReader) 
>>2.	To be able to return the number of word co-occurrences within
>>the document set (ie. How many times does "computer" appear within 50
>>words of  "dog") 
>>
>> 
>>
>>Is the second point possible?
>>    
>>
>
>You can use the standard query parser with a query like this:
>"dog computer"~50
>This query is not completely symmetric in the distance computation:
>when computer occurs before dog, the allowed distance is 49, iirc.
>
>There is also a SpanNearQuery for more generalized and flexible
>distance queries, but this is not supported by the query parser,
>so you'll have to construct these queries in your own program code.
>
>In case you have non standard retrieval requirements, eg. you only
>need the number of hits and no further information from the matching
>documents, you may consider using your own HitCollector on the
>lower level search methods.
>
>Regards,
>Paul Elschot
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message