lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sven <sven.carlb...@gmail.com>
Subject How to get the terms within 5 words of another term?
Date Wed, 12 Nov 2008 19:26:10 GMT
Hi everyone,

I have a term "foo" and I want to count all the occurrences of all the
terms that are within 5 words of "foo" in all the documents which
contain "foo".  For simplicity sake, this is only for a single field.
So if I have 3 documents (each with a single field) that look like this:

Once upon a time, foo lived far, far away in a magical kingdom.

"The Life and Time of the Hero Called Foo" is, by far, the best novel
about spam I have ever read.

I theorize that over time, foo will gradually move far away from bar.

I would like to generate a list of terms and hits based on their
proximity to "foo" in all the documents.  So I'll end up with something
like:

far : 4
time : 3
away : 2

Any help would be greatly appreciated.

Thanks much!
-Sven



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message