lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Next Word - Any Suggestions?
Date Tue, 26 Oct 2010 12:53:19 GMT
On Tue, 2010-10-26 at 14:27 +0200, Lucene wrote:
> In simple words, I need facet on the next word given a target word.

How large do you expect the documents to be and how many hits do you
need to process? If both are low, this seems to be a fairly straight
forward iteration with a HashMap<String, Integer> to collect the counts.

If you're scaling up, you could make a field with all bi-grams and do
prefix faceting on that. 

> Long-term, do the opposite - frequency of the distinct terms before the word
> "fox":

By remembering the previous term, the iteration method should be about
the same as above. For the faceting method, just reverse the order in
the bi-grams.

Regards,
Toke Eskildsen


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message