lucene-dev mailing list archives

From "Mark Harwood (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-474) High Frequency Terms/Phrases at the Index level
Date Mon, 28 Nov 2005 20:00:37 GMT
     [ http://issues.apache.org/jira/browse/LUCENE-474?page=all ]

Mark Harwood updated LUCENE-474:
--------------------------------

    Attachment: colloc.zip

Here's some code that I've used before to find phrases in an index; see CollocationFinder.java.
If your index has term vector support enabled, you can run it to mine the collocated terms.
This is typically a long-running operation that you don't want to run too often.
The CollocationIndexer can be used to store the mined collocations in an index.
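
As a rough illustration of what mining collocations involves, here is a minimal sketch in plain Java. It is not the code in colloc.zip and it does not touch Lucene's term vector API: it assumes each document has already been reduced to its terms in position order, and it simply counts adjacent term pairs. The class name CollocationSketch, the method names, and the minCount threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a collocation miner; NOT the attached CollocationFinder.
class CollocationSketch {

    /** Counts how often each adjacent term pair occurs across all documents. */
    static Map<String, Integer> countPairs(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : docs) {
            // Assumption: terms are in position order, as positional term vectors would give.
            for (int i = 0; i + 1 < terms.size(); i++) {
                String pair = terms.get(i) + " " + terms.get(i + 1);
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Keeps pairs seen at least minCount times, most frequent first (ties alphabetical). */
    static List<String> frequentPairs(Map<String, Integer> counts, int minCount) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e.getKey());
            }
        }
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }
}
```

Walking every document like this is what makes the mining step slow on a large index, which is why it makes sense to store the result rather than recompute it.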

Possible uses for collocations are:
* automatically identifying candidate terms in a query that can be turned into a phrase query
* better spelling correction by using all terms in the query as context to pick the most likely
spelling variation
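
The first use could look something like the sketch below, again in plain Java rather than against the Lucene query classes. It assumes the mined collocations are available as a set of two-term strings, and it wraps any adjacent pair of query terms found in that set in quotes so it can be treated as a phrase. PhraseCandidateSketch and groupPhrases are hypothetical names, and the greedy left-to-right matching is just one simple choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: marks known collocations in a query as quoted phrases.
class PhraseCandidateSketch {

    /** Returns the query terms with known two-term collocations merged into quoted phrases. */
    static List<String> groupPhrases(List<String> queryTerms, Set<String> collocations) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < queryTerms.size()) {
            boolean hasNext = i + 1 < queryTerms.size();
            String pair = hasNext ? queryTerms.get(i) + " " + queryTerms.get(i + 1) : null;
            if (hasNext && collocations.contains(pair)) {
                // Greedy match: consume both terms and emit them as one phrase.
                out.add("\"" + pair + "\"");
                i += 2;
            } else {
                out.add(queryTerms.get(i));
                i++;
            }
        }
        return out;
    }
}
```

With the collocation "new york" known, the query terms cheap, new, york, hotels would come back as cheap, "new york", hotels.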

Haven't done too much with this code, but I've added it here because it sounds like it could
be relevant.

Cheers
Mark



> High Frequency Terms/Phrases at the Index level
> -----------------------------------------------
>
>          Key: LUCENE-474
>          URL: http://issues.apache.org/jira/browse/LUCENE-474
>      Project: Lucene - Java
>         Type: New Feature
>     Versions: 1.4
>     Reporter: Suri Babu B
>  Attachments: colloc.zip
>
> We should be able to find all the high-frequency terms/phrases (where frequency
> is the search criterion / benchmark)




