lucene-dev mailing list archives

From "Karl Wettin (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-626) Extended spell checker with phrase support and adaptive user session analysis.
Date Tue, 23 Oct 2007 20:18:52 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-626:
-------------------------------

    Attachment: LUCENE-626_20071023.txt

In this patch:

 * Updated package javadoc

 * Simplified consumer interface with persistent session management:

{code:java}
SuggestionFacade facade = new SuggestionFacade(new File("data"));
facade.getDictionary().getPrioritesBySecondLevelSuggester().putAll(facade.secondLevelSuggestionFactory());
...
QuerySession session = facade.getQuerySessionManager().sessionFactory();
...
String query = "heros of mght and magik";
Hits hits = searcher.search(queryFactory(query));
String suggested = facade.didYouMean(query);
session.query(query, hits.length(), suggested);
...
facade.getQuerySessionManager().getSessionsByID().put(session);
...
facade.trainExpiredSessions();
...
facade.close();
{code}

 * Optimizations. On my MacBook the big unit test now takes five minutes to process
3,500,000 queries: it trains the dictionary and then extracts an a priori corpus by inverting
the dictionary of the phrases and words people have the most trouble spelling.

 * Depends on LUCENE-550 by default again. Since it took 30 seconds to execute 100,000 span
near queries in a RAMDirectory and less than one second to do the same with an InstantiatedIndex,
it simply did not make sense to use RAMDirectory as the default. Replacing one line of code
removes the dependency on InstantiatedIndex.

 * New algorithmic second level suggester for queries containing terms not close enough in
the text to be found in the a priori. It is added with the lowest priority and checks against
the system index rather than the a priori index. The second level suggester classes will soon
need a bit of refactoring.
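
To illustrate the priority ordering described in the last item, here is a minimal,
self-contained sketch. It is not the code in the patch: Suggester, PrioritizedSuggesters
and VocabularySuggester are hypothetical names, the a priori index is reduced to a map
lookup, and the system index is reduced to a plain set of terms. The assumption is that
suggesters are tried from highest to lowest priority and the first one that returns a
suggestion wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a second level suggester; not the interface in the patch.
interface Suggester {
    Optional<String> suggest(String query);
}

// Tries suggesters from highest to lowest priority; the first suggestion wins.
class PrioritizedSuggesters {
    private static final class Entry {
        final Suggester suggester;
        final double priority;
        Entry(Suggester suggester, double priority) {
            this.suggester = suggester;
            this.priority = priority;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void put(Suggester suggester, double priority) {
        entries.add(new Entry(suggester, priority));
        entries.sort(Comparator.comparingDouble((Entry e) -> e.priority).reversed());
    }

    Optional<String> didYouMean(String query) {
        for (Entry e : entries) {
            Optional<String> suggestion = e.suggester.suggest(query);
            if (suggestion.isPresent()) return suggestion;
        }
        return Optional.empty();
    }
}

// Lowest priority fallback: corrects each term against a vocabulary that
// stands in for the terms of the system index (an assumption of this sketch).
class VocabularySuggester implements Suggester {
    private final Set<String> vocabulary;

    VocabularySuggester(Set<String> vocabulary) { this.vocabulary = vocabulary; }

    @Override
    public Optional<String> suggest(String query) {
        List<String> corrected = new ArrayList<>();
        boolean changed = false;
        for (String term : query.trim().split("\\s+")) {
            String best = term;
            if (!vocabulary.contains(term)) {
                for (String candidate : vocabulary) {
                    // accept only candidates exactly one edit away from the typed term
                    if (editDistance(term, candidate) == 1) { best = candidate; break; }
                }
            }
            changed |= !best.equals(term);
            corrected.add(best);
        }
        return changed ? Optional.of(String.join(" ", corrected)) : Optional.empty();
    }

    // Plain Levenshtein distance using two rows of the dynamic programming table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }
}

class SuggesterDemo {
    public static void main(String[] args) {
        Map<String, String> aPriori = new HashMap<>();
        aPriori.put("vinci da code", "da vinci code");
        Set<String> systemTerms =
            new HashSet<>(Arrays.asList("heroes", "of", "might", "and", "magic"));

        PrioritizedSuggesters suggesters = new PrioritizedSuggesters();
        suggesters.put(new VocabularySuggester(systemTerms), 0.0);     // lowest priority
        suggesters.put(q -> Optional.ofNullable(aPriori.get(q)), 1.0); // a priori lookup first

        System.out.println(suggesters.didYouMean("vinci da code"));
        System.out.println(suggesters.didYouMean("heros of mght and magik"));
    }
}
```

In this sketch the first query is answered by the a priori lookup, while the second has
no a priori entry and falls through to the term-by-term fallback. The one-edit threshold
is an arbitrary choice for the example, not something taken from the patch.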




> Extended spell checker with phrase support and adaptive user session analysis.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-626
>                 URL: https://issues.apache.org/jira/browse/LUCENE-626
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Karl Wettin
>            Priority: Minor
>         Attachments: LUCENE-626_20071023.txt
>
>
> Extensive javadocs available in patch, but I also try to keep it compiled here: http://ginandtonique.org/~kalle/javadocs/didyoumean/org/apache/lucene/search/didyoumean/package-summary.html#package_description
> A semi-retarded reinforcement learning thingy backed by algorithmic second level suggestion schemes that learns from and adapts to user behavior as queries change, suggestions are accepted or declined, etc.
> Besides detecting spelling errors, it considers context, composition/decomposition and a few other things.
> heroes of light and magik -> heroes of might and magic
> vinci da code -> da vinci code
> java docs -> javadocs
> blacksabbath -> black sabbath
> Depends on LUCENE-550

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

