lucene-dev mailing list archives

From "John Beck (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (SOLR-1316) Create autosuggest component
Date Fri, 17 Sep 2010 20:41:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910777#action_12910777 ]

John Beck edited comment on SOLR-1316 at 9/17/10 4:41 PM:
----------------------------------------------------------

Hey guys, really nice work on this, it is extremely fast! In production right now we're seeing
200-700ms latency, and with this I typically get between 1ms and 10ms. 

I did find one issue though. I'm going to use this with a dictionary of 150k medical terms,
and it works, except when my query happens to start with a popular starting word. 

If I use this as a dictionary, 
{noformat} 
Hepatitis B Viruses, Duck 
Hepatitis B e Antigens 
Hepatitis B virus 
Hepatitis B, Chronic 
Hepatitis Be Antigens 
Hepatitis C 
Hepatitis C Antibodies 
Hepatitis C Antigen 
{noformat} 

And then search for Hepatitis C, 
{noformat} 
curl "http://localhost:8982/solr/suggest/?spellcheck=true&spellcheck.dictionary=suggest&spellcheck.extendedResults=true&spellcheck.count=5&q=Hepatitis%20C"

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<str name="command">build</str>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="Hepatitis">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">9</int>
      <arr name="suggestion">
        <str>hepatitis b e antigens</str>
        <str>hepatitis b virus</str>
        <str>hepatitis b viruses, duck</str>
        <str>hepatitis b, chronic</str>
        <str>hepatitis be antigens</str>
      </arr>
    </lst>
  </lst>
</lst>
</response>
{noformat} 

You can see it never makes it to Hepatitis C since it's #6 in that dictionary, and I'm limiting
the results to 5. 

When I bump spellcheck.count=6, then I get the very first Hepatitis C result but not the rest.
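
For illustration only, the behaviour above can be reproduced with a small prefix lookup over the sorted, lowercased dictionary. This is a rough sketch, not the TST-based lookup from the patch: the `suggest` helper and the `INDEX` list are hypothetical names, and it assumes (based on the `<lst name="Hepatitis">` entry and `endOffset` of 9 in the response) that only the first whitespace-separated token of the query is used as the prefix.

```python
from bisect import bisect_left

# The eight dictionary entries from the example above.
TERMS = [
    "Hepatitis B Viruses, Duck",
    "Hepatitis B e Antigens",
    "Hepatitis B virus",
    "Hepatitis B, Chronic",
    "Hepatitis Be Antigens",
    "Hepatitis C",
    "Hepatitis C Antibodies",
    "Hepatitis C Antigen",
]

# Hypothetical stand-in for the suggester's dictionary: lowercased and sorted.
INDEX = sorted(term.lower() for term in TERMS)


def suggest(index, prefix, count):
    """Return up to `count` entries of the sorted `index` that start with `prefix`."""
    prefix = prefix.lower()
    results = []
    # Binary search to the first entry >= prefix, then walk forward in order.
    for term in index[bisect_left(index, prefix):]:
        if len(results) >= count or not term.startswith(prefix):
            break
        results.append(term)
    return results


query = "Hepatitis C"

# Assumed behaviour: only the first token ("Hepatitis") is looked up,
# so the five "hepatitis b..." entries use up the whole count of 5.
first_token_hits = suggest(INDEX, query.split()[0], 5)

# Looking up the whole phrase as the prefix reaches the "hepatitis c" entries.
full_phrase_hits = suggest(INDEX, query, 5)

print(first_token_hits)
print(full_phrase_hits)
```

With a count of 6 the first-token lookup returns "hepatitis c" as its last entry and nothing after it, which matches what I see when bumping spellcheck.count.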


The real dictionary has about 2500 terms that start with "Receptor", and I don't want to have
to bump spellcheck.count to 3000 just to get past them. Is there anything else that can be done?

  
> Create autosuggest component
> ----------------------------
>
>                 Key: SOLR-1316
>                 URL: https://issues.apache.org/jira/browse/SOLR-1316
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Assignee: Andrzej Bialecki 
>            Priority: Minor
>             Fix For: Next
>
>         Attachments: SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch,
>                      SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch,
>                      SOLR-1316.patch, SOLR-1316.patch, SOLR-1316.patch, SOLR-1316_3x-2.patch,
>                      SOLR-1316_3x.patch, suggest.patch, suggest.patch, suggest.patch, TST.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Autosuggest is a common search function that can be integrated
> into Solr as a SearchComponent. Our first implementation will
> use the TernaryTree found in Lucene contrib. 
> * Enable creation of the dictionary from the index or via Solr's
> RPC mechanism
> * What types of parameters and settings are desirable?
> * Hopefully in the future we can include user click through
> rates to boost those terms/phrases higher

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

