lucene-dev mailing list archives

From "Erik Gui (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-9458) DocumentDictionaryFactory StackOverflowError on many documents
Date Thu, 31 Aug 2017 21:14:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149609#comment-16149609 ]

Erik Gui edited comment on SOLR-9458 at 8/31/17 9:13 PM:
---------------------------------------------------------

I am also hitting this issue when building the dictionary on the "name" field of an index
with around 45 million documents. If I switch to another field that is used for
faceting ("type"), the dictionary does build, although it takes a long time. For reference,
my suggester config looks like this:

{code:xml}
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">name</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>
{code}

My current workaround is to use HighFrequencyDictionaryFactory with FreeTextLookupFactory,
but the suggestions it returns are not at all what I would like to see.

{code:xml}
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="storeDir">suggester_fuzzy_dir</str>
      <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
      <str name="field">name</str>
      <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>
{code}



was (Author: erik):
I am also having this issue trying to build the dictionary on the "name" field on an index
with around 45 million documents. If I change the field to be another field that's used for
faceting ("type"), then the dictionary seems to be buildable after a long time. For reference
my suggester config looks like this:

{code:java}
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">name</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>
{code}

My current workaround involves using HighFrequencyDictionaryFactory with FreeTextLookupFactory,
but the suggestion results are not what I would like to see at all.

{code:xml}
  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FreeTextLookupFactory</str>
      <str name="storeDir">suggester_free_dir</str>
      <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
      <str name="field">name</str>
      <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
    </lst>
  </searchComponent>
{code}


> DocumentDictionaryFactory StackOverflowError on many documents
> --------------------------------------------------------------
>
>                 Key: SOLR-9458
>                 URL: https://issues.apache.org/jira/browse/SOLR-9458
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Suggester
>    Affects Versions: 6.1, 6.2
>            Reporter: Chris de Kok
>
> When using the FuzzyLookupFactory in combination with the DocumentDictionaryFactory, building
> the dictionary throws a StackOverflowError.
> Using the HighFrequencyDictionaryFactory works OK but behaves very differently.
> ```
> <searchComponent name="suggest" class="solr.SuggestComponent">
>         <lst name="suggester">
>             <str name="name">suggest</str>
>             <str name="field">suggestions</str>
>             <str name="suggestAnalyzerFieldType">suggestions</str>
>             <str name="lookupImpl">FuzzyLookupFactory</str>
>             <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>             <str name="storeDir">suggest_fuzzy</str>
>             <str name="exactMatchFirst">true</str>
>             <str name="buildOnStartup">false</str>
>             <str name="buildOnCommit">false</str>
>             <str name="buildOnOptimize">true</str>
>             <float name="threshold">0</float>
>         </lst>
> </searchComponent>
> ```
> null:java.lang.StackOverflowError
> 	at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
> 	at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
> 	at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
> 	at org.apache.lucene.util.automaton.Operations.topoSortStatesRecurse(Operations.java:1311)
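
The repeated topoSortStatesRecurse frame in the trace above suggests a depth-first traversal that
uses one Java stack frame per automaton state on the current path, so a long enough chain of
states can exhaust the thread stack. The following is only an illustrative sketch of that failure
mode and of an explicit-stack alternative. It is not Lucene's code: the class TopoSortDepthDemo,
the int[][] graph representation, and all method names are hypothetical, chosen just for this example.

```java
// Illustrative only: a toy graph, not Lucene's Automaton or Operations classes.
class TopoSortDepthDemo {

    // Hypothetical acyclic "automaton": state i has a single transition to state i + 1.
    public static int[][] chain(int n) {
        int[][] next = new int[n][];
        for (int i = 0; i < n; i++) {
            next[i] = (i + 1 < n) ? new int[] {i + 1} : new int[0];
        }
        return next;
    }

    // Recursive DFS topological sort: recursion depth equals the longest path length.
    public static int[] topoSortRecursive(int[][] next) {
        int n = next.length;
        int[] order = new int[n];
        boolean[] visited = new boolean[n];
        int[] upto = {n}; // order[] is filled from the end (reverse post-order)
        for (int s = 0; s < n; s++) {
            if (!visited[s]) {
                visit(next, s, visited, order, upto);
            }
        }
        return order;
    }

    private static void visit(int[][] next, int s, boolean[] visited, int[] order, int[] upto) {
        visited[s] = true;
        for (int t : next[s]) {
            if (!visited[t]) {
                visit(next, t, visited, order, upto);
            }
        }
        order[--upto[0]] = s;
    }

    // Same traversal with an explicit stack on the heap, so depth is not bound by -Xss.
    public static int[] topoSortIterative(int[][] next) {
        int n = next.length;
        int[] order = new int[n];
        boolean[] visited = new boolean[n];
        int[] edge = new int[n];  // next outgoing transition to explore, per state
        int[] stack = new int[n]; // each state is pushed at most once
        int sp = 0;
        int upto = n;
        for (int start = 0; start < n; start++) {
            if (visited[start]) {
                continue;
            }
            visited[start] = true;
            stack[sp++] = start;
            while (sp > 0) {
                int s = stack[sp - 1];
                if (edge[s] < next[s].length) {
                    int t = next[s][edge[s]++];
                    if (!visited[t]) {
                        visited[t] = true;
                        stack[sp++] = t;
                    }
                } else {
                    sp--; // all transitions explored: emit the state in reverse post-order
                    order[--upto] = s;
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        int[][] graph = chain(1_000_000);
        try {
            topoSortRecursive(graph);
            System.out.println("recursive sort finished (thread stack was large enough)");
        } catch (StackOverflowError e) {
            System.out.println("recursive sort overflowed the stack");
        }
        System.out.println("iterative sort ordered " + topoSortIterative(graph).length + " states");
    }
}
```

Whether the recursive version actually overflows depends on the JVM's thread stack size, so main
reports either outcome rather than assuming one.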




