lucenenet-user mailing list archives

From Shad Storhaug <s...@shadstorhaug.com>
Subject RE: [LUCENENET-594] Stackoverflow exception
Date Tue, 15 Aug 2017 12:29:44 GMT
Oliver,

I have created a new issue so we can track this on JIRA: https://issues.apache.org/jira/projects/LUCENENET/issues/LUCENENET-594

I have confirmed the same behavior in Java Lucene with the following test:

        [Test, LuceneNetSpecific]
        public void TestLUCENENET594()
        {
            // Rather than relying on a file path somewhere, we store the
            // files zipped in an embedded resource and unzip them to a
            // known temp directory for the test.
            DirectoryInfo indexDir = CreateTempDir("test-lucenenet-594");
            using (var stream = GetType().getResourceAsStream("LUCENENET-594.zip"))
            {
                TestUtil.Unzip(stream, indexDir);
            }

            AnalyzingSuggester suggester = new AnalyzingSuggester(new GermanAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48));

            Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(indexDir);
            IndexReader ir = DirectoryReader.Open(dir);
            DocumentDictionary dict = new DocumentDictionary(ir, "Content", null, null);

            IInputIterator iter = dict.GetEntryIterator();
            suggester.Build(iter); // Throws stackoverflow exception
        }

Converted to Java:

  public void testLUCENENET594() throws Exception
  {
      // Rather than relying on a file path somewhere, we store the
      // files zipped in an embedded resource and unzip them to a
      // known temp directory for the test.
      File indexDir = createTempDir("test-lucenenet-594");
      File file = new File(getClass().getResource("LUCENENET-594.zip").toURI());
      TestUtil.unzip(file, indexDir);

      AnalyzingSuggester suggester = new AnalyzingSuggester(new org.apache.lucene.analysis.de.GermanAnalyzer(org.apache.lucene.util.Version.LUCENE_48));

      org.apache.lucene.store.Directory dir = org.apache.lucene.store.FSDirectory.open(indexDir);
      org.apache.lucene.index.IndexReader ir = org.apache.lucene.index.DirectoryReader.open(dir);
      org.apache.lucene.search.suggest.DocumentDictionary dict = new org.apache.lucene.search.suggest.DocumentDictionary(ir, "Content", null, null);

      org.apache.lucene.search.suggest.InputIterator iter = dict.getEntryIterator();
      suggester.build(iter); // Throws stackoverflow exception
  }

Both tests use the attached zip file, LUCENENET-594.zip, as an embedded resource.

I can only conclude that either the data in the index is invalid in some way, or it is not valid to
use the result of DocumentDictionary.GetEntryIterator() in conjunction with AnalyzingSuggester.Build().

Do note that the Automaton functionality was intentionally made recursive (https://issues.apache.org/jira/browse/LUCENE-6156),
and since it is based on a regular expression, inputs that cause too many matches can overflow
the call stack.
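
To illustrate the recursion-depth point, here is a minimal Java sketch of a cycle check that keeps the depth-first path on the heap instead of the call stack. It is not the Lucene or Lucene.NET implementation: the class name IterativeFiniteCheck and the int[][] adjacency-list representation of the automaton are assumptions made for this example, and it only covers an IsFinite-style check, not GetFiniteStrings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical stand-in for an automaton: transitions[s] lists the
 * target states reachable from state s in one step.
 */
final class IterativeFiniteCheck {

    private static final byte UNVISITED = 0; // not reached yet
    private static final byte ON_PATH = 1;   // on the current depth-first path
    private static final byte DONE = 2;      // fully explored, no cycle below it

    /** Returns true if no cycle is reachable from the start state. */
    static boolean isFinite(int[][] transitions, int start) {
        byte[] color = new byte[transitions.length];
        int[] nextEdge = new int[transitions.length]; // next transition to try per state
        Deque<Integer> path = new ArrayDeque<>();     // explicit stack replaces recursion

        color[start] = ON_PATH;
        path.push(start);
        while (!path.isEmpty()) {
            int state = path.peek();
            if (nextEdge[state] < transitions[state].length) {
                int target = transitions[state][nextEdge[state]++];
                if (color[target] == ON_PATH) {
                    return false; // transition back into the current path: a cycle
                }
                if (color[target] == UNVISITED) {
                    color[target] = ON_PATH;
                    path.push(target);
                }
                // DONE targets were already explored and need no second visit
            } else {
                color[state] = DONE; // all transitions tried, leave the path
                path.pop();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A long chain 0 -> 1 -> ... -> n-1 would need n nested calls if done recursively.
        int n = 200_000;
        int[][] chain = new int[n][];
        for (int s = 0; s < n; s++) {
            chain[s] = (s + 1 < n) ? new int[] {s + 1} : new int[0];
        }
        System.out.println("chain is finite: " + isFinite(chain, 0));

        chain[n - 1] = new int[] {0}; // close the chain into a loop
        System.out.println("looped chain is finite: " + isFinite(chain, 0));
    }
}
```

With this shape, a long chain of states costs heap memory rather than stack frames, so the depth of the automaton no longer limits what the check can handle.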

There is some information online about how to use the AnalyzingSuggester:

http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html
http://www.programcreek.com/java-api-examples/index.php?api=org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
http://puneetkhanal.blogspot.com/2013/04/simple-auto-suggester-using-lucene-41.html

You might also try analyzing the tests to determine the correct usage (https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests.Suggest/Suggest/Analyzing/AnalyzingSuggesterTest.cs).

None of these examples use the DocumentDictionary.GetEntryIterator() in conjunction with AnalyzingSuggester.Build().
I suspect that they either weren't designed to be used together or the data in your "Content"
field isn't what is expected as valid input.

I suggest you add the details of how you are creating the index and what usage example(s)
you are following to the JIRA issue (https://issues.apache.org/jira/projects/LUCENENET/issues/LUCENENET-594)
so we can try to work out whether there is something wrong with the input data or the usage
of AnalyzingSuggester is incorrect.

Thanks,
Shad Storhaug (NightOwl888)


-----Original Message-----
From: Oliver Albrecht [mailto:o.albrecht@oliver-albrecht.de] 
Sent: Tuesday, August 8, 2017 6:29 PM
To: user@lucenenet.apache.org
Subject: RE: Stackoverflow exception

Hello Shad,

I can confirm that the bug still exists in the CI build.

The code to reproduce the issue was in my initial mail.

I think the problem is not the code to reproduce it but the index data that is needed.

I could build a test program and send it to you along with the index data directory, so you
could debug it yourself.

I would send a link to the resulting zip on my Google Drive to your e-mail address. I'm
not allowed to share the data publicly.

kind regards

Oliver

> Shad Storhaug <shad@shadstorhaug.com> hat am 8. August 2017 um 11:30 geschrieben:
> 
> Oliver,
> 
> There have been a lot of bugs fixed since the last beta release. It would be useful to
> know whether the bug still exists in the latest CI build so we aren't spending time working
> on bugs that have already been fixed. It shouldn't take you long to swap out the packages
> just to test out whether the problem is still present.
> 
> If the problem still exists in the CI build, please provide us with the minimal code
> to reproduce it. A console application that reproduces the issue would be fine, but it would
> be ideal if you provide a pull request on GitHub (https://github.com/apache/lucenenet/pulls)
> with a test in the AnalyzingSuggesterTest class (https://github.com/apache/lucenenet/blob/master/src/Lucene.Net.Tests.Suggest/Suggest/Analyzing/AnalyzingSuggesterTest.cs)
> that fails with the issue (and mark it with the [LuceneNetSpecific] attribute) so we can ensure
> the bug stays fixed throughout future porting efforts. Once we have a test, we can port it
> back to Java and step through to find out where the execution paths diverge. Alternatively,
> if you are willing to do the work of comparing with Java to find out where the problem is,
> you could submit a PR containing both the test and the fix for it.
> 
> Despite the fact there are nearly 8000 tests, there are some dusty corners that are not
> covered and I think you may have stumbled upon one of them. And no, you are the first to report
> this issue to us.
> 
> All I can tell you is that the download counts on the new NuGet packages such as Lucene.Net.Suggest,
> Lucene.Net.Highlighter, Lucene.Net.Facet, and Lucene.Net.Spatial are much lower than I would
> expect them to be for a beta, and it would be extremely helpful if there were some people
> dedicated to finding bugs in these packages and reporting them to us.
> 
> Thanks,
> Shad Storhaug (NightOwl888)
> 
> -----Original Message-----
> From: Oliver Albrecht [mailto:o.albrecht@oliver-albrecht.de]
> Sent: Tuesday, August 8, 2017 3:08 PM
> To: user@lucenenet.apache.org
> Subject: RE: Stackoverflow exception
> 
> Hello Itamar, hello Shad,
> 
> thanks for the fast response.
> 
> I could provide the call stack, but I think it is useless because it is full of calls
> to IsFinite(State s, OpenBitSet path, OpenBitSet visited), which is a recursive function.
> For the same reason, I think that using the CI build wouldn't change anything: the IsFinite
> function still works recursively.
> 
> I've tried to replace the recursion with a stack-based approach (using Stack), but I'm
> not sure whether my implementation is correct.
> 
> However, if I use my non-recursive version of IsFinite, it crashes with a stack overflow
> exception in GetFiniteStrings(State s, HashSet pathstates, HashSet strings, Int32sRef path,
> int limit), which is also a recursive function. That function is too complex for me to
> convert into a non-recursive version without knowing exactly what it should do.
> 
> In my opinion, replacing the recursion with a non-recursive approach is the only
> solution. Does no one else have this problem? I don't think an index with 4000 documents
> and a size of 15 MB is so extraordinary. Or is this only a problem with how the suggester
> works?
> 
> I'm just trying to use Lucene to build a full-text query engine for our internal DMS.
> The system currently holds 450,000 documents with about 50 GB of data. I think the final index
> will be around 2 GB in size.
> 
> kind regards
> 
> Oliver
> 
> > Shad Storhaug <shad@shadstorhaug.com> hat am 7. August 2017 um 19:06 geschrieben:
> > 
> > Hi Oliver,
> > 
> > In addition to providing the full stack trace that Itamar mentioned, 
> > could you confirm the problem still exists if you use the latest CI 
> > build here:
> > https://www.myget.org/gallery/lucene-net-ci
> > 
> > Thanks,
> > Shad Storhaug (NightOwl888)
> > 
> > -----Original Message-----
> > From: itamar.synhershko@gmail.com 
> > [mailto:itamar.synhershko@gmail.com]
> > On Behalf Of Itamar Syn-Hershko
> > Sent: Monday, August 7, 2017 9:06 PM
> > To: user@lucenenet.apache.org
> > Subject: Re: Stackoverflow exception
> > 
> > What is the full stacktrace please?
> > 
> > --
> > 
> > Itamar Syn-Hershko
> > Freelance Developer & Consultant
> > Elasticsearch Partner
> > Microsoft MVP | Lucene.NET PMC
> > http://code972.com | @synhershko <https://twitter.com/synhershko> 
> > http://BigDataBoutique.co.il/
> > 
> > On Mon, Aug 7, 2017 at 4:49 PM, Oliver Albrecht <o.albrecht@oliver-albrecht.de> wrote:
> > 
> > > Hello,
> > > 
> > > I'm using a DocumentDictionary to feed an AnalyzingSuggester using 
> > > the following code:
> > > 
> > > AnalyzingSuggester suggester = new AnalyzingSuggester(new GermanAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48));
> > > 
> > > Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.Open(indexDir);
> > > 
> > > IndexReader ir = DirectoryReader.Open(dir);
> > > 
> > > DocumentDictionary dict = new DocumentDictionary(ir, "Content", 
> > > null, null);
> > > 
> > > suggester.Build(dict.GetEntryIterator());
> > > 
> > > I get a stack overflow exception on suggester.Build. The exception 
> > > is thrown in Lucene.Net.Util.Automaton.SpecialOperations.IsFinite.
> > > 
> > > The index contains 10,000 documents and has no payload and no weight.
> > > 
> > > Kind regards
> > > 
> > > Oliver