lucene-dev mailing list archives

From "Smiley, David W." <>
Subject FST bug?
Date Tue, 17 Jul 2012 14:44:00 GMT
I am building an FST.  Here is an excerpt from my code:
    //build the FST from the workingSet
    Builder<IntsRef> builder = new Builder<IntsRef>(FST.INPUT_TYPE.BYTE4, outputs);
    IntsRef sortedKeys[] = workingSet.keySet().toArray(new IntsRef[workingSet.size()]);

    int maxPhraseLen = 0;
    int maxDocsLen = 0;
    for (IntsRef termIdsPhrase : sortedKeys) {
      IntsRef solrIds = workingSet.remove(termIdsPhrase);//remove to save memory
      assert termIdsPhrase.length > 0 && solrIds.length > 0;
      builder.add(termIdsPhrase, solrIds);
    }

    return builder.finish();

For what it's worth, the input side is at most 7 integers long, and the output side is typically
the same, but a small number of outputs get as high as 48K integers long.  There are 10M

After many calls to builder.add(), and with assertions enabled, I eventually get this exception:

Exception in thread "main" java.lang.AssertionError: size must be positive (got -262796219):
likely integer overflow?
	at org.apache.lucene.util.ArrayUtil.grow(
	at org.apache.lucene.util.fst.FST.addNode(
	at org.apache.lucene.util.fst.NodeHash.add(
	at org.apache.lucene.util.fst.Builder.compileNode(
	at org.apache.lucene.util.fst.Builder.freezeTail(
	at org.apache.lucene.util.fst.Builder.add(
	at org.mitre.opensextant.solr.TaggerFstCorpus.buildPhrases(
	at org.mitre.opensextant.solr.TaggerFstCorpus.doBuild(
	at org.mitre.opensextant.solr.BuildCorpusExperiment.main(
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
	at java.lang.reflect.Method.invoke(
	at com.intellij.rt.execution.application.AppMain.main(

This is on Lucene 4.0-ALPHA using JDK 7.  I'm using 6GB of heap; my attempts to use less resulted
in Out-of-memory errors.  What FST size limitation am I bumping up against?
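
The negative size in the assertion message is what one would expect to see if a byte count held in an int wrapped past Integer.MAX_VALUE. Whether that is what actually happens inside FST.addNode is an assumption here, not something confirmed from the Lucene source. A minimal standalone sketch of the arithmetic follows; the class and method names are hypothetical and this is not Lucene's code:

```java
public class OverflowSketch {
    // Hypothetical helper: computes a grown array size with plain int
    // arithmetic, which silently wraps once the sum exceeds Integer.MAX_VALUE.
    static int naiveNewSize(int currentLength, int extraBytes) {
        return currentLength + extraBytes;
    }

    // Same computation widened to long, so the caller can detect that the
    // requested size no longer fits in an int-indexed array.
    static long checkedNewSize(int currentLength, int extraBytes) {
        return (long) currentLength + extraBytes;
    }

    public static void main(String[] args) {
        int currentLength = Integer.MAX_VALUE - 10; // nearly 2 GB already used
        int extraBytes = 100;                       // bytes needed for one more node

        int naive = naiveNewSize(currentLength, extraBytes);
        long checked = checkedNewSize(currentLength, extraBytes);

        System.out.println("int arithmetic:  " + naive);   // negative after wrap-around
        System.out.println("long arithmetic: " + checked); // true requested size
        System.out.println("fits in int?     " + (checked <= Integer.MAX_VALUE));
    }
}
```

If this reading is right, the limit would be tied to int-based addressing of a single backing array (about 2 GB) rather than to the heap size.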

~ David