lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3297) FST doesn't fully share common prefix across all outputs
Date Sat, 09 Jul 2011 13:03:16 GMT
FST doesn't fully share common prefix across all outputs
--------------------------------------------------------

                 Key: LUCENE-3297
                 URL: https://issues.apache.org/jira/browse/LUCENE-3297
             Project: Lucene - Java
          Issue Type: Improvement
            Reporter: Michael McCandless
            Priority: Minor


FST tries to share prefixes of outputs when possible. However, in the [I think unusual in
practice] case where all outputs share a common prefix, FST really ought to store that prefix
just once, on the root arc; instead it can only push it back as far as the N root arcs, so the
prefix is stored N times.  It's sort of an off-by-one in how far back the pushing goes...

One [synthetic] example where this makes a big difference is the new Test2BPostings test
when it uses MemoryCodec: the test has 26 terms (the letters of the alphabet), and each term
has exactly the same long (~85 MB) all-1s byte[] as its postings.  If we fixed this issue, the
resulting FST would be only ~85 MB; as it stands, it needs ~85 * 26 MB (roughly 2.2 GB).
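
To make the size difference concrete, here is a rough toy model in plain Java. It does not use
Lucene's FST classes; the class and method names (PrefixPushSketch, bytesPerRootArc,
bytesSharedOnce) are made up for illustration, it assumes every term is non-empty, and 85
bytes stand in for the ~85 MB output. It only counts how many output bytes end up at the root
level under each scheme, ignoring anything stored deeper in the automaton.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixPushSketch {

    // Longest common prefix of two byte arrays.
    public static byte[] commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return Arrays.copyOf(a, i);
    }

    // Toy model of the current behavior: one root arc per distinct first
    // character, each holding the common prefix of the outputs below it.
    static Map<Character, byte[]> prefixPerRootArc(Map<String, byte[]> outputs) {
        Map<Character, byte[]> perArc = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : outputs.entrySet()) {
            char first = e.getKey().charAt(0); // assumes non-empty terms
            perArc.merge(first, e.getValue(), PrefixPushSketch::commonPrefix);
        }
        return perArc;
    }

    // Output bytes held on the root arcs when pushing stops at the N root arcs.
    public static long bytesPerRootArc(Map<String, byte[]> outputs) {
        long total = 0;
        for (byte[] p : prefixPerRootArc(outputs).values()) {
            total += p.length;
        }
        return total;
    }

    // Output bytes at the root level if the prefix common to ALL root arcs
    // were stored once and each arc kept only its remainder.
    public static long bytesSharedOnce(Map<String, byte[]> outputs) {
        Map<Character, byte[]> perArc = prefixPerRootArc(outputs);
        byte[] shared = null;
        for (byte[] p : perArc.values()) {
            shared = (shared == null) ? p : commonPrefix(shared, p);
        }
        if (shared == null) {
            return 0;
        }
        long total = shared.length;
        for (byte[] p : perArc.values()) {
            total += p.length - shared.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // 26 single-letter terms, all with the same all-1s output.
        byte[] allOnes = new byte[85];
        Arrays.fill(allOnes, (byte) 1);
        Map<String, byte[]> outputs = new LinkedHashMap<>();
        for (char c = 'a'; c <= 'z'; c++) {
            outputs.put(String.valueOf(c), allOnes);
        }
        System.out.println("pushed to root arcs only: " + bytesPerRootArc(outputs));
        System.out.println("shared once:              " + bytesSharedOnce(outputs));
    }
}
```

With the 26 identical outputs, the first figure is 26 times the second, which mirrors the
~85 * 26 MB versus ~85 MB gap described above.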
