lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2111) Wrapup flexible indexing
Date Tue, 30 Mar 2010 19:02:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851511#action_12851511 ]

Michael McCandless commented on LUCENE-2111:
--------------------------------------------

{quote}
The term dictionary should be more "DFA-friendly", e.g. the whole concept of TermsEnum is wrong:
linear enumeration of terms is inefficient for any big index, and we should get away from it.
Instead it would be nice to think of the index like an FST, and instead of enumerating things
and filtering them, we provide a DFA and enumerate the transduced results.
We need to eliminate the UTF-8/UTF-16 impedance mismatch, which causes so much
complication and unnecessary hairy code today.
{quote}

+1 -- we already see these limitations now in making AutomatonQuery consume the straight enum.
If we flipped the problem around (you pass a DFA to the codec, and it does the intersection
and enumerates the result), and we used byte-based DFAs, I think we'd get a good speedup.
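
To make the "flip the problem around" idea concrete, here is a minimal sketch, not the flex branch's code. It assumes a toy byte-keyed trie standing in for the term dictionary and a table-driven byte DFA; the names DfaIntersectSketch, TrieNode, ByteDfa, buildTrie, exampleDfa and intersect are all hypothetical and are not Lucene APIs. The dictionary and the DFA are walked in lockstep, so any subtree whose prefix the DFA rejects is skipped instead of being enumerated and filtered.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DfaIntersectSketch {

    /** Hypothetical stand-in for a term dictionary: a trie keyed by unsigned byte. */
    public static final class TrieNode {
        final TreeMap<Integer, TrieNode> children = new TreeMap<>();
        boolean isTerm;
    }

    /** Hypothetical byte-based DFA; state 0 is the start state, -1 means "no transition". */
    public static final class ByteDfa {
        final int[][] next;
        final boolean[] accept;

        public ByteDfa(int numStates) {
            next = new int[numStates][256];
            for (int[] row : next) {
                Arrays.fill(row, -1);
            }
            accept = new boolean[numStates];
        }

        /** Adds a transition on a single ASCII label (assumption: labels fit in one byte). */
        public ByteDfa transition(int from, char label, int to) {
            next[from][label & 0xFF] = to;
            return this;
        }
    }

    /** Indexes the UTF-8 bytes of each term into the trie. */
    public static TrieNode buildTrie(String... terms) {
        TrieNode root = new TrieNode();
        for (String term : terms) {
            TrieNode node = root;
            for (byte b : term.getBytes(StandardCharsets.UTF_8)) {
                node = node.children.computeIfAbsent(b & 0xFF, k -> new TrieNode());
            }
            node.isTerm = true;
        }
        return root;
    }

    /** Example DFA for the pattern b[ao]t. */
    public static ByteDfa exampleDfa() {
        ByteDfa dfa = new ByteDfa(4);
        dfa.transition(0, 'b', 1).transition(1, 'a', 2).transition(1, 'o', 2).transition(2, 't', 3);
        dfa.accept[3] = true;
        return dfa;
    }

    /** Returns the indexed terms accepted by the DFA, in byte order. */
    public static List<String> intersect(TrieNode root, ByteDfa dfa) {
        List<String> out = new ArrayList<>();
        walk(root, dfa, 0, new byte[0], out);
        return out;
    }

    private static void walk(TrieNode node, ByteDfa dfa, int state, byte[] prefix, List<String> out) {
        if (node.isTerm && dfa.accept[state]) {
            out.add(new String(prefix, StandardCharsets.UTF_8));
        }
        for (Map.Entry<Integer, TrieNode> e : node.children.entrySet()) {
            int target = dfa.next[state][e.getKey()];
            if (target < 0) {
                continue; // the DFA rejects this prefix, so the whole subtree is skipped
            }
            byte[] longer = Arrays.copyOf(prefix, prefix.length + 1);
            longer[prefix.length] = (byte) e.getKey().intValue();
            walk(e.getValue(), dfa, target, longer, out);
        }
    }

    public static void main(String[] args) {
        TrieNode dict = buildTrie("bat", "bit", "bot", "boat", "cat");
        System.out.println(intersect(dict, exampleDfa())); // prints [bat, bot]
    }
}
```

In this sketch "cat" and everything under "bi" are never visited, which is the saving over enumerating every term and filtering. Working on bytes throughout also means no UTF-8/UTF-16 conversion happens during the walk.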

> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, flex_merge_916543.patch,
> flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch,
> LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch,
> LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search performance testing looks good, it survived several visits from the Unicode policeman ;)
> But it still has a number of nocommits, could use some more scrutiny especially on the "emulate old API on flex index" and vice versa code paths, and still needs some more performance testing.  I'll do these under this issue, and we should open separate issues for other self-contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
