lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2111) Wrapup flexible indexing
Date Tue, 30 Mar 2010 16:33:33 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851456#action_12851456 ]

Robert Muir commented on LUCENE-2111:
-------------------------------------

{quote}
There are certain specific wildcard corner cases where we are
slower, but these are likely rarely used in practice (many ?'s
followed by a suffix).
{quote}

I think it would be good to fix this in the future, but I certainly think it's a rare case.
The problem is similar to an SQL engine deciding to just table-scan instead
of using a btree index. In this case we are trying to be too smart: we seek
to the correct term based on the query instead of scanning, and this causes too
many seeks.

At the same time, you have to be careful or you make the wrong decision
and give O(n) performance instead of O(log n).
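
To make the tradeoff concrete, here is a minimal sketch in Python rather than Lucene code. It models the term dictionary as a sorted list and handles only the corner case from the quote (k '?' wildcards followed by a literal suffix). The names scan_enum, seek_enum and MAX_CHAR are hypothetical and are not part of any Lucene API; the suffix is assumed to contain no wildcard characters.

```python
import bisect
import fnmatch

# Assumption: no term contains U+10FFFF, so prefix + MAX_CHAR sorts after
# every term that starts with prefix.
MAX_CHAR = "\U0010ffff"


def scan_enum(terms, k, suffix):
    """Linear scan: test every term against k '?' wildcards plus a suffix."""
    pattern = "?" * k + suffix
    hits = [t for t in terms if fnmatch.fnmatchcase(t, pattern)]
    return hits, len(terms)  # one comparison per term


def seek_enum(terms, k, suffix):
    """Seek-based enumeration over a sorted list of unique terms.

    For each k-character prefix the only possible match is prefix + suffix,
    so we binary-search for it, then binary-search past the prefix.
    Returns the hits and the number of seeks (binary searches) issued.
    """
    hits, seeks, i = [], 0, 0
    while i < len(terms):
        t = terms[i]
        if len(t) < k:
            i += 1  # too short to match; just step forward
            continue
        prefix = t[:k]
        target = prefix + suffix
        if t == target:
            hits.append(t)
        if t < target:
            # The only candidate under this prefix lies ahead: seek to it.
            i = bisect.bisect_left(terms, target, i)
        else:
            # Nothing further can match under this prefix: seek past it.
            i = bisect.bisect_right(terms, prefix + MAX_CHAR, i)
        seeks += 1
    return hits, seeks


terms = sorted(["aaing", "abing", "acxx", "bbing", "bcd", "ccing", "zz"])
print(scan_enum(terms, 2, "ing"))
print(seek_enum(terms, 2, "ing"))
```

Both functions return the same hits. The seeking version issues at least one binary search per distinct k-character prefix, so when nearly every term has its own prefix it performs about as many seeks as the scan performs plain comparisons, and each seek is the more expensive operation.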

In my opinion it would be better to think about how we can improve Lucene
in the future in the following ways:
* The term dictionary should be more "DFA-friendly", e.g. the whole concept of TermsEnum is wrong: linear enumeration of terms is inefficient for any big index. We should get away from it.
* Instead it would be nice to think of the index like an FST: instead of enumerating things and filtering them, we provide a DFA and enumerate the transduced results.
* We need to eliminate the UTF-8/UTF-16 impedance mismatch, which causes so much complication and unnecessary hairy code today.
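
As a rough illustration of the "provide a DFA and enumerate the results" idea, the sketch below walks a term trie and a DFA in lockstep and descends only into edges the DFA allows, instead of enumerating every term and filtering. It is a Python toy under stated assumptions: a plain trie stands in for an FST, and build_trie, wildcard_dfa and intersect are hypothetical names, not Lucene APIs.

```python
END = ""  # sentinel key marking "a term ends here" (real edge labels are 1 char)


def build_trie(terms):
    """Build a nested-dict trie; a plain trie stands in for an FST here."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def wildcard_dfa(k, suffix):
    """DFA for k '?' wildcards followed by a literal suffix.

    States are integers 0..k+len(suffix); returns (step, accepting_state),
    where step(state, ch) gives the next state or None for a dead end.
    """
    def step(state, ch):
        if state < k:
            return state + 1  # '?' accepts any single character
        idx = state - k
        if idx < len(suffix) and suffix[idx] == ch:
            return state + 1
        return None

    return step, k + len(suffix)


def intersect(node, step, accepting, state=0, prefix=""):
    """Yield accepted terms in sorted order, pruning subtrees the DFA rejects."""
    if END in node and state == accepting:
        yield prefix
    for ch in sorted(key for key in node if key != END):
        nxt = step(state, ch)
        if nxt is not None:
            yield from intersect(node[ch], step, accepting, nxt, prefix + ch)


terms = ["aaing", "abing", "acxx", "bbing", "bcd", "ccing", "zz"]
step, accepting = wildcard_dfa(2, "ing")
print(list(intersect(build_trie(terms), step, accepting)))
```

The walk abandons a subtree as soon as the DFA has no transition for the next character, so the work is bounded by the part of the dictionary the automaton can actually reach rather than by the total number of terms.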

All this being said, I think flex is a great move forward for multi-term queries: at least
we have a seeking-friendly API! One step at a time.



> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, flex_merge_916543.patch,
flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch,
LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch,
LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search performance testing looks good, it survived several visits from the Unicode policeman ;)
> But it still has a number of nocommits, could use some more scrutiny especially on the "emulate old API on flex index" and vice versa code paths, and still needs some more performance testing.  I'll do these under this issue, and we should open separate issues for other self-contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

