lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@gmail.com>
Subject Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Date Sat, 10 Nov 2012 10:20:26 GMT
I really agree with Robert here. Performance is everything here, and
since we have a fast variant of this query we really don't need the
slow one in core. I don't understand why expert users like you, Mark B.,
can't make the distinction between Slow/Fast FuzzyQuery in app code.
Even if it goes EOL and we drop it, can't your app still keep it? It's
ASL 2.0.
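A rough, self-contained Java sketch of making that distinction in app code. This is not Lucene's API: `FuzzyChoice`, `Mode`, `search` and the sample terms are all hypothetical. It assumes the fast path is capped at 2 edits, like the automaton-based FuzzyQuery discussed in this thread, and stands in for that path with an early-terminating dynamic-programming check rather than a real Levenshtein automaton, so both paths here still scan every term; only the slow path accepts larger edit distances.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyChoice {

    // Hypothetical switch an application might expose to its own callers.
    enum Mode { FAST, SLOW }

    // Cap assumed for the fast path (the 2-edit limit discussed in the thread).
    static final int FAST_MAX_EDITS = 2;

    // Full Levenshtein distance using two rolling rows: O(|a| * |b|) time.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Bounded check: row minima never decrease, so once every cell in a row
    // exceeds maxEdits the final distance must exceed it too.
    static boolean withinEdits(String a, String b, int maxEdits) {
        if (Math.abs(a.length() - b.length()) > maxEdits) return false;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > maxEdits) return false;
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()] <= maxEdits;
    }

    // The application decides explicitly which behavior it wants.
    static List<String> search(List<String> terms, String query, int maxEdits, Mode mode) {
        if (mode == Mode.FAST && maxEdits > FAST_MAX_EDITS) {
            throw new IllegalArgumentException("fast mode supports at most " + FAST_MAX_EDITS + " edits");
        }
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            boolean match = mode == Mode.FAST
                    ? withinEdits(query, term, maxEdits)
                    : distance(query, term) <= maxEdits;
            if (match) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("california", "californa", "kalifornien", "carolina", "colorado");
        System.out.println(search(terms, "californi", 2, Mode.FAST)); // [california, californa]
        System.out.println(search(terms, "californi", 3, Mode.SLOW)); // [california, californa, kalifornien]
    }
}
```

The design choice is to reject over-large edit distances up front on the fast path, so slow, unbounded matching is something the application opts into explicitly.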

I'd also vote -1 here.

simon

On Sat, Nov 10, 2012 at 2:32 AM, Robert Muir <rcmuir@gmail.com> wrote:
> It's a really simple answer.
>
> Your problem (and I quote):
> Content indexed as state:california
> But it seems like when I search state:CALIFORNI~0.65 (via Solr) it doesn't work.
>   I'm worried that Solr isn't running my text through the query analyzers first!
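For context, `~0.65` is a fractional similarity rather than an edit count. A minimal sketch of how such a value might be turned into a maximum edit distance, assuming the legacy rule in Lucene 4.x's `FuzzyQuery.floatToEdits` (truncate `(1 - similarity) * termLength`, cap at 2; treat this as an assumption and verify it against your version). `SimilarityToEdits` is a hypothetical re-implementation, not the library code:

```java
public class SimilarityToEdits {

    // Assumed cap on edits for the automaton-based fuzzy query.
    static final int MAX_SUPPORTED_EDITS = 2;

    // Assumed rule: values >= 1 are edit counts, 0 means an exact match,
    // and fractional similarities scale with the term length.
    static int similarityToEdits(float similarity, int termLength) {
        if (similarity >= 1f) {
            return (int) Math.min(similarity, MAX_SUPPORTED_EDITS);
        } else if (similarity == 0f) {
            return 0;
        }
        return Math.min((int) ((1d - similarity) * termLength), MAX_SUPPORTED_EDITS);
    }

    public static void main(String[] args) {
        String term = "CALIFORNI";
        int uncapped = (int) ((1d - 0.65f) * term.length());
        System.out.println("uncapped edits: " + uncapped);                               // 3
        System.out.println("capped edits:   " + similarityToEdits(0.65f, term.length())); // 2
    }
}
```

Under that assumption the query allows at most 2 edits, which cannot bridge `CALIFORNI` and `california` when case is not normalized: all nine letters differ in case and one letter is missing.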
>
> This is some analysis chain configuration issue.
>
> We don't need to add support for some unscalable stuff to lucene to
> correct for that: you just need to make sure lowercasing is happening.
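The lowercasing point can be checked with a plain Levenshtein distance. This is a sketch, not Lucene's analysis chain: `LowercaseCheck` is hypothetical, and a single `toLowerCase(Locale.ROOT)` call is assumed to stand in for a lowercasing filter applied at query time (the indexed term is already lowercase).

```java
import java.util.Locale;

public class LowercaseCheck {

    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Stand-in for a lowercasing step in the query-time analysis chain (an assumption).
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = "california"; // already lowercased at index time
        String query = "CALIFORNI";    // user input that skipped analysis
        System.out.println(levenshtein(query, indexed));            // 10
        System.out.println(levenshtein(normalize(query), indexed)); // 1
    }
}
```

Once both sides are lowercased, the typo is a single edit, well within the reach of a 1- or 2-edit fuzzy query.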
>
> NOTE: I will continue to protest/veto/do anything I can to block queries
> with horrible complexity, making as much noise as possible, because
> the end solution is for users to index and search content correctly
> and get results in a reasonable amount of time.
>
> If it doesn't work with 100M documents, i don't want it in lucene.
>
> I would have the same opinion if someone wanted unscalable solutions
> for scoring w/ language models (e.g. not happy with smoothing for
> unknown probabilities), or if someone claimed that spatial queries
> should do slow things because they don't currently support
> interplanetary distances, and so on.
>
> On Fri, Nov 9, 2012 at 7:52 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
>> Hi Robert,
>>
>> I acknowledge your "-1" vote, and I'm guessing that your objection is maybe
>> 70% "scalability", and only 30% use-case?
>>
>> The older Levenshtein stuff has been around for a long time, scalable or not,
>> and is already in real systems.
>>
>> You seem to have a very "binary" view of code being "in" or "out".  Is there
>> any room in your world-view for "gray code": unsupported, incubator,
>> what-have-you?  Maybe analogous to people who jailbreak their iPhones or
>> something?
>>
>> You're an important part of the community, and working at Lucid, etc., and
>> clearly concerned about software quality.  When smart folks like you have
>> such sharp opinions I do try to ponder them against my own circumstances.
>>
>> And on the quality of the old code, was it just the scalability, or were
>> there other concerns such as stability, coding style, or possibly
>> inconsistent results?
>>
>> Aren't the sandbox and an admonishing reference in the Javadocs sufficient?
>>
>> I'm harping on this because I'm really between a rock and a hard place, and I
>> also posted another question.
>>
>> Just trying to understand your very strong opinions, and I thank you for
>> your patience in this matter.  This issue is either going to fix or break my
>> weekend / next deliverable.
>>
>> Sincere thanks,
>> Mark
>>
>>
>> --
>> Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
>> Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
>>
>>
>> On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>
>>> I'm -1 for having unscalable shit in lucene's core. This query should
>>> have never been added.
>>>
>>> I don't care if a few people complain because they aren't using
>>> LowerCaseFilter or some other insanity. Fix your analysis chain. I
>>> don't have any sympathy.
>>>
>>> On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>>> > +1 for permitting a choice of fuzzy query implementation.
>>> >
>>> > I agree that we want a super-fast fuzzy query for simple variations, but
>>> > I also agree that we should have the option to trade off speed for
>>> > function.
>>> >
>>> > But I am also sympathetic to ensuring that any core Lucene features are
>>> > as performant as possible.
>>> >
>>> > Ultimately, if there were a single fuzzy query implementation that did
>>> > everything for everybody all of the time, that would be the way to go,
>>> > but if choices need to be made to satisfy competing goals, we should
>>> > support going that route.
>>> >
>>> > -- Jack Krupansky
>>> >
>>> > From: Mark Bennett
>>> > Sent: Friday, November 09, 2012 3:48 PM
>>> > To: dev@lucene.apache.org
>>> > Subject: Re: FuzzyQuery vs SlowFuzzyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
>>> >
>>> > Hi Robert,
>>> >
>>> > On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>> >>
>>> >> ...
>>> >> ... I'm strongly against having this
>>> >> unscalable garbage in lucene's core.
>>> >>
>>> >> There is no use case for ed > 2, that's just crazy.
>>> >
>>> >
>>> > I promise you there ARE use cases for edit distances > 2, especially
>>> > with longer words.  Due to NDA I can't go into details.
>>> >
>>> > Also ed>2 can be useful when COMBINING that low-quality part of the
>>> > search with other sub-queries, or additional business rules.  Maybe
>>> > instead of boiling an ocean this lets you just boil the sea.  ;-)
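The "boil the sea" idea amounts to running the expensive comparison only on documents that cheaper criteria have already narrowed down. A minimal sketch in plain Java (16+, for the record type), with a hypothetical `Doc` record, a `region` field standing in for another sub-query or business rule, and made-up sample terms. It computes an unbounded edit distance, so more than 2 edits are allowed, but only for documents that pass the exact filter, and it counts how many distance computations were needed:

```java
import java.util.ArrayList;
import java.util.List;

public class FilterThenFuzzy {

    // Hypothetical document: an exact-match field plus a free-text term.
    record Doc(String region, String name) {}

    // Counts expensive comparisons, to show how much the filter saves.
    static int distanceCalls = 0;

    // Unbounded Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        distanceCalls++;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Cheap exact filter first; the expensive fuzzy comparison runs only on survivors.
    static List<Doc> search(List<Doc> docs, String region, String name, int maxEdits) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (!d.region().equals(region)) continue; // other sub-query / business rule
            if (levenshtein(name, d.name()) <= maxEdits) hits.add(d);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("west", "internationalization"),
                new Doc("west", "internationalisation"),
                new Doc("east", "internationalization"),
                new Doc("east", "interpolation"));
        List<Doc> hits = search(docs, "west", "internationlizaton", 3);
        System.out.println(hits.size() + " hits, " + distanceCalls + " distance calls"); // 2 hits, 2 distance calls
    }
}
```

With the filter, only the two `west` documents pay for the O(m*n) comparison; without it every document would, which is the scalability concern raised elsewhere in the thread.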
>>> >
>>> > I won't comment on the quality of the older Levenshtein code, or the
>>> > likely very slow performance, nor where the code should live, etc.
>>> >
>>> > But your statement about "no use case for ed > 2" is simply not true
>>> > (whether you'd agree with any of them or not is certainly another
>>> > matter).
>>> >
>>> > I understand your concerns about not having it be the default (or
>>> > maybe having a giant warning message or something, whatever).
>>> >
>>> >> --
>>> >> lucidworks.com
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>>> >>
>>> >
>>>
>>>
>>
>
>
