Reply-To: dev@lucene.apache.org
Message-ID: <99C11B58493E4B59B8A8C61774C725D5@JackKrupansky>
From: "Jack Krupansky" <jack@basetechnology.com>
To: <dev@lucene.apache.org>
References: <1574581308.74146.1347527827817.JavaMail.jiratomcat@arcas>
 <alpine.DEB.2.02.1209131522190.415@frisbee>
 <CAOdYfZVw3557kXo3FsHJcXt-xXqkW-iKW2T8F929PoAepsN5KA@mail.gmail.com>
 <CA+NxCsMSqvg-WY3o6WcjbbfYVfiG3QVPK_d0k0QEpepkHHzUyg@mail.gmail.com>
 <7C09B40B9AF7446190E18140406F076D@JackKrupansky>
 <CAOdYfZXz4wMkZawhyZ6X5QOih4odG3te-nRitXr6X5oRUKUWtQ@mail.gmail.com>
 <CA+NxCsN75KcAtER2Hktto2UkZdNGmLac-scRdyhVfnTVc7fSVw@mail.gmail.com>
 <CAOdYfZVbFTAz7uYjvu-e+okYFWJsAhuQ6dAwBKnBGrLvSEm7ZA@mail.gmail.com>
 <CA+NxCsNNzMfQe3ixuV+LcjLfzDG-0UzYWOyskJSQ8C978uVUdw@mail.gmail.com>
 <007001cdbf8a$ab0891b0$0119b510$@thetaphi.de>
In-Reply-To: <007001cdbf8a$ab0891b0$0119b510$@thetaphi.de>
Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented]
 (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
Date: Sat, 10 Nov 2012 21:14:19 -0800
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_001E_01CDBF88.59264470"
Importance: Normal
Sender: "Jack Krupansky" <jack@basetechnology.com>

------=_NextPart_000_001E_01CDBF88.59264470
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable

“we did not remove functionality”

Are you saying that the full-featured “classic” fuzzy query is still
available in the Lucene query parser? By default? Or via what option?

-- Jack Krupansky

From: Uwe Schindler
Sent: Saturday, November 10, 2012 1:30 PM
To: dev@lucene.apache.org
Subject: RE: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

Just use SlowFuzzyQuery from contrib; I really don’t understand what the
issue is. The code is available in the sandbox/query module and is
available in Lucene 4.0. There is no reason to complain here: we did not
remove functionality.

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

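Uwe's point is that the older behavior still exists as SlowFuzzyQuery in the sandbox module, while the automaton-based FuzzyQuery is limited to at most two edits. As a rough illustration of why a fuzzy match without that cap is "slow", here is a minimal Java sketch of brute-force fuzzy matching over a hypothetical in-memory term list. It is not Lucene's implementation, and the names `BruteForceFuzzySketch` and `scan` are made up for illustration. It only shows the cost model: every term gets a full edit-distance computation, so the work grows with the number of unique terms, but `maxEdits` can be any value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of brute-force fuzzy matching over an in-memory term list.
// It illustrates the cost model only; it is not Lucene's SlowFuzzyQuery code.
class BruteForceFuzzySketch {

    // Classic dynamic-programming Levenshtein distance (insertions, deletions,
    // substitutions each cost 1), keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Computes a full edit distance for every term: the work grows with the
    // number of unique terms, but maxEdits is not capped at 2.
    static List<String> scan(List<String> terms, String seed, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(seed, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "lucent", "license", "solr", "search");
        System.out.println("maxEdits=2: " + scan(terms, "lucine", 2));
        System.out.println("maxEdits=3: " + scan(terms, "lucine", 3));
    }
}
```

With the seed `lucine`, a limit of 3 edits matches `lucene`, `lucent` and `license`, while a limit of 2 drops `license`; in both cases all five terms are compared.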
From: mbennett.ideaeng@gmail.com [mailto:mbennett.ideaeng@gmail.com] On Behalf Of Mark Bennett
Sent: Saturday, November 10, 2012 10:18 PM
To: dev@lucene.apache.org
Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira] [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.

Hi guys,

I'm not expecting to change minds, but I found Robert's last email
helpful, so I wanted to try one more round.

On Fri, Nov 9, 2012 at 5:32 PM, Robert Muir <rcmuir@gmail.com> wrote:

  ...
  This is some analysis chain configuration issue.

Interesting, so you would expect that the seed term *would* go through
analysis before its variants are looked up in the index? If it's supposed
to work that way, then I can recheck my config. (It wasn't just
lowercasing; that was only an example.)

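On the analysis question: a fuzzy query compares the seed term against terms exactly as they are stored in the index, so if the index-time analysis chain changed the terms (for example by lowercasing them) but the seed term did not get the same treatment, part or all of the edit budget is spent on that mismatch. A minimal sketch, assuming the only index-time normalization is lowercasing; `SeedNormalizationSketch` and `normalize` are hypothetical stand-ins for a real analysis chain.

```java
import java.util.Locale;

// Hypothetical sketch: normalize the fuzzy seed term the same way the index-time
// analysis chain normalized the indexed terms. Lowercasing is the only assumed step.
class SeedNormalizationSketch {

    // Classic dynamic-programming Levenshtein distance (insertions, deletions,
    // substitutions each cost 1), keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Assumed stand-in for the index-time analysis chain (lowercasing only).
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexedTerm = "lucene"; // stored lowercased by the assumed analysis chain
        String seed = "LUCNE";         // what the user typed
        System.out.println("raw distance: " + levenshtein(seed, indexedTerm));
        System.out.println("normalized distance: " + levenshtein(normalize(seed), indexedTerm));
    }
}
```

The raw strings `LUCNE` and `lucene` share no characters, so they are 6 edits apart and would not match even at a two-edit limit; after lowercasing, the seed is a single edit away.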
  If it doesn't work with 100M documents, I don't want it in Lucene.

Ah, this is very illuminating. For scalability, big data, etc., that
certainly makes sense.

But there are many important intranet search applications that have far
fewer than 100M docs but still need the fine-grained control of
Solr/Lucene. Intranet projects in the 35k to 2M doc range often have even
more precise indexing, filtering, and faceting requirements, and
Solr/Lucene provides that fine blade.

Wouldn't it be more constructive to pick some number, say 100M, and give
that the "big data" moniker? Then things that are not that scalable could
live in a separate area or under a separate label but still be retained.
Discarding all use cases below 100M docs seems draconian.

  I would have the same opinion if someone wanted unscalable solutions
  for scoring w/ language models (e.g. not happy with smoothing for
  unknown probabilities), or if someone claimed that spatial queries
  should do slow things because they don't currently support
  interplanetary distances, and so on.


  On Fri, Nov 9, 2012 at 7:52 PM, Mark Bennett <mbennett@ideaeng.com> wrote:
  > Hi Robert,
  >
  > I acknowledge your "-1" vote, and I'm guessing that your objection is
  > maybe 70% "scalability" and only 30% use case?
  >
  > The older Levenshtein stuff has been around for a long time, scalable or
  > not, and is already in real systems.
  >
  > You seem to have a very "binary" view of code being "in" or "out". Is
  > there any room in your world-view of code for "gray code", unsupported,
  > incubator, what-have-you? Maybe analogous to people who jailbreak their
  > iPhones or something?
  >
  > You're an important part of the community, working at Lucid, etc., and
  > clearly concerned about software quality. When smart folks like you have
  > such sharp opinions, I do try to weigh them against my own circumstances.
  >
  > And on the quality of the old code, was it just the scalability, or were
  > there other concerns such as stability, coding style, or possibly
  > inconsistent results?
  >
  > Aren't the sandbox and a cautionary note in the Javadocs sufficient?
  >
  > I'm harping on this because I'm really between a rock and a hard place,
  > and I also posted another question.
  >
  > Just trying to understand your very strong opinions, and I thank you for
  > your patience in this matter. This issue is either going to fix or break
  > my weekend / next deliverable.
  >
  > Sincere thanks,
  > Mark
  >
  >
  > --
  > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
  > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
  >
  >
  > On Fri, Nov 9, 2012 at 4:37 PM, Robert Muir <rcmuir@gmail.com> wrote:
  >>
  >> I'm -1 for having unscalable shit in Lucene's core. This query should
  >> never have been added.
  >>
  >> I don't care if a few people complain because they aren't using
  >> LowerCaseFilter or some other insanity. Fix your analysis chain. I
  >> don't have any sympathy.
  >>
  >> On Fri, Nov 9, 2012 at 7:35 PM, Jack Krupansky <jack@basetechnology.com>
  >> wrote:
  >> > +1 for permitting a choice of fuzzy query implementation.
  >> >
  >> > I agree that we want a super-fast fuzzy query for simple variations,
  >> > but I also agree that we should have the option to trade off speed for
  >> > function.
  >> >
  >> > But I am also sympathetic to ensuring that core Lucene features are as
  >> > performant as possible.
  >> >
  >> > Ultimately, if there were a single fuzzy query implementation that did
  >> > everything for everybody all of the time, that would be the way to go,
  >> > but if choices need to be made to satisfy competing goals, we should
  >> > support going that route.
  >> >
  >> > -- Jack Krupansky
  >> >
  >> > From: Mark Bennett
  >> > Sent: Friday, November 09, 2012 3:48 PM
  >> > To: dev@lucene.apache.org
  >> > Subject: Re: FuzzyQuery vs SlowFuzsyQuery docs? -- was: Re: [jira]
  >> > [Commented] (LUCENE-2667) Fix FuzzyQuery's defaults, so its fast.
  >> >
  >> > Hi Robert,
  >> >
  >> > On Thu, Sep 13, 2012 at 7:39 PM, Robert Muir <rcmuir@gmail.com> wrote:
  >> >>
  >> >> ...
  >> >> ... I'm strongly against having this
  >> >> unscalable garbage in Lucene's core.
  >> >>
  >> >> There is no use case for ed > 2; that's just crazy.
  >> >
  >> >
  >> > I promise you there ARE use cases for edit distances > 2, especially
  >> > with longer words. Due to an NDA I can't go into details.
  >> >
  >> > Also, ed > 2 can be useful when COMBINING that low-quality part of the
  >> > search with other sub-queries or additional business rules. Maybe
  >> > instead of boiling an ocean this lets you just boil the sea. ;-)
  >> >
  >> > I won't comment on the quality of the older Levenshtein code, its
  >> > likely very slow performance, or where the code should live, etc.
  >> >
  >> > But your statement about "no use case for ed > 2" is simply not true
  >> > (whether you'd agree with any of those use cases is certainly another
  >> > matter).
  >> >
  >> > I understand your concerns about not having it be the default (or
  >> > maybe having a giant warning message or something, whatever).
  >> >
  >> >> --
  >> >> lucidworks.com
  >> >>
  >> >> ---------------------------------------------------------------------
  >> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
  >> >> For additional commands, e-mail: dev-help@lucene.apache.org
  >> >>
  >> >
  >>
  >

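Mark's argument for edit distances above 2 "especially with longer words" can be made concrete by comparing a fixed edit cap with a length-relative budget. The sketch below checks a cap of 2 edits against a hypothetical rule that allows edits up to a fraction of the longer term's length. The 25% fraction and the names `RelativeEditBudgetSketch`, `withinFixedCap` and `withinRelativeBudget` are illustrative assumptions, not Lucene APIs.

```java
// Hypothetical sketch: a fixed edit cap versus a length-relative edit budget.
// Names and the 25% fraction are illustrative assumptions, not Lucene APIs.
class RelativeEditBudgetSketch {

    // Classic dynamic-programming Levenshtein distance (insertions, deletions,
    // substitutions each cost 1), keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Fixed cap: accept when the distance is at most maxEdits, whatever the word length.
    static boolean withinFixedCap(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }

    // Relative budget: allow floor(fraction * length of the longer term) edits.
    static boolean withinRelativeBudget(String a, String b, double fraction) {
        int allowed = (int) Math.floor(fraction * Math.max(a.length(), b.length()));
        return levenshtein(a, b) <= allowed;
    }

    public static void main(String[] args) {
        String[][] pairs = { {"accommodation", "acomodasion"}, {"solr", "sort"} };
        for (String[] p : pairs) {
            System.out.printf("%s vs %s: distance=%d, fixedCap2=%b, relative25%%=%b%n",
                    p[0], p[1], levenshtein(p[0], p[1]),
                    withinFixedCap(p[0], p[1], 2),
                    withinRelativeBudget(p[0], p[1], 0.25));
        }
    }
}
```

`accommodation` vs `acomodasion` is 3 edits on a 13-character word: rejected by the fixed cap but accepted by the relative budget. `solr` vs `sort` is 2 edits on a 4-character word: accepted by the fixed cap but rejected by the relative budget. Which behavior is preferable depends on the application.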
------=_NextPart_000_001E_01CDBF88.59264470--