Mailing-List: contact solr-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: solr-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of roman.chyla@gmail.com
 designates 209.85.216.174 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <2F6963EC-A87C-41C2-826A-F232B626E9F7@gmail.com>
References: <6C96541C-48BA-4B88-8326-E9687D965E6E@gmail.com>
 <CAEN8dyWhwXQeTer0WjQumfUJ6R_s=-Ha4GvHpYtPndans43T5g@mail.gmail.com>
 <CANGii8fj0==_GmJrSDOfOyWsYqsaz6aURW5PkifpaLZbtJWBJw@mail.gmail.com>
 <CAEN8dyXqUXXXwXymDiHjE4jN24XN=Wa5WyJLQDBDZDpykHf2JA@mail.gmail.com>
 <2F6963EC-A87C-41C2-826A-F232B626E9F7@gmail.com>
From: Roman Chyla <roman.chyla@gmail.com>
Date: Fri, 5 Dec 2014 14:16:46 -0500
Message-ID: 
 <CAEN8dyUhp6x-Rkc3ONg=sZ-Xev9deV1jzCp5EQ4BKUgB-aUcOQ@mail.gmail.com>
Subject: Re: Anti-Pattern in lucent-join jar?
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Content-Type: multipart/alternative; boundary=089e0129503e2cd83b05097cec76

--089e0129503e2cd83b05097cec76
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Not sure I understand. It is the searcher that executes the query, so how
would you 'convince' it to pass the query? First the Weight is created, then
the Weight instance creates the Scorer, so you would have to change the API
to do the passing (or maybe not...?).
In my case, the relationships spanned index segments, so I had to collect
them first. In some other situations, when you look only at the data inside
one index segment, it _might_ be better to wait.
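
To make the difference concrete, here is a minimal sketch of eager versus
deferred term collection. It is only an illustration under stated
assumptions: Searcher, EagerJoinQuery and DeferredJoinQuery are hypothetical
stand-ins written for this example, not Lucene's Query/Weight/IndexSearcher
API, and the "from" query is modelled as a plain predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical stand-ins only: none of these types are Lucene's real
// Query / Weight / IndexSearcher classes.
class DeferredJoinSketch {

    /** Stand-in for a searcher: a fixed snapshot of join-field values. */
    static final class Searcher {
        private final List<String> values;

        Searcher(List<String> values) {
            this.values = new ArrayList<>(values);
        }

        /** Runs the "from" query and collects the matching join terms. */
        Set<String> collectTerms(Predicate<String> fromQuery) {
            Set<String> terms = new TreeSet<>();
            for (String v : values) {
                if (fromQuery.test(v)) {
                    terms.add(v);
                }
            }
            return terms;
        }
    }

    /** Eager variant: terms are collected while the query is being built. */
    static final class EagerJoinQuery {
        private final Set<String> terms;

        EagerJoinQuery(Predicate<String> fromQuery, Searcher fromSearcher) {
            this.terms = fromSearcher.collectTerms(fromQuery);
        }

        /** Ignores the searcher it runs against, so the terms can be stale. */
        Set<String> createWeight(Searcher searcher) {
            return terms;
        }
    }

    /** Deferred variant: terms are collected only when the query is run. */
    static final class DeferredJoinQuery {
        private final Predicate<String> fromQuery;

        DeferredJoinQuery(Predicate<String> fromQuery) {
            this.fromQuery = fromQuery;
        }

        /** Collects against the searcher that actually executes the query. */
        Set<String> createWeight(Searcher searcher) {
            return searcher.collectTerms(fromQuery);
        }
    }

    public static void main(String[] args) {
        Predicate<String> fromQuery = v -> v.startsWith("a");
        Searcher first = new Searcher(Arrays.asList("a1", "a2", "b1"));

        EagerJoinQuery eager = new EagerJoinQuery(fromQuery, first);
        DeferredJoinQuery deferred = new DeferredJoinQuery(fromQuery);

        // The index changes and a new searcher is opened before the queries run.
        Searcher reopened = new Searcher(Arrays.asList("a1", "a2", "a3", "b1"));

        System.out.println("eager:    " + eager.createWeight(reopened));
        System.out.println("deferred: " + deferred.createWeight(reopened));
    }
}
```

In this toy model the eager query keeps the terms it saw at construction
time, while the deferred one sees whatever the executing searcher sees. The
price of deferring is that collection is repeated every time the query runs.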


On Fri, Dec 5, 2014 at 1:25 PM, Darin Amos <darincs@gmail.com> wrote:

> Couldn't you just keep passing the wrapped query and searcher down to
> Weight.scorer()?
>
> This would allow you to wait until the query is executed to do term
> collection. If you want to protect against creating and executing the
> query with different searchers, you would have to make the query factory
> (or constructor) only visible to the query parser or parser plugin?
>
> I might not have followed you; this discussion challenges my
> understanding of Lucene and SOLR.
>
> Darin
>
>
>
> > On Dec 5, 2014, at 12:47 PM, Roman Chyla <roman.chyla@gmail.com> wrote:
> >
> > Hi Mikhail, I think you are right: it won't be a problem for SOLR, but
> > it is likely an antipattern inside a Lucene component, because custom
> > components may create join queries, hold on to them, and then execute
> > them much later against a different searcher. One approach would be to
> > postpone term collection until the query actually runs. I looked far and
> > wide for an appropriate place, but only found createWeight(). At least
> > it gives developers NO opportunity to shoot themselves in the foot! ;-)
> >
> > Since it may serve as an inspiration to someone, here is a link:
> >
> > https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101
> >
> > roman
> >
> > On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev
> > <mkhludnev@griddynamics.com> wrote:
> >
> >> Thanks Roman! Let's expand on it for the sake of completeness.
> >> Such an issue is not possible in Solr, because caches are associated
> >> with the searcher. As long as you follow this design (see Solr's
> >> userCache) and don't update an entry once it has been cached, there is
> >> no chance to shoot yourself in the foot.
> >> There were a few caches inside Lucene (the old FieldCache,
> >> CachingWrapperFilter, ExternalFileField, etc.), but they are properly
> >> mapped onto segment keys, which excludes such leakage across different
> >> searchers.
> >>
> >> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla <roman.chyla@gmail.com>
> >> wrote:
> >>
> >>> +1. Additionally (as follows from your observation), the query can
> >>> get out of sync with the index if, e.g., it was saved for later use
> >>> and then run against a newly opened searcher.
> >>>
> >>> Roman
> >>> On 4 Dec 2014 10:51, "Darin Amos" <darincs@gmail.com> wrote:
> >>>
> >>>> Hello All,
> >>>>
> >>>> I have been doing a lot of research into building some custom
> >>>> queries, and I have been looking at the Lucene Join library as a
> >>>> reference. I noticed something that I believe could actually have a
> >>>> negative side effect.
> >>>>
> >>>> Specifically, I was looking at the JoinUtil.createJoinQuery(...)
> >>>> method, and within that method you see the following code:
> >>>>
> >>>>        TermsWithScoreCollector termsWithScoreCollector =
> >>>>            TermsWithScoreCollector.create(fromField,
> >>>>                multipleValuesPerDocument, scoreMode);
> >>>>        fromSearcher.search(fromQuery, termsWithScoreCollector);
> >>>>
> >>>> As you can see, when the JoinQuery is being built, the code executes
> >>>> the query that it wraps, using its own collector to collect all the
> >>>> scores. If I were to write a query parser using this library (which
> >>>> someone has done here), doesn't this reduce the benefit of the SOLR
> >>>> query cache? The wrapped query is executed when the join query is
> >>>> constructed, not when the join query itself is executed.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Darin
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> <http://www.griddynamics.com>
> >> <mkhludnev@griddynamics.com>
> >>
>
>

--089e0129503e2cd83b05097cec76--