Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of linlma@gmail.com designates
 209.85.216.43 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAD0cWeVq=HeokrL3+dtp-gaJKGpcQPh_UH8F31XnooTd=BeUOg@mail.gmail.com>
References: 
 <CAK_MoSuh5YZmKamZ8wyLFuCVfcgCNCfvfC21oP+vQDoJ5VpgZw@mail.gmail.com>
	<CAD0cWeU5T0a0rcme-Cy5CO6CU-KDy3qGpPyDESc6SM8xYTxVWA@mail.gmail.com>
	<CAK_MoSscXccYtnKaTPjkLfjrM7ysDhH+GoOtVQX7NWuR8ddRag@mail.gmail.com>
	<CAD0cWeVp=GmCvc3ZUSgTg__uGWdjEo1CxQ6bPr3EXHc4A-e1sg@mail.gmail.com>
	<CAK_MoSuap3YV1=mku6r6Zky2C_9fkR9vRze205VhmspLcnKb=A@mail.gmail.com>
	<CAD0cWeVq=HeokrL3+dtp-gaJKGpcQPh_UH8F31XnooTd=BeUOg@mail.gmail.com>
Date: Sun, 17 Mar 2013 10:35:59 +0800
Message-ID: 
 <CAK_MoSsEySx5AZ=PAuHVkOcJ4h36KqeN1GcDN1ncABONgMhDLQ@mail.gmail.com>
Subject: Re: potential query performance issue
From: Lin Ma <linlma@gmail.com>
To: lukai <lukai1984@gmail.com>, java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=485b397dd3b59720b804d815bb25

--485b397dd3b59720b804d815bb25
Content-Type: text/plain; charset=ISO-8859-1

Thanks Lukai for the detailed reply,

   - "If your query is too long, it might not be very efficient in the
   query evaluation process." -- how does Lucene's query evaluation work?
   Is there any documentation I can refer to?
   - "you can read out the payload of the matched term you have stored" --
   what do you mean by the payload of the matched term? Could you show me
   an example?
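
To make both questions concrete, here is a rough sketch of what I imagine
"score by payload" could look like. It is plain Java and does not use any
Lucene classes; the names PayloadDotProduct, Posting, Hit, encode and decode
are hypothetical names of my own, and I am only assuming that the payload would
hold one float weight per (document, feature) pair. It walks the postings of
the query features only, accumulates the dot product per document, and keeps
the top k in a min-heap. Please correct me if Lucene works differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Conceptual sketch only: plain Java, no Lucene classes are used.
final class PayloadDotProduct {

    // One posting: a document id plus the feature weight stored as payload bytes.
    static final class Posting {
        final int docId;
        final byte[] payload;
        Posting(int docId, byte[] payload) { this.docId = docId; this.payload = payload; }
    }

    // A scored hit returned to the caller.
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Encode a feature weight as a 4-byte payload (assumed layout).
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    // Decode the weight back from the payload bytes.
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Inverted index: feature name -> postings of the documents that have it.
    private final Map<String, List<Posting>> index = new HashMap<>();

    // Index one document given its sparse feature vector.
    void add(int docId, Map<String, Float> features) {
        features.forEach((feature, weight) ->
            index.computeIfAbsent(feature, k -> new ArrayList<>())
                 .add(new Posting(docId, encode(weight))));
    }

    // Term-at-a-time evaluation: only documents sharing at least one
    // feature with the query are touched, not the whole collection.
    List<Hit> topK(Map<String, Float> query, int k) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Map.Entry<String, Float> q : query.entrySet()) {
            List<Posting> postings =
                index.getOrDefault(q.getKey(), Collections.<Posting>emptyList());
            for (Posting p : postings) {
                scores.merge(p.docId, q.getValue() * decode(p.payload), Float::sum);
            }
        }
        // Min-heap of at most k hits: the weakest hit is evicted first.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (Map.Entry<Integer, Float> e : scores.entrySet()) {
            heap.offer(new Hit(e.getKey(), e.getValue()));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return hits;
    }
}
```

In my case I would call add() once per document and then topK(query, 1000).
If this is roughly the idea, I guess the cost depends on how long the
postings of the query features are rather than on the one million documents
as a whole.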

regards,
Lin

On Sun, Mar 17, 2013 at 7:13 AM, lukai <lukai1984@gmail.com> wrote:

>
>
> On Fri, Mar 15, 2013 at 10:02 PM, Lin Ma <linlma@gmail.com> wrote:
>
>> Hi Lukai, thanks for the detailed reply.
>>
>> Some more comments,
>>
>>    - "You can try scoring by payload" -- what do you mean by scoring by
>>    payload? I would appreciate it if you could provide a few more details;
>>
> Write your own query/scorer; there you can read out the payload of the
> matched term you have stored. You can implement your dot product
> functionality in the score function of your scorer.
>
>>
>>    - "Lucene focuses on search for the default implementation" -- what
>>    do you mean by "default"?
>>
> I mean that the default query parser and query types are designed for
> search applications. If your query is too long, it might not be very
> efficient in the query evaluation process.
>
>>
>>    - "For your requirement, you can do some query re-writing to reduce
>>    your query size" -- by query re-writing, do you mean rewriting "iPhone
>>    5" and "iPhone 4S" to "iPhone" to reduce the number of queries? Or do
>>    you mean something else?
>>
> Query re-writing really depends on your application. You can
> reduce/expand your query or even change the query type according to your
> needs.
>
>>
>> regards,
>> Lin
>>
>>
>> On Sat, Mar 16, 2013 at 11:55 AM, lukai <lukai1984@gmail.com> wrote:
>>
>>> Different applications have different requirements and solve different
>>> problems. Lucene focuses on search for the default implementation. For
>>> your requirement, you can do some query re-writing to reduce your query
>>> size if you still want to leverage the search functionality. If you just
>>> want to customize your feature values and do a simple dot product
>>> calculation, you can try scoring by payload. It might not be very
>>> efficient, because you still need to convert your query into some specific
>>> Lucene query type, but you can still leverage the existing index
>>> structure, NRT, and the distributed search support in Solr.
>>>
>>> As for performance, it really depends on the document size and the
>>> term distribution of your corpus. If you have enough machines, you can
>>> reduce the number of documents per instance and distribute your search to
>>> achieve better performance.
>>>
>>>
>>>
>>>
>>> On Fri, Mar 15, 2013 at 7:36 PM, Lin Ma <linlma@gmail.com> wrote:
>>>
>>>> Hi lukai, thanks for the reply. Do you mean WAND is a way to resolve
>>>> this issue? By "native support", do you mean there is no built-in
>>>> (ready-to-use, open source) module in Lucene that implements WAND?
>>>> If so, the performance will be really bad.
>>>>
>>>> regards,
>>>> Lin
>>>>
>>>>
>>>> On Sat, Mar 16, 2013 at 2:49 AM, lukai <lukai1984@gmail.com> wrote:
>>>>
>>>>> I have implemented WAND with Solr/Lucene. So far there is no performance
>>>>> issue. There is no native support for this functionality; you need to
>>>>> implement it yourself.
>>>>>
>>>>> On Fri, Mar 15, 2013 at 10:09 AM, Lin Ma <linlma@gmail.com> wrote:
>>>>>
>>>>> > Hello guys,
>>>>> >
>>>>> > Suppose I have one million documents, and each document has hundreds
>>>>> > of features. A given query also has hundreds of features. I want to
>>>>> > fetch the top 1000 most relevant documents by the dot product of the
>>>>> > query and document features (query and document features are in the
>>>>> > same feature space).
>>>>> >
>>>>> > I am not sure how Lucene implements this internally. If we have to go
>>>>> > through all one million documents to compute the dot product with the
>>>>> > query, then I am concerned about the performance. I would appreciate
>>>>> > it if anyone could confirm (1) how Lucene works internally for this
>>>>> > use case and (2) any smart ideas to improve query efficiency when
>>>>> > selecting the top 1000 documents.
>>>>> >
>>>>> > thanks in advance,
>>>>> > Lin
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>

--485b397dd3b59720b804d815bb25--