lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: format data at source or format data during indexing?
Date Thu, 30 Mar 2017 11:28:17 GMT
What's your actual business use case?

On 30 Mar 2017 1:53 AM, "Derek Poh" <dpoh@globalsources.com> wrote:

> Hi Erick
>
> So I could also skip the query analyzer stage for appending the code to the
> search keyword, and have the front-end application append the code to every
> query it issues instead?
>
>
> On 3/30/2017 12:20 PM, Erick Erickson wrote:
>
>> I generally prefer index-time work to query-time work on the theory
>> that the index-time work is done once and the query time work is done
>> for each query.
>>
>> That said, for a corpus this size (and presumably without a large
>> query rate) I doubt you'd be able to measure any difference.
>>
>> So basically choose the easiest to implement IMO.
>>
>> Best,
>> Erick
>>
>> On Wed, Mar 29, 2017 at 8:43 PM, Alexandre Rafalovitch
>> <arafalov@gmail.com> wrote:
>>
>>> I am not sure I can tell how to decide on one or another. However, I
>>> wanted to mention that you also have the option of doing it in the
>>> UpdateRequestProcessor chain. That's still within Solr (and therefore
>>> is consistent with multiple clients feeding into Solr) but is before
>>> individual field processing (so will survive - for example - a
>>> copyField).
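>>>
>>> A minimal sketch of such a chain in solrconfig.xml, assuming the stock
>>> RegexReplaceProcessorFactory does the rewrite. The chain name "add-code"
>>> and the field name "coded_field" are hypothetical placeholders:
>>>
>>> <updateRequestProcessorChain name="add-code">
>>>   <!-- wraps the incoming value of the (hypothetical) field in the code -->
>>>   <processor class="solr.RegexReplaceProcessorFactory">
>>>     <str name="fieldName">coded_field</str>
>>>     <str name="pattern">^(.*)$</str>
>>>     <str name="replacement">z01x $1 z01x</str>
>>>     <!-- false so that $1 is treated as a capture group reference -->
>>>     <bool name="literalReplacement">false</bool>
>>>   </processor>
>>>   <processor class="solr.LogUpdateProcessorFactory" />
>>>   <processor class="solr.RunUpdateProcessorFactory" />
>>> </updateRequestProcessorChain>
>>>
>>> Unlike a charFilter, which only changes what the analyzer sees, this
>>> changes the field value itself, so the stored value would carry the code
>>> as well.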
>>>
>>> Regards,
>>>     Alex.
>>> ----
>>> http://www.solr-start.com/ - Resources for Solr users, new and
>>> experienced
>>>
>>>
>>> On 29 March 2017 at 23:38, Derek Poh <dpoh@globalsources.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I need to create a field whose value is prefixed and suffixed with the
>>>> code 'z01x'. The field needs to carry the code both in the index and at
>>>> query time.
>>>> I can either
>>>> 1.
>>>> have the source data of the field formatted with the code before
>>>> indexing (outside Solr), and use a charFilter in the query analyzer of
>>>> the field type to add the code during query:
>>>>
>>>> <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>             pattern="^(.*)$"
>>>>             replacement="z01x $1 z01x" />
>>>>
>>>> OR
>>>>
>>>> 2.
>>>> use the charFilter before the tokenizer class in both the index and
>>>> query analyzer stages of the field type.
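>>>>
>>>> A minimal sketch of how option 2 might look in the schema, assuming a
>>>> standard tokenizer and no further filters. The field type name
>>>> "text_coded" is a hypothetical placeholder:
>>>>
>>>> <!-- "text_coded" is a hypothetical name; tokenizer choice is assumed -->
>>>> <fieldType name="text_coded" class="solr.TextField">
>>>>   <analyzer type="index">
>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>                 pattern="^(.*)$"
>>>>                 replacement="z01x $1 z01x" />
>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>   </analyzer>
>>>>   <analyzer type="query">
>>>>     <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>                 pattern="^(.*)$"
>>>>                 replacement="z01x $1 z01x" />
>>>>     <tokenizer class="solr.StandardTokenizerFactory" />
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> For option 1, the index analyzer would omit the charFilter, since the
>>>> source data would already carry the code.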
>>>>
>>>> The collection currently has between 100k and 200k documents, but it may
>>>> grow in the future.
>>>> The indexing time with option 2 is almost the same as the current
>>>> indexing time, between 2 and 3 minutes.
>>>>
>>>> Which option would you advise?
>>>>
>>>> Derek
>>>>
>>>> ----------------------
>>>> CONFIDENTIALITY NOTICE
>>>> This e-mail (including any attachments) may contain confidential and/or
>>>> privileged information. If you are not the intended recipient or have
>>>> received this e-mail in error, please inform the sender immediately and
>>>> delete this e-mail (including any attachments) from your computer, and
>>>> you must not use, disclose to anyone else or copy this e-mail (including
>>>> any
>>>> attachments), whether in whole or in part.
>>>> This e-mail and any reply to it may be monitored for security, legal,
>>>> regulatory compliance and/or other appropriate reasons.
>>>>
>>>
>>
>
