lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Extending solr analysis in index time
Date Sun, 11 Jan 2015 20:31:05 GMT
Actually, let me take that back. I seem to remember an example where
somebody used a URP (UpdateRequestProcessor) to do a pre-analysis of the
field. That implies access to the Solr core, so it might be possible.

But I still think you need to review the business-level issues, as you
are heading into increasingly hacky territory.
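
To make the arithmetic from the original question concrete, here is a
minimal sketch of the per-document calculation that such a pre-analysis
step would have to perform. It is plain Python, not Solr's Java API. The
names IMPORTANT_TERMS, term_frequencies and combined_score are made up for
illustration, the whitespace split is only a stand-in for a real analyzer,
and summing over several important terms is an assumption, since the
question only gives the single-term case.

```python
from collections import Counter

# Hypothetical weight table; only "hello" = 2.0 comes from the thread.
IMPORTANT_TERMS = {"hello": 2.0}


def term_frequencies(text):
    """Count terms with a naive lowercase/whitespace split.

    This is a stand-in for a real analysis chain, not what Lucene does.
    """
    return Counter(text.lower().split())


def combined_score(field_a, field_b, weights=IMPORTANT_TERMS):
    """Return the sum over important terms of w*tf_a + (w*tf_b)**2.

    Summing across terms is an assumption; the thread only shows one term.
    """
    tf_a = term_frequencies(field_a)
    tf_b = term_frequencies(field_b)
    # Counter returns 0 for terms that do not occur in a field.
    return sum(w * tf_a[t] + (w * tf_b[t]) ** 2 for t, w in weights.items())


# The example from the question: tf("hello") is 3 in field "a" and 6 in "b".
doc = {"a": "hello " * 3, "b": "hello " * 6}
doc["score"] = combined_score(doc["a"], doc["b"])
print(doc["score"])  # 2*3 + (2*6)**2 = 150.0
```

The computed value would then be stored in its own field on the document
before it is indexed, so that queries can sort on it.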

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 11 January 2015 at 15:03, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
> No, you cannot access anything outside the specific document being indexed at that point.
>
> What are you actually trying to achieve on the business level?
>
> Regards,
>    Alex.
> ----
> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>
>
> On 11 January 2015 at 14:59, Ali Nazemian <alinazemian@gmail.com> wrote:
>> Dear Alexandre,
>>
>> I have not tried UpdateRequestProcessor yet. Can I access term
>> frequencies at this level? I don't want to calculate term frequencies once
>> more when Lucene has already calculated them in the inverted index.
>> Thank you very much.
>>  On Jan 11, 2015 7:49 PM, "Alexandre Rafalovitch" <arafalov@gmail.com>
>> wrote:
>>
>>> Your description uses the terms Solr/Lucene uses but perhaps not in
>>> the same way we do. That might explain the confusion.
>>>
>>> It sounds - on a high level - like you want to create a field based on
>>> a combination of a couple of other fields during the indexing stage. Have
>>> you tried UpdateRequestProcessors? They have access to the full
>>> document when it is sent and can do whatever they want with it.
>>>
>>> Regards,
>>>    Alex.
>>> ----
>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>
>>>
>>> On 11 January 2015 at 10:55, Ali Nazemian <alinazemian@gmail.com> wrote:
>>> > Dear Jack,
>>> > Hi,
>>> > I think you misunderstood my need. I don't want to change the default
>>> > scoring behavior of Lucene (tf-idf). I just want to have another field to
>>> > sort on for some specific queries (not all of the search business).
>>> > However, I am aware of Lucene payloads.
>>> > Thank you very much.
>>> >
>>> > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
>>> >
>>> >> You would do that with a custom similarity (scoring) class. That's an
>>> >> expert feature. In fact a SUPER-expert feature.
>>> >>
>>> >> Start by completely familiarizing yourself with how TF*IDF similarity
>>> >> already works:
>>> >>
>>> >> http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>>> >>
>>> >> And to use your custom similarity class in Solr:
>>> >>
>>> >> https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
>>> >>
>>> >>
>>> >> -- Jack Krupansky
>>> >>
>>> >> On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
>>> >>
>>> >> > Hi everybody,
>>> >> >
>>> >> > I am going to add some analysis to Solr at index time. Here is what I
>>> >> > have in mind:
>>> >> > Suppose I have two different fields in the Solr schema, field "a" and
>>> >> > field "b". I want to use the inverted index that has been built so that
>>> >> > some terms are treated as important, and tell Lucene to calculate a
>>> >> > value based on the frequency of these terms in each document. For
>>> >> > example, let the word "hello" be considered an important word with a
>>> >> > weight of 2.0. Suppose the term frequency of this word for document 1
>>> >> > is 3 in field "a" and 6 in field "b". Therefore the score value would
>>> >> > be 2*3+(2*6)^2. I want to calculate this score based on these fields
>>> >> > and store it in the index for retrieval. My question is: how can I do
>>> >> > such a thing? At first I considered using the term component to
>>> >> > calculate this value from outside and write it back to the Solr index,
>>> >> > but that does not seem efficient enough.
>>> >> >
>>> >> > Thank you very much.
>>> >> > Best regards.
>>> >> >
>>> >> > --
>>> >> > A.Nazemian
>>> >> >
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > A.Nazemian
>>>
