lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: Extending solr analysis in index time
Date Tue, 13 Jan 2015 09:18:04 GMT
Dear Markus,

Unfortunately I cannot use payloads, since I want to return this score to
each user as a simple field alongside the other fields, and payloads do not
provide that. I also don't want to change Lucene's default similarity; I
just want to have this field for sorting in some cases.
Best regards.
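
For illustration, here is a minimal sketch of the arithmetic behind the score described in the original message quoted below (weight 2.0 for "hello", term frequency 3 in field "a" and 6 in field "b"). It is plain Java and does not call any Lucene or Solr API. The class name `DocumentScoreSketch`, the method `score`, and the choice to sum the per-term values when there are several important terms are assumptions, since the thread only gives the single-term case. The resulting number could then be written to an ordinary stored, sortable field by whatever process builds the documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
class DocumentScoreSketch {

    // For each important term with weight w, adds w * tf_a + (w * tf_b)^2.
    // Summing over terms is an assumption; the thread shows one term only.
    static double score(Map<String, Double> weights,
                        Map<String, Integer> tfFieldA,
                        Map<String, Integer> tfFieldB) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            double w = entry.getValue();
            // A term that does not occur in a field has frequency 0.
            int tfA = tfFieldA.getOrDefault(entry.getKey(), 0);
            int tfB = tfFieldB.getOrDefault(entry.getKey(), 0);
            double weightedB = w * tfB;
            total += w * tfA + weightedB * weightedB;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("hello", 2.0);

        Map<String, Integer> tfFieldA = new LinkedHashMap<>();
        tfFieldA.put("hello", 3);

        Map<String, Integer> tfFieldB = new LinkedHashMap<>();
        tfFieldB.put("hello", 6);

        // 2*3 + (2*6)^2 = 6 + 144 = 150
        System.out.println(score(weights, tfFieldA, tfFieldB)); // prints 150.0
    }
}
```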

On Mon, Jan 12, 2015 at 10:26 PM, Markus Jelsma <markus.jelsma@openindex.io>
wrote:

> Hi - You mention having a list of important terms, so using payloads
> would be the most straightforward, I suppose. You would still need a custom
> similarity and a custom query parser. Payloads work very well for us.
>
> M
>
>
>
> -----Original message-----
> > From:Ahmet Arslan <iorixxx@yahoo.com.INVALID>
> > Sent: Monday 12th January 2015 19:50
> > To: solr-user@lucene.apache.org
> > Subject: Re: Extending solr analysis in index time
> >
> > Hi Ali,
> >
> > Reading your example, if you could somehow replace the idf component with
> > your "importance weight", I think your use case looks like TFIDFSimilarity.
> > The tf component remains the same.
> >
> > https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> >
> > I also suggest you ask this on the Lucene mailing list. Someone familiar
> > with the similarity package can give insight on this.
> >
> > Ahmet
> >
> >
> >
> > On Monday, January 12, 2015 6:54 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> > Could you clarify what you mean by "Lucene reverse index"? That's not a
> > term I am familiar with.
> >
> > -- Jack Krupansky
> >
> >
> > On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> >
> > > Dear Jack,
> > > Thank you very much.
> > > Yes, I was thinking of a function query for sorting, but I have two
> > > problems in this case: 1) a function query does the processing at query
> > > time, which I don't want; 2) I also want to have the score field for
> > > retrieving and showing to users.
> > >
> > > Dear Alexandre,
> > > Here is some more explanation of the business need behind the question:
> > > I am going to provide a field for each document; let's refer to it as
> > > "document_score". I am going to fill this field based on information
> > > that can be extracted from the Lucene reverse index. Assume I have a list
> > > of terms, called important terms, and I am going to extract the term
> > > frequency of each term in this list for each document. I want to use
> > > these term frequencies to calculate "document_score".
> > > "document_score" should be a stored field, since I am going to retrieve
> > > it for each document. I also want to sort on "document_score" when the
> > > user prefers it.
> > > I hope I have conveyed my point.
> > > Best regards.
> > >
> > >
> > > On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> > >
> > > > Won't function queries do the job at query time? You can add or multiply
> > > > the tf*idf score by a function of the term frequency of arbitrary terms,
> > > > using the tf, mul, and add functions.
> > > >
> > > > See:
> > > > https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > > >
> > > > -- Jack Krupansky
> > > >
> > > > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> > > >
> > > > > Dear Jack,
> > > > > Hi,
> > > > > I think you misunderstood my need. I don't want to change the default
> > > > > scoring behavior of Lucene (tf-idf); I just want to have another field
> > > > > to sort on for some specific queries (not all of the search business).
> > > > > I am aware of Lucene payloads, however.
> > > > > Thank you very much.
> > > > >
> > > > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> > > > >
> > > > > > You would do that with a custom similarity (scoring) class. That's an
> > > > > > expert feature. In fact a SUPER-expert feature.
> > > > > >
> > > > > > Start by completely familiarizing yourself with how TF*IDF similarity
> > > > > > already works:
> > > > > >
> > > > > > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > > > > >
> > > > > > And to use your custom similarity class in Solr:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > > > >
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> > > > > >
> > > > > > > Hi everybody,
> > > > > > >
> > > > > > > I am going to add some analysis to Solr at index time. Here is what I
> > > > > > > have in mind:
> > > > > > > Suppose I have two different fields in the Solr schema, field "a" and
> > > > > > > field "b". I am going to use the created reverse index in such a way
> > > > > > > that some terms are considered important, and tell Lucene to calculate
> > > > > > > a value based on the frequency of these terms in each document. For
> > > > > > > example, let the word "hello" be considered an important word with a
> > > > > > > weight of "2.0". Suppose the term frequency of this word is 3 in field
> > > > > > > "a" and 6 in field "b" for document 1. Therefore the score value would
> > > > > > > be 2*3+(2*6)^2. I want to calculate this score based on these fields
> > > > > > > and put it in the index for retrieval. My question is: how can I do
> > > > > > > such a thing? At first I considered using the term component to
> > > > > > > calculate this value from outside and put it back into the Solr index,
> > > > > > > but that does not seem efficient enough.
> > > > > > >
> > > > > > > Thank you very much.
> > > > > > > Best regards.
> > > > > > >
> > > > > > > --
> > > > > > > A.Nazemian
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > A.Nazemian
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
>



-- 
A.Nazemian
