lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Extending solr analysis in index time
Date Mon, 12 Jan 2015 18:56:51 GMT
Hi - You mention having a list of important terms, so using payloads would be the most
straightforward approach, I suppose. You still need a custom similarity and a custom query parser.
Payloads work very well for us.
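To illustrate the payload idea without Lucene on the classpath, here is a minimal plain-Java sketch, assuming each indexed token carries a float payload written in the `term|weight` form that Lucene's DelimitedPayloadTokenFilter accepts with its default `|` delimiter. The class and method names are hypothetical. In real Lucene 4.x the scoring part would live in a custom similarity's `scorePayload(...)` used by a payload-aware query such as PayloadTermQuery, which combines per-occurrence payload scores through a PayloadFunction (average, max, min); this sketch simply sums them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free simulation of payload-weighted scoring.
final class PayloadSketch {

    /** One token occurrence and the float payload attached to it at index time. */
    static final class Token {
        final String term;
        final float payload;

        Token(String term, float payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /**
     * Parses whitespace-separated "term|weight" tokens. Tokens without a
     * delimiter get a payload of 1.0 (an assumption of this sketch).
     */
    static List<Token> parse(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            int bar = raw.indexOf('|');
            if (bar < 0) {
                tokens.add(new Token(raw, 1.0f));
            } else {
                tokens.add(new Token(raw.substring(0, bar),
                        Float.parseFloat(raw.substring(bar + 1))));
            }
        }
        return tokens;
    }

    /** Sums the payloads of every occurrence of queryTerm. */
    static float payloadScore(List<Token> tokens, String queryTerm) {
        float sum = 0f;
        for (Token t : tokens) {
            if (t.term.equals(queryTerm)) {
                sum += t.payload;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Token> doc = parse("hello|2.0 world hello|2.0 hello|2.0");
        System.out.println(payloadScore(doc, "hello")); // prints 6.0
    }
}
```

The weight is attached once per occurrence at index time, so the importance list never has to be consulted at query time.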

M

 
 
-----Original message-----
> From:Ahmet Arslan <iorixxx@yahoo.com.INVALID>
> Sent: Monday 12th January 2015 19:50
> To: solr-user@lucene.apache.org
> Subject: Re: Extending solr analysis in index time
> 
> Hi Ali,
> 
> Reading your example, if you could somehow replace the idf component with your "importance weight",
> I think your use case looks like TFIDFSimilarity. The tf component remains the same.
> 
> https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
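Ahmet's suggestion can be sketched outside Lucene: keep the usual tf and substitute a per-term importance weight wherever idf would appear. One detail worth checking on the linked TFIDFSimilarity page is that idf(t) enters Lucene's practical scoring function squared (once through the query weight and once on the document side), so a weight returned from `idf()` would be squared as well. The plain-Java code below is a hypothetical illustration of only that part; coord, queryNorm, boosts and length norms are left out.

```java
import java.util.Map;

// Hypothetical sketch: TF-IDF-style scoring where idf(t) is replaced by a
// fixed importance weight per term. Norms, boosts, coord and queryNorm are omitted.
final class ImportanceTfIdfSketch {

    /** DefaultSimilarity-style tf: square root of the raw term frequency. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /**
     * Sums tf(t) * w(t)^2 over the weighted terms. The weight is squared because
     * idf(t) appears squared in the practical scoring function.
     */
    static double score(Map<String, Integer> termFreqs, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            int freq = termFreqs.getOrDefault(e.getKey(), 0);
            double w = e.getValue();
            total += tf(freq) * w * w;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = Map.of("hello", 4, "world", 1);
        Map<String, Double> weights = Map.of("hello", 2.0);
        System.out.println(score(freqs, weights)); // prints 8.0
    }
}
```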
> 
> I also suggest you ask this on the Lucene mailing list. Someone familiar with the similarity
> package can give insight on this.
> 
> Ahmet
> 
> 
> 
> On Monday, January 12, 2015 6:54 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> Could you clarify what you mean by "Lucene reverse index"? That's not a
> term I am familiar with.
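"Reverse index" most likely refers to what the Lucene documentation calls the inverted index: a map from each term to the documents containing it, with a per-document term frequency (the postings). The toy in-memory structure below, with hypothetical names, only shows the shape of that data and how a term-frequency or document-frequency lookup works; Lucene's on-disk postings format is very different, and the lowercase/whitespace split stands in for a real analyzer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory inverted index: term -> (docId -> term frequency).
final class ToyInvertedIndex {
    // TreeMap keeps each postings list ordered by docId, as real postings are.
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    /** Adds a document, lowercasing and splitting on whitespace (stand-in for an analyzer). */
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Term frequency of an analyzed term in docId, or 0 if it does not occur there. */
    int termFreq(String term, int docId) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    /** Number of documents containing the term (the docFreq used by idf). */
    int docFreq(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "hello world hello");
        index.add(2, "goodbye world");
        System.out.println(index.termFreq("hello", 1)); // prints 2
        System.out.println(index.docFreq("world"));     // prints 2
    }
}
```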
> 
> -- Jack Krupansky
> 
> 
> On Mon, Jan 12, 2015 at 1:01 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> 
> > Dear Jack,
> > Thank you very much.
> > Yeah, I was thinking of a function query for sorting, but I have two problems
> > in this case: 1) function queries do the processing at query time, which I don't
> > want; 2) I also want to have the score field for retrieving and showing
> > to users.
> >
> > Dear Alexandre,
> > Here is some more explanation about the business behind the question:
> > I am going to provide a field for each document; let's refer to it as
> > "document_score". I am going to fill this field based on information
> > that could be extracted from the Lucene reverse index. Assume I have a list of
> > terms, called important terms, and I am going to extract the term frequency
> > of each of these terms in each document. To be honest, I
> > want to use the term frequency for calculating "document_score".
> > "document_score" should be stored, since I am going to retrieve this field
> > for each document. I also want to sort on "document_score" when the
> > user prefers it.
> > I hope I conveyed my point.
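One way to read this requirement is to compute "document_score" from the raw field text before the document is sent to Solr, so the value is stored and sortable like any other field. The sketch below assumes a simple weighted sum of term frequencies (the question at the bottom of the thread uses a different, per-field formula), a lowercase/whitespace tokenizer instead of the field's real analysis chain, and a hypothetical "body" field; the class name is also hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: derive a stored "document_score" value from the
// frequencies of a fixed list of important terms before indexing.
final class DocumentScoreSketch {

    /** Counts lowercase, whitespace-separated tokens (not Solr's real analysis chain). */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Weighted sum over important terms of w(t) * tf(t, d). */
    static double documentScore(String text, Map<String, Double> importantTerms) {
        Map<String, Integer> tf = termFreqs(text);
        double score = 0.0;
        for (Map.Entry<String, Double> e : importantTerms.entrySet()) {
            score += e.getValue() * tf.getOrDefault(e.getKey(), 0);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> important = new HashMap<>();
        important.put("hello", 2.0);
        important.put("solr", 0.5);
        // Hypothetical document represented as field name -> value.
        Map<String, Object> doc = new HashMap<>();
        String body = "Hello Solr hello world";
        doc.put("body", body);
        doc.put("document_score", documentScore(body, important));
        System.out.println(doc.get("document_score")); // prints 4.5
    }
}
```

Because the score is computed from the document text itself, no round trip to the index (for example via the term component) is needed.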
> > Best regards.
> >
> >
> > On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> >
> > > Won't function queries do the job at query time? You can add or multiply
> > > the tf*idf score by a function of the term frequency of arbitrary terms,
> > > using the tf, mul, and add functions.
> > >
> > > See:
> > > https://cwiki.apache.org/confluence/display/solr/Function+Queries
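Jack's suggestion can be made concrete by generating the function-query string on the client. This is a minimal sketch, assuming Solr's `termfreq(field,term)` function for the raw count (`tf(field,term)` instead returns the similarity's tf factor, which is the square root of the count under the default similarity), together with `mul`, `add` and `pow`. It builds the formula from the original question, w*tf_a + (w*tf_b)^2 per important term, summed over terms. The class name is hypothetical, terms are assumed to contain no single quotes, and the resulting string would still need URL encoding when sent as `sort=<expr> desc` or aliased into results with `fl=id,document_score:<expr>`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a Solr function-query string for
// sum over t of w(t)*tf_a(t) + (w(t)*tf_b(t))^2, using termfreq, mul, add and pow.
final class FunctionQueryBuilder {

    /** Expression for one important term across fieldA and fieldB. */
    static String termExpr(String fieldA, String fieldB, String term, double weight) {
        String tfA = "termfreq(" + fieldA + ",'" + term + "')";
        String tfB = "termfreq(" + fieldB + ",'" + term + "')";
        return "add(mul(" + weight + "," + tfA + "),pow(mul(" + weight + "," + tfB + "),2))";
    }

    /** Sums the per-term expressions; a single term is returned without a wrapping add(). */
    static String build(String fieldA, String fieldB, Map<String, Double> importantTerms) {
        if (importantTerms.isEmpty()) {
            throw new IllegalArgumentException("need at least one important term");
        }
        if (importantTerms.size() == 1) {
            Map.Entry<String, Double> e = importantTerms.entrySet().iterator().next();
            return termExpr(fieldA, fieldB, e.getKey(), e.getValue());
        }
        StringJoiner parts = new StringJoiner(",", "add(", ")");
        for (Map.Entry<String, Double> e : importantTerms.entrySet()) {
            parts.add(termExpr(fieldA, fieldB, e.getKey(), e.getValue()));
        }
        return parts.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> terms = new LinkedHashMap<>();
        terms.put("hello", 2.0);
        String expr = build("a", "b", terms);
        System.out.println("sort=" + expr + " desc");
        System.out.println("fl=id,document_score:" + expr);
    }
}
```

This keeps the work at query time, which is exactly the trade-off Ali objects to in his reply; it is mainly useful for checking the formula before committing to an index-time solution.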
> > >
> > > -- Jack Krupansky
> > >
> > > On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> > >
> > > > Dear Jack,
> > > > Hi,
> > > > I think you misunderstood my need. I don't want to change the default
> > > > scoring behavior of Lucene (tf-idf); I just want to have another field to
> > > > sort on for some specific queries (not for all search business). That said,
> > > > I am aware of Lucene payloads.
> > > > Thank you very much.
> > > >
> > > > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> > > >
> > > > > You would do that with a custom similarity (scoring) class. That's an
> > > > > expert feature. In fact, a SUPER-expert feature.
> > > > >
> > > > > Start by completely familiarizing yourself with how TF*IDF similarity
> > > > > already works:
> > > > >
> > > > > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
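For reference while reading that page: in Lucene 4.x, DefaultSimilarity (the standard TFIDFSimilarity subclass) computes tf as the square root of the raw frequency and idf as 1 + ln(numDocs / (docFreq + 1)). The standalone functions below restate just those two formulas so their behaviour can be checked without Lucene; the class name is hypothetical, and the real methods work on floats and are combined with norms, boosts, coord and queryNorm.

```java
// Standalone restatement of Lucene 4.x DefaultSimilarity's tf and idf formulas.
final class DefaultTfIdfFormulas {

    /** tf(freq) = sqrt(freq). */
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    /** idf(docFreq, numDocs) = 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        System.out.println(tf(9));     // prints 3.0
        System.out.println(idf(0, 1)); // prints 1.0
    }
}
```

Note how slowly tf grows: a term occurring 9 times contributes only 3 times as much as a single occurrence, which is one reason a raw-frequency formula like the one in the original question does not fall out of the default similarity.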
> > > > >
> > > > > And to use your custom similarity class in Solr:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > > > >
> > > > >
> > > > > -- Jack Krupansky
> > > > >
> > > > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <alinazemian@gmail.com> wrote:
> > > > >
> > > > > > Hi everybody,
> > > > > >
> > > > > > I am going to add some analysis to Solr at index time. Here is
> > > > > > what I have in mind:
> > > > > > Suppose I have two different fields in the Solr schema, field "a" and
> > > > > > field "b". I am going to use the created reverse index in such a way that
> > > > > > some terms are considered important ones, and tell Lucene to calculate a
> > > > > > value based on these terms' frequencies in each document. For example, let
> > > > > > the word "hello" be considered an important word with a weight of "2.0".
> > > > > > Suppose the term frequency of this word is 3 in field "a" and 6 in field
> > > > > > "b" for document 1. Therefore the score value would be 2*3+(2*6)^2. I want
> > > > > > to calculate this score based on these fields and put it in the index
> > > > > > for retrieval. My question is: how can I do such a thing? At first I
> > > > > > considered using the term component to calculate this value from outside
> > > > > > and put it back into the Solr index, but it seems it is not efficient enough.
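The arithmetic in this example works out to 2*3 + (2*6)^2 = 6 + 144 = 150. A minimal Java sketch of that per-document computation follows. Summing the per-term values when there are several important terms is an assumption, since the question gives only one term, and the class name is hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the score from the question:
// for each important term t with weight w: w*tf_a(t) + (w*tf_b(t))^2, summed over t.
final class ImportantTermScore {

    /** Score contribution of one term given its frequencies in fields a and b. */
    static double termScore(double weight, int tfA, int tfB) {
        double b = weight * tfB;
        return weight * tfA + b * b;
    }

    /** Sums termScore over all weighted terms; missing terms count as frequency 0. */
    static double documentScore(Map<String, Double> weights,
                                Map<String, Integer> tfFieldA,
                                Map<String, Integer> tfFieldB) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            String term = e.getKey();
            total += termScore(e.getValue(),
                    tfFieldA.getOrDefault(term, 0),
                    tfFieldB.getOrDefault(term, 0));
        }
        return total;
    }

    public static void main(String[] args) {
        // "hello" has weight 2.0, tf 3 in field a and tf 6 in field b for document 1.
        double score = documentScore(Map.of("hello", 2.0),
                Map.of("hello", 3), Map.of("hello", 6));
        System.out.println(score); // prints 150.0
    }
}
```

Because field "b" is squared, its frequency dominates quickly; that asymmetry is part of the stated formula, not something introduced by the sketch.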
> > > > > >
> > > > > > Thank you very much.
> > > > > > Best regards.
> > > > > >
> > > > > > --
> > > > > > A.Nazemian
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
> 
