lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: A key value field storing
Date Wed, 21 Mar 2012 16:03:36 GMT
You can wrap your scored query in a CustomScoreQuery to multiply the
"confidence level" into the score. Store the confidence either as a
DocValues field (in Lucene trunk) or as an indexed NumericField with
precisionStep=Integer.MAX_VALUE, read through FieldCache.
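
A minimal plain-Java sketch of the arithmetic behind this suggestion, not
Lucene code: the score of the wrapped query is multiplied by a per-document
confidence value. The class name ConfidenceBoost, the helper customScore, and
the base score of 1.0 are assumptions made for illustration; the IBM
confidence values 0.5 and 0.6 come from the two example documents in the
original question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConfidenceBoost {
    // Hypothetical helper: combines the wrapped query's score with the
    // per-document confidence by simple multiplication.
    static float customScore(float subQueryScore, float confidence) {
        return subQueryScore * confidence;
    }

    public static void main(String[] args) {
        // IBM confidence per document, as given in the original question.
        Map<String, Float> ibmConfidence = new LinkedHashMap<>();
        ibmConfidence.put("document 1", 0.5f);
        ibmConfidence.put("document 2", 0.6f);

        // Assumption: both documents get the same text score for "IBM".
        float subQueryScore = 1.0f;

        for (Map.Entry<String, Float> e : ibmConfidence.entrySet()) {
            float boosted = customScore(subQueryScore, e.getValue());
            System.out.println(e.getKey() + " -> " + boosted);
        }
    }
}
```

In a real index the confidence would be read per document from the DocValues
field or FieldCache rather than from a map, so with equal text scores the
document with the higher stored confidence ranks first.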

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Deb Lucene [mailto:deb.lucene@gmail.com]
> Sent: Wednesday, March 21, 2012 4:48 PM
> To: java-user@lucene.apache.org
> Subject: Re: A key value field storing
> 
> Hi Ian,
> 
> Thanks for the reply. I am not sure if the bq solution will be able to
> solve the problem. Let me explain with an example -
> 
> document 1 - (some text)
> IBM - 0.6
> Google - 0.1
> Apple - 0.4
> 
> Now suppose I index the document based on the "company name" and
> "confidence scores" separately and search using the bq, where the Numeric
> Field search is based on "anything below 0.5" and text = "IBM". Here, by
> mistake, document 1 will be chosen (as it has been stored with 0.6, 0.1
> and 0.4). But actually it should not be, as the "IBM" score is 0.6. So in
> gist, this problem needs some sort of linking between the company name
> and the scores.
> 
> --d
> 
> 
> 
> On Wed, Mar 21, 2012 at 10:41 AM, Ian Lea <ian.lea@gmail.com> wrote:
> 
> > Why do you want to link name and confidence in one field?  Store
> > confidence as a NumericField and search something like
> >
> > BooleanQuery bq = new BooleanQuery();
> > Query nameq = parser.parse(...); // or whatever
> > Query confq = NumericRangeQuery.newXxx(...);
> > bq.add(nameq, ...);
> > bq.add(confq, ...);
> >
> > and search using bq.
> >
> >
> > --
> > Ian.
> >
> >
> > On Wed, Mar 21, 2012 at 2:20 PM, Deb Lucene <deb.lucene@gmail.com> wrote:
> > > Hi Group,
> > >
> > > Sorry for cross posting!
> > >
> > > We need to index a document corpus (news articles) with some
> > > metadata features. The metadata are company names, each with a
> > > score (a double, between 0 and 1). For example, two documents can
> > > be -
> > >
> > > document 1
> > > (some text - say a technical article from NY times). It comes with
> > > the metadata like -
> > > IBM - 0.5
> > > Google - 0.9
> > > Apple - 0.3
> > >
> > > where 0.5, 0.9, 0.3 are some confidence scores for the company names.
> > >
> > > Similarly, document 2 is some IT article, and its metadata are
> > > like -
> > > IBM - 0.6
> > > Google - 0.1
> > > Apple - 0.4
> > >
> > > now we can index the documents based on the contents or the company
> > > names easily. But here the problem is we need to create a "field"
> > > where the company names and the scores are linked. So that we can
> > > search something like -
> > >
> > > query = where the "company name" (a field) is "IBM" and the score
> > > of IBM is > 0.5.
> > > So in that case document 2 will be retrieved.
> > >
> > > I am wondering if anyone has ideas about using the company names and
> > > scores (linked) together as a field.
> > >
> > > Thanks in advance,
> > >
> > > --d
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >

