lucene-java-user mailing list archives

From Deb Lucene <deb.luc...@gmail.com>
Subject Re: A key value field storing
Date Wed, 21 Mar 2012 15:48:23 GMT
Hi Ian,

Thanks for the reply. I am not sure the bq solution will be able to solve
the problem. Let me explain with an example:

document 1 - (some text)
IBM - 0.6
Google - 0.1
Apple - 0.4

Now suppose I index the "company name" and the "confidence scores" as
separate fields and search with the bq, where the NumericField clause is
"anything below 0.5" and the text clause is "IBM". Document 1 will then be
matched by mistake: it contains "IBM", and it also has scores below 0.5
(0.1 and 0.4, stored alongside 0.6). But it should not match, because the
IBM score is 0.6. In short, this problem needs some way of linking each
company name to its own score.
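A minimal plain-Java sketch of the false match described above. It does not use Lucene: it assumes each document's metadata can be modeled as a company-to-score map, and the class and method names are hypothetical. With separate fields, the name clause and the score clause are each satisfied by a different company, so the document matches even though IBM's own score fails the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CrossMatchDemo {

    // Hypothetical model of two independent fields: the document only
    // "knows" that some company is X and that some score is below T,
    // not which score belongs to which company.
    static boolean separateFieldsMatch(Map<String, Double> meta, String company, double below) {
        boolean nameHit = meta.containsKey(company);
        boolean scoreHit = false;
        for (double s : meta.values()) {
            if (s < below) {
                scoreHit = true;
            }
        }
        return nameHit && scoreHit;
    }

    // Linked model: the score is checked for the named company only.
    static boolean linkedMatch(Map<String, Double> meta, String company, double below) {
        Double s = meta.get(company);
        return s != null && s < below;
    }

    // Document 1 from the example above.
    static Map<String, Double> document1() {
        Map<String, Double> meta = new LinkedHashMap<>();
        meta.put("IBM", 0.6);
        meta.put("Google", 0.1);
        meta.put("Apple", 0.4);
        return meta;
    }

    public static void main(String[] args) {
        Map<String, Double> doc1 = document1();
        System.out.println(separateFieldsMatch(doc1, "IBM", 0.5)); // true: the false match
        System.out.println(linkedMatch(doc1, "IBM", 0.5));         // false: IBM's score is 0.6
    }
}
```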

--d



On Wed, Mar 21, 2012 at 10:41 AM, Ian Lea <ian.lea@gmail.com> wrote:

> Why do you want to link name and confidence in one field?  Store
> confidence as a NumericField and search something like
>
> BooleanQuery bq = new BooleanQuery();
> Query nameq = parser.parse(...); // or however you build the name query
> Query confq = NumericRangeQuery.newXxx(...);
> bq.add(nameq, ...);
> bq.add(confq, ...);
>
> and search using bq.
>
>
> --
> Ian.
>
>
> On Wed, Mar 21, 2012 at 2:20 PM, Deb Lucene <deb.lucene@gmail.com> wrote:
> > Hi Group,
> >
> > Sorry for cross-posting!
> >
> > We need to index a document corpus (news articles) with some metadata.
> > The metadata are company names, each with a score (a double between 0
> > and 1). For example, two documents could be:
> >
> > document 1
> > (some text, say a technical article from the NY Times). It comes with
> > metadata like:
> > IBM - 0.5
> > Google - 0.9
> > Apple - 0.3
> >
> > where 0.5, 0.9 and 0.3 are confidence scores for the respective company names.
> >
> > Similarly, document 2 is an IT article, and its metadata are:
> > IBM - 0.6
> > Google - 0.1
> > Apple - 0.4
> >
> > Now, we can easily index the documents based on the contents or the
> > company names. The problem is that we need a "field" in which the company
> > names and the scores are linked, so that we can search for something like:
> >
> > query = the "company name" field is "IBM" and the score of IBM is > 0.5
> >
> > In that case, document 2 would be retrieved.
> >
> > I am wondering if anyone has ideas about indexing the company names and
> > scores together (linked) as a field.
> >
> > Thanks in advance,
> >
> > --d
>
>
>
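One common way to get the linking Deb asks for, which is not proposed in the thread itself and is offered here only as a sketch under stated assumptions, is to put the company name into the field name: store each score under its own field (for example a hypothetical `score_IBM`), so a single range clause on that field ties the name and the score together. The plain-Java model below does not use Lucene; the `score_` prefix and the class and method names are hypothetical. It runs the original question's query (IBM > 0.5) over documents 1 and 2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerCompanyFieldDemo {

    // Hypothetical naming scheme: one numeric "field" per company.
    static String scoreField(String company) {
        return "score_" + company;
    }

    // Flatten company/score metadata into field -> value pairs.
    static Map<String, Double> toFields(String[] companies, double[] scores) {
        Map<String, Double> fields = new HashMap<>();
        for (int i = 0; i < companies.length; i++) {
            fields.put(scoreField(companies[i]), scores[i]);
        }
        return fields;
    }

    // Stand-in for a range query with an exclusive lower bound on one field:
    // returns the 1-based numbers of documents whose value is strictly above min.
    static List<Integer> rangeAbove(List<Map<String, Double>> docs, String field, double min) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Double v = docs.get(i).get(field);
            if (v != null && v > min) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    // Documents 1 and 2 from the original question.
    static List<Map<String, Double>> corpus() {
        String[] names = {"IBM", "Google", "Apple"};
        List<Map<String, Double>> docs = new ArrayList<>();
        docs.add(toFields(names, new double[] {0.5, 0.9, 0.3})); // document 1
        docs.add(toFields(names, new double[] {0.6, 0.1, 0.4})); // document 2
        return docs;
    }

    public static void main(String[] args) {
        // "company is IBM and IBM's score > 0.5"
        System.out.println(rangeAbove(corpus(), scoreField("IBM"), 0.5)); // [2]
    }
}
```

In Lucene 3.x terms this would presumably correspond to adding a NumericField named, say, `score_IBM` with `setDoubleValue(0.6)`, and searching it with `NumericRangeQuery.newDoubleRange("score_IBM", 0.5, null, false, true)`. The trade-off is one field per distinct company name, which may be a concern if the set of companies is very large.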
