lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: A key value field storing
Date Wed, 21 Mar 2012 16:03:45 GMT
Ah, I see.  More complicated than I realized.  How about using two
sorts of documents?

Type 1, one lucene doc for your example
 textid: 1234
 text: some text about something

Type 2, 3 lucene docs for your example
 First
  textid: 1234
  company: IBM
  score: 0.6
 Second
  textid: 1234
  company: Google
  score: 0.1
 Third
  textid: 1234
  company: Apple
  score: 0.4

You could then use the BooleanQuery approach on the type 2 docs to get
textids, with an additional lookup to get the actual text.  Not
brilliant, and it won't work if you want something like
text:aaaa company:google minconf:0.1, because text and company live in
different documents.
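
A rough sketch of why one doc per (company, score) pair avoids the
false match from your example.  This is plain Java with no Lucene
calls; PairDocsSketch, PairDoc, flatMatches and matchingTextIds are
names made up for illustration, and the in-memory loops only stand in
for what a BooleanQuery over the type 2 docs would do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PairDocsSketch {

    // Hypothetical stand-in for a "type 2" doc: one (textid, company, score) triple.
    static final class PairDoc {
        final String textid;
        final String company;
        final double score;

        PairDoc(String textid, String company, double score) {
            this.textid = textid;
            this.company = company;
            this.score = score;
        }
    }

    // Flat model: all companies and all scores sit on one doc, so the two
    // conditions can be satisfied by different pairs (the false match).
    static boolean flatMatches(List<PairDoc> pairsOfOneText, String company, double maxScore) {
        boolean hasCompany = false;
        boolean hasLowScore = false;
        for (PairDoc p : pairsOfOneText) {
            if (p.company.equals(company)) hasCompany = true;
            if (p.score < maxScore) hasLowScore = true;
        }
        return hasCompany && hasLowScore;
    }

    // Per-pair model: both conditions must hold on the same pair doc.
    static Set<String> matchingTextIds(List<PairDoc> pairDocs, String company, double maxScore) {
        Set<String> ids = new LinkedHashSet<>();
        for (PairDoc p : pairDocs) {
            if (p.company.equals(company) && p.score < maxScore) {
                ids.add(p.textid);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<PairDoc> pairs = new ArrayList<>();
        pairs.add(new PairDoc("1234", "IBM", 0.6));
        pairs.add(new PairDoc("1234", "Google", 0.1));
        pairs.add(new PairDoc("1234", "Apple", 0.4));

        // Stand-in for the "type 1" docs: textid -> text.
        Map<String, String> texts = new LinkedHashMap<>();
        texts.put("1234", "some text about something");

        // company = IBM, score below 0.5
        System.out.println(flatMatches(pairs, "IBM", 0.5));     // true (wrong)
        System.out.println(matchingTextIds(pairs, "IBM", 0.5)); // [] (correct)

        // Second step: look up the text for each matching textid.
        for (String id : matchingTextIds(pairs, "Apple", 0.5)) {
            System.out.println(id + ": " + texts.get(id));
        }
    }
}
```

The last loop is the "additional lookup" step: the pair docs only
yield textids, and the text itself comes from the type 1 doc.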

There is BlockJoinQuery in recent versions that gives some sort of
parent/child relationship.  Might be worth a look.  Or wait for a
better idea from someone else.


--
Ian.

On Wed, Mar 21, 2012 at 3:48 PM, Deb Lucene <deb.lucene@gmail.com> wrote:
> Hi Ian,
>
> Thanks for the reply. I am not sure if the bq solution will be able to solve
> the problem. Let me explain with an example -
>
> document 1 - (some text)
> IBM - 0.6
> Google - 0.1
> Apple - 0.4
>
> Now suppose I index the document based on the "company name" and
> "confidence scores" separately and search using the bq where the Numeric
> Field search is based on "anything below 0.5" and text = "IBM". Here, by
> mistake the document 1 will be chosen (as it has been stored with 0.6, 0.1
> and 0.4). But actually it should not be - as the "IBM" score is 0.6. So in
> gist - this problem needs some sort of linking between the company name and
> the scores.
>
> --d
>
>
>
> On Wed, Mar 21, 2012 at 10:41 AM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Why do you want to link name and confidence in one field?  Store
>> confidence as a NumericField and search something like
>>
>> BooleanQuery bq = new BooleanQuery();
>> Query nameq = parser.parse(...) or whatever
>> Query confq = NumericRangeQuery.newXxx(...);
>> bq.add(nameq, ...);
>> bq.add(confq, ...);
>>
>> and search using bq.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Mar 21, 2012 at 2:20 PM, Deb Lucene <deb.lucene@gmail.com> wrote:
>> > Hi Group,
>> >
>> > Sorry for cross posting!
>> >
>> > We need to index a document corpus (news articles) with some meta data
>> > features. The meta data are actually company names with some scoring (a
>> > double, between 0 to 1). For example, two documents can be -
>> >
>> > document 1
>> > (some text - say a technical article from NY times). It comes with the
>> > metadata like -
>> > IBM - 0.5
>> > Google - 0.9
>> > Apple - 0.3
>> >
>> > where 0.5, 0.9, 0.3 are some confidence scores for the company names.
>> >
>> > Similarly, the document 2 is about some IT article and then the meta data
>> > are like -
>> > IBM - 0.6
>> > Google - 0.1
>> > Apple - 0.4
>> >
>> > now we can index the documents based on the contents or the company names
>> > easily. But here the problem is we need to create a "field" where the
>> > company names and the scores are linked. So that we can search something
>> > like -
>> >
>> > query = where the "company name" (a field) is "IBM" and the scores of IBM
>> > is > 0.5.
>> > So in that case the document 2 will be retrieved.
>> >
>> > I am wondering if anyone has ideas about using the company names and
>> > scores (linked) together as a field.
>> >
>> > Thanks in advance,
>> >
>> > --d
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>


