lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera" <ser...@gmail.com>
Subject Re: "Field weights"
Date Fri, 14 Dec 2007 18:58:21 GMT
What about boosting documents of the Brand type? You can statically boost
those documents with a log() function or something similar ...

On Dec 14, 2007 8:24 PM, Doron Cohen <cdoronc@gmail.com> wrote:

> It seems that documents having less fields satisfying
> the query worth more than those satisfying more fields
> of the query, because the first ones are more "to
> the point".
>
> At least it seems like it in the example.
>
> If this makes sense I would try to compose a top level
> boolean query out of the one-field queries, and make it use
> a similarity that implements this strange logic of coord.
> (Strange, because usually coord punishes for under
> matching, while here you want to reward for just that.)
>
> A tricky part would be to make the sub-queries use
> the "regular" coord logic, but first let's see if this is
> a valid direction at all.
>
> Doron
>
> On Dec 14, 2007 8:03 PM, Paul Elschot <paul.elschot@xs4all.nl> wrote:
>
> > Karl,
> >
> > This might work for you:
> > https://issues.apache.org/jira/browse/LUCENE-293
> >
> > Regards,
> > Paul Elschot
> >
> > On Friday 14 December 2007 18:06:01 Karl Wettin wrote:
> > > I have an index that contains three sorts of documents:
> > >
> > > Car brand
> > > Tire brand
> > > Tire pressure
> > >
> > > (Please bear with me, the real index has nothing to do with cars. I
> > > just try to explain the problem in an alternative domain to avoid NDA
> > > conflicts.)
> > >
> > > There is a heirarchial composite relationship between these sort of
> > > documents. A document describing "tire pressure" also contains "tire
> > > brand" and "car brand". A document describing "tire brand" also
> > > contains information about "car brand". A document describing "car
> > > brand" contains only that.
> > >
> > > The requirement is that the consumer of the API should not have to
> > > specify what fields they are searching in. There is no time (nor
> > > training data) to implement a hidden markov model (HMM) tokenizer or
> > > something along that path in order to extract possible attributes from
> > > the query string. Instead the query string is tokenized once per field
> > > and assebled to one huge query. Normally this works fairly well.
> > >
> > > Here are some example documents:
> > >
> > > Volvo
> > > Volvo, Michelin
> > > Volvo, Nokian
> > > Volvo, Nokian, 2.2 bars
> > > Volvo, Firestone, 2.4 bars
> > >
> > > Saab
> > > Saab, Michelin
> > > Saab, Nokian
> > > Saab, Nokian, 2.1 bars
> > > Saab, Firestone
> > > Saab, Firestone, 2.4 bars
> > > Saab, Firestone, 2.5 bars
> > >
> > > If I search for Saab the top result will be the document  representing
> > > the car brand "Saab".  The query would look like this: "car:saab
> > > tire:saab preasure:saab"
> > >
> > > But lets say Saab starts manufacturing tires too:
> > >
> > > Saab
> > > Saab, Saab tires
> > > Saab, Saab tires, 1.9 bars
> > > Saab, Saab tires, 1.8 bars
> > >
> > > If I search for "Saab" I still want the top result to be Saab the car
> > > brand. But  it not longer is, the match for "Saab, Saab tires" now
> > > have a greater score than "Saab", of course.
> > >
> > > My idea is to work along the line of indexing "Saab" in the tire brand
> > > and tire pressure field too. Now searching for Saab will yeild a
> > > result where the car brand "Saab" is the top result.
> > >
> > > However, this will not work as I have different tokenization
> > > strategies for each field (stemming and what not). Tokenizing the
> > > query string Saab for the field "tire brand" in Swedish might end up
> > > as "saa" and will thus not find the token Saab inserted for the
> > > document describing the car brand Saab.
> > >
> > > I have a couple of experiments in my head I need to try out, starting
> > > with tokezining query strings per field and using the tokens generated
> > > for the field car brand as query in the tire brand and tire pressure
> > > too. And vice versus.
> > >
> > > Any brilliant ideas that might work? Hacky solutions are OK.
> > >
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>



-- 
Regards,

Shai Erera

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message