lucene-java-user mailing list archives

From Stephane James Vaucher <vauch...@cirano.qc.ca>
Subject RE: boost keywords
Date Fri, 13 Aug 2004 15:09:09 GMT
Other indexing strategies:

- AFAIK, you could probably cheat by multiplying the number of tokens in
headers thus affecting the scoring.

For example:
<h1>hello world</h1> <p> foo bar </p>
content -> hello world hello world foo bar

This is not very tweakable though.
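
The token-multiplication trick above could be sketched roughly as follows. This is only an illustration in plain Java with no Lucene calls; the class name HeaderWeighting, the method buildContent, and the repeat parameter are made up for the example, and it assumes the header and body text have already been extracted from the HTML.

```java
// Hypothetical helper, not part of Lucene: repeats the header text so that
// header terms occur more often in the value indexed as "content".
final class HeaderWeighting {

    // Returns the header text repeated `repeat` times, followed by the body.
    // Assumes header and body are already plain text (HTML stripped).
    static String buildContent(String header, String body, int repeat) {
        if (repeat < 1) {
            throw new IllegalArgumentException("repeat must be >= 1");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeat; i++) {
            sb.append(header.trim()).append(' ');
        }
        sb.append(body.trim());
        return sb.toString();
    }
}
```

Calling buildContent("hello world", "foo bar", 2) gives the "hello world hello world foo bar" content shown above. Changing the weight means rebuilding the string and reindexing the document.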

- As Tate suggests, you can also use multiple fields and apply your search
on all of them:

<h1>hello world</h1> <p> foo bar </p>
content-> hello world foo bar
headers-> hello world

or even
<h1>hello world</h1> <h2> foo bar </h2>
content-> hello world foo bar
header1-> hello world
header2-> foo bar

The result of this is that you get fine-grained control over different
fields. At this point, you can boost at indexing or at search time. I
personally opt for search time because it is more open to tweaking, as
opposed to reindexing everything whenever you want to change a boost
factor.
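
A search-time boost over several fields might look like the sketch below. It only builds a query string in Lucene's query syntax (field:(...)^boost), so the boosts live in a map that can be changed without reindexing. The class FieldBoostQuery, the method expand, and the boost values are assumptions for illustration; the field names come from the example above, and the user query is assumed to be already escaped.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: applies one user query to
// several fields, each with its own search-time boost.
final class FieldBoostQuery {

    // Builds "f1:(q)^b1 OR f2:(q)^b2 ..." in the iteration order of the map.
    // A boost of exactly 1.0 is left implicit.
    static String expand(String userQuery, Map<String, Float> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" OR ");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":(" + userQuery + ")";
            if (e.getValue() != 1.0f) {
                clause += "^" + e.getValue();
            }
            clauses.add(clause);
        }
        return clauses.toString();
    }
}
```

With an insertion-ordered map of content=1.0, header1=3.0, header2=2.0 and the query "hello", this yields content:(hello) OR header1:(hello)^3.0 OR header2:(hello)^2.0. Tuning a boost is then a one-line change to the map.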

As for the complexities that Tate mentions for query parsing, he's right
that it's a pain when using the built-in query parser, but you can always
use the api directly to build whatever queries you need.
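
One way to sidestep the combinatorial expansion Tate describes is to expand each term across the fields first and then AND the per-term groups together, which should match the same documents as his enumerated form. The sketch below shows that shape as a query string to stay library-independent; the same structure could be assembled from query objects through the API. MultiFieldAnd and allTermsInAnyField are hypothetical names, and terms are assumed to be single, already-escaped words.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: requires every term to match,
// but lets each term match in any of the given fields.
final class MultiFieldAnd {

    // Builds "(f1:t1 OR f2:t1) AND (f1:t2 OR f2:t2) ...".
    // Assumes both lists are non-empty.
    static String allTermsInAnyField(List<String> terms, List<String> fields) {
        StringJoiner and = new StringJoiner(" AND ");
        for (String term : terms) {
            StringJoiner or = new StringJoiner(" OR ", "(", ")");
            for (String field : fields) {
                or.add(field + ":" + term);
            }
            and.add(or.toString());
        }
        return and.toString();
    }
}
```

For terms a and b over the fields content and headline, this gives (content:a OR headline:a) AND (content:b OR headline:b), which grows linearly with the number of terms and fields instead of enumerating every combination.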

HTH,
sv

On Fri, 13 Aug 2004, Tate Avery wrote:

>
> Well, as far as I know you can boost 3 different things:
>
> - Field
> - Document
> - Query
>
> So, I think you need to craft a solution using one of those.
>
> Here are some possibilities for each:
>
> 1) Field
> 	- make a keyword field which is alongside your content field
> 	- boost your keyword field during indexing
> 	- expand user queries to search 'content' and 'keywords'
>
> 2) Document
> 	- I don't really think this one helps you in any way
>
> 3) Query
> 	- Scan a user query and selectively boost words that are known keywords
> 	- This requires a keyword list and is not really scalable
>
> That is all that comes to mind, at first glance.  So, IMO, the winner IS #1.
>
> For example:
>
> 	Field _headline = Field.Text("headline", "...");
> 	_headline.setBoost(3);
>
> 	Field _content = Field.Text("content", "...");
>
> 	_document.add(_headline);
> 	_document.add(_content);
>
>
> But, the tricky part is modifying queries to use both fields.  If a user
> enters "virus", it is easy (i.e. "content:(virus) OR headline:(virus)").
> But, it quickly gets more complex with more complex queries (especially
> boolean queries with AND and such ... you probably would need something
> roughly like this:  "a AND b" = "content:(a AND b) OR headline:(a AND b)
> OR (content:a AND headline:b) OR (headline:a AND content:b)" and so on).
>
> That's my 2 cents.
>
> T
>
>
>
> -----Original Message-----
> From: news [mailto:news@sea.gmane.org]On Behalf Of Leos Literak
> Sent: Friday, August 13, 2004 8:52 AM
> To: lucene-user@jakarta.apache.org
> Subject: Re: boost keywords
>
>
> Gerard Sychay wrote:
> > Well, there is always the Lucene wiki. There's not a patterns page per
> > se, but you could start one..
>
> of course I could. If I had something to add :-)
>
> but back to my issue. no reaction? So many people using
> Lucene and no one knows? I would be grateful for any
> advice. Thanks
>
> Leos
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>




