lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alexey Maksakov" <br...@redlab.ru>
Subject AW: HTML tagged terms boosting...
Date Wed, 21 Jan 2004 13:35:26 GMT
Thanks for answer.

Yes I'm up to field specific boosting, but also I'm looking for creating
short descriptions on documents found, based on query (like it is done in
most search engines). I've thought about those solutions but it seemed to me
that it is not straightforward and will cause troubles when building
results' description. On second thought answer was found - analyze document
as stream and put terms into separate fields (or create duplicates) while
maintaining original offsets in Token objects.

After that building description is quite simple - just using TermPositions
in IndexReader and than getting corresponding text portion(s) from Field
(sadly it'll work only in case of one "body" field - so only duplicates are
usable, several Fields I think will require extra unindexed "body" Field to
fetch document pieces fast).

Hope I've not missed anything... Hm... not transparent, it is. :-) just hope
it helps somebody else.

> Erik Hatcher <erik@ehatchersolutions.com>
> 21.01.2004 15:27
> Please respond to "Lucene Users List"
>
>         To:     "Lucene Users List" <lucene-user@jakarta.apache.org>
>         cc:
>         Subject:        Re: HTML tagged terms boosting...
>
>
> It definitely cannot be done with custom token types.  You're probably
> aiming for field-specific boosting, so you will need to parse the HTML
> into separate fields and use a multi-field search approach.
>
> I'm sure there are other tricks that could be used for boosting, like
> inserting the words inside <b> multiple times into the same field for
> example.
>
>                  Erik
>
>
> On Jan 21, 2004, at 6:50 AM, Alexey Maksakov wrote:
>
> > Hello!
> >
> > Is there any idea how to achieve boosting terms in HTML-documents
> > surrounded
> > by HTML tags, such as <B>, <H1>, etc.?
> >
> > Can it be done with use of existing API or reimplemeting or
> > implementation
> > of TokenStream with custom Token types is needed?
> >
> > Though it seems to me, that even such re-implementation won't help
> > without
> > changing indexing and searcher code... Hope that I'm wrong.
> >
> > Thanks in advance.
> >
> > Alexey.
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message