lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Announcement: Boilerplate removal library
Date Mon, 14 Dec 2009 15:15:52 GMT
Storing the original would be an excellent idea and would be quite doable.

2009/12/14 Christian Kohlschütter <kohlschuetter@l3s.de>

> However, it would also be great (to increase recall) to store non-content
> as well and apply a static boost to content blocks over non-content
> blocks. I am not sure whether this will work right now using an Analyzer.
> What you could do, though, is store the text in separate fields
> ("content"/"boilerplate") and add field-specific boosts at query time.
>
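
The query-time boosting suggested in the quoted text could be sketched roughly as follows. This is only an illustration, not code from the boilerplate removal library or from Lucene: the helper class and method are hypothetical, the boost values 2.0 and 0.5 are arbitrary assumptions, and only the field names "content" and "boilerplate" come from the message. The helper builds a string in Lucene's classic query parser syntax, where `field:(terms)^boost` weights matches in one field.

```java
import java.util.Locale;

public class FieldBoostQuery {
    /**
     * Hypothetical helper: searches the same terms in both fields and
     * weights each field separately, using the classic Lucene query
     * syntax field:(terms)^boost. The terms are inserted as-is, so
     * special query characters are assumed to be escaped by the caller.
     */
    public static String build(String terms, double contentBoost, double boilerplateBoost) {
        // Locale.ROOT keeps the decimal separator a '.' on every platform.
        return String.format(Locale.ROOT,
                "content:(%s)^%.1f boilerplate:(%s)^%.1f",
                terms, contentBoost, terms, boilerplateBoost);
    }

    public static void main(String[] args) {
        // Assumed boosts: content matches count more than boilerplate matches.
        System.out.println(build("boilerplate removal", 2.0, 0.5));
    }
}
```

Keeping the boosts in the query rather than in the index means they can be tuned without re-indexing, and a document whose only match is in boilerplate can still be found, which is the recall benefit mentioned above.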



-- 
Ted Dunning, CTO
DeepDyve
