lucene-general mailing list archives

From Christian Kohlschütter <kohlschuet...@L3S.de>
Subject Re: Announcement: Boilerplate removal library
Date Tue, 15 Dec 2009 23:26:53 GMT
Yes, indeed.
Maybe I should come up with such an Analyzer in a boilerpipe-lucene package...

Christian

On 14.12.2009 at 16:15, Ted Dunning wrote:

> Storing the original would be an excellent idea and would be quite doable.
> 
> 2009/12/14 Christian Kohlschütter <kohlschuetter@l3s.de>
> 
>> However, it would also be great (in order to increase recall) to store
>> non-content as well and just add some kind of static boost for content
>> blocks over non-content blocks. I am not sure whether this will work right
>> now using an Analyzer. What you could do, though, is store the text in
>> separate fields ("content"/"boilerplate") and add field-specific boosts at
>> query time.
>> 
> 
> 
> 
> -- 
> Ted Dunning, CTO
> DeepDyve
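
The separate-fields idea quoted above can be illustrated with a small, library-free sketch. It is not Lucene code and does not reproduce Lucene's scoring; the function names, the sample documents, and the boost values (2.0 for "content", 0.5 for "boilerplate") are all assumptions made up for illustration. It only shows why keeping boilerplate in its own field preserves recall while a query-time boost keeps content matches ranked first.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real analyzer would do much more.
    return text.lower().split()


def score(doc, query, boosts):
    # Hypothetical scoring: for each field, count query-term occurrences
    # and weight the count by that field's query-time boost.
    terms = tokenize(query)
    total = 0.0
    for field, boost in boosts.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += boost * sum(counts[t] for t in terms)
    return total


def search(docs, query, boosts):
    # Return ids of all matching documents, best score first.
    scored = [(score(doc, query, boosts), doc_id) for doc_id, doc in docs.items()]
    return [doc_id for s, doc_id in sorted(scored, reverse=True) if s > 0]


# Illustrative documents, each already split into the two fields.
docs = {
    "a": {"content": "boilerplate removal for web pages",
          "boilerplate": "home contact imprint"},
    "b": {"content": "weather report for hannover",
          "boilerplate": "removal services advertisement"},
}

# Assumed boosts: content counts four times as much as boilerplate.
boosts = {"content": 2.0, "boilerplate": 0.5}

print(search(docs, "removal", boosts))
```

Document "b" matches only in its boilerplate field, so it is still found but ranks below "a", whose match is in the content field. Dropping the boilerplate field entirely would lose "b" altogether.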

-- 
Christian Kohlschütter
kohlschuetter@L3S.de

L3S Research Center
Forschungszentrum L3S / Leibniz Universität Hannover

http://www.L3S.de/~kohlschuetter