incubator-couchdb-user mailing list archives

From Robert Samuel Newson <rnew...@apache.org>
Subject Re: couchdb-lucene: ignore certain elements of HTML attachments
Date Tue, 08 Apr 2014 10:49:40 GMT

Not at present, but if Tika has such an option, it should be easy to expose.

B.

On 7 Apr 2014, at 21:29, Hank Knight <hknight555@gmail.com> wrote:

> Using couchdb-lucene, is there a way to ignore all content inside a
> blacklisted element of HTML attachments?  Certain common information
> is found in the header of every HTML document, including links to
> other pages, and it would be ideal for these common areas not to be
> searched.
> 
> <header>Hello</header>
> <div id="header">Hello</div>

