couchdb-user mailing list archives

From Robert Samuel Newson
Subject Re: couchdb-lucene: ignore certain elements of HTML attachments
Date Tue, 08 Apr 2014 10:49:40 GMT

Not at present, but if Tika has such an option, it should be easy to expose.


On 7 Apr 2014, at 21:29, Hank Knight wrote:

> Using couchdb-lucene, is there a way to ignore all content inside a
> blacklisted element of HTML attachments?  Certain common information
> is found in the header of every HTML document, including links to
> other pages, and it would be ideal for these common areas not to be
> searched. For example:
> <header>Hello</header>
> <div id="header">Hello</div>
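
Since the reply indicates this is not built in, one possible workaround is to strip the unwanted elements yourself and index the remaining text separately. The sketch below is not a couchdb-lucene or Tika feature; it uses only Python's standard `html.parser`, and the names `BlacklistTextExtractor` and `searchable_text` are hypothetical. It assumes reasonably well-nested markup and skips anything inside a `<header>` element or an element with `id="header"`, matching the two examples in the question.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlacklistTextExtractor(HTMLParser):
    """Collect text from HTML, skipping blacklisted elements (hypothetical helper)."""

    def __init__(self, skip_tags=("header",), skip_ids=("header",)):
        super().__init__(convert_charrefs=True)
        self.skip_tags = set(skip_tags)
        self.skip_ids = set(skip_ids)
        self.skip_depth = 0  # > 0 while inside a blacklisted element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Already inside a blacklisted element: track nesting only.
            self.skip_depth += 1
        elif tag in self.skip_tags or dict(attrs).get("id") in self.skip_ids:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.parts.append(text)


def searchable_text(html):
    """Return the text of `html` with blacklisted elements removed."""
    parser = BlacklistTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    page = '<header>Hello</header><div id="header">Hello</div><p>Body text</p>'
    print(searchable_text(page))
```

The cleaned text could then be stored in an ordinary document field and indexed from there instead of from the raw attachment. Whether that fits depends on how the attachments are produced, and pages with unclosed tags inside the header area would need a more forgiving parser.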
