Subject: Re: couchdb-lucene: ignore certain elements of HTML attachments
From: Robert Samuel Newson
Date: Tue, 8 Apr 2014 11:49:40 +0100
To: user@couchdb.apache.org

Not at present but if Tika has such an option it should be easy to expose.

B.

On 7 Apr 2014, at 21:29, Hank Knight wrote:

> Using couchdb-lucene is there a way to ignore all content inside a
> blacklisted element of HTML attachments? Certain common information
> is found in the header of every HTML document, including links to
> other pages, and it would be ideal for these common areas not to be
> searched.
>
>
Hello
>