Date: Sun, 27 Nov 2011 16:22:47 -0800 (PST)
From: Jan
To: general@lucene.apache.org
Subject: Re: Populating a custom Solr field with text extracted from document
Message-ID: <1322439767608-3541066.post@n3.nabble.com>
References: <1321508548770-3514857.post@n3.nabble.com>

Thank you for your reply.

: what are you using to do the crawling?

I'm using Solr within LucidWorks Enterprise.
As far as I know, LucidWorks provides a default crawler called Aperture, so that is what I'm using.

Thank you also for describing a few of the options for tackling the problem. I did consider writing some custom parsing code, but wanted to explore existing options first rather than reinventing the wheel. I've tinkered with curl a bit and think that POSTing to Solr may be a suitable approach.

--
View this message in context: http://lucene.472066.n3.nabble.com/Populating-a-custom-Solr-field-with-text-extracted-from-document-tp3514857p3541066.html
Sent from the Lucene - General mailing list archive at Nabble.com.
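
For reference, here is a rough sketch of what the POST approach mentioned in the message might look like in Python rather than curl. It assumes Solr's extracting request handler is reachable at `/update/extract`, and it uses that handler's `literal.<field>` and `fmap.content` parameters to set the document ID and to route the extracted body text into a custom field. The base URL, the document ID, the field name `extracted_text`, and the helper `build_extract_request` are all hypothetical; the thread does not confirm how the LucidWorks instance is configured.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_url, doc_id, payload, target_field,
                          content_type="application/octet-stream"):
    """Build (but do not send) a POST to Solr's extracting handler.

    solr_url, doc_id and target_field are illustrative assumptions;
    adjust them to match the actual core and schema.
    """
    params = urlencode({
        "literal.id": doc_id,          # set the document's unique key
        "fmap.content": target_field,  # map extracted text to the custom field
        "commit": "true",              # make the document searchable right away
    })
    return Request(
        url=f"{solr_url}/update/extract?{params}",
        data=payload,                  # raw bytes of the document to parse
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extract_request(
        "http://localhost:8983/solr",  # hypothetical Solr base URL
        "doc1",
        b"placeholder document bytes",
        "extracted_text",              # hypothetical custom field name
        content_type="application/pdf",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The request is only constructed here, not sent, so the sketch can be inspected without a running Solr instance. The target field would need to exist in the schema before indexing.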