From: Grant Ingersoll
To: tika-user@lucene.apache.org
Subject: Re: New user
Date: Tue, 1 Sep 2009 06:22:57 -0700

A little late to the party, but thought I would add my two cents...

On Aug 19, 2009, at 4:31 AM, Dave Pawson wrote:

> It's the search capabilities I'm most interested in, hence the
> Lucene kick.

Note also that Tika is fully integrated into Solr and will be part of the upcoming Solr 1.4 release (you can try it now by getting the nightly build). Also, I believe Solr's Data Import Handler has mechanisms for importing XML. I'd suggest looking at the Solr Wiki (http://wiki.apache.org/solr), in particular:

http://wiki.apache.org/solr/ExtractingRequestHandler
http://wiki.apache.org/solr/DataImportHandler

As both a Lucene and Solr committer, I think I can safely say that for most people, Solr is the place to start with Lucene. It will save you from writing a whole lot of code and get you searching much faster, and it is still completely pluggable, giving you near-full access to Lucene. People often worry about the HTTP stuff up front, but in practice it is a non-issue in more than 99% of cases.
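
To give a rough idea of how little the HTTP side involves, here is a minimal Python sketch of sending a file to the ExtractingRequestHandler. It assumes a local Solr at http://localhost:8983/solr with the handler mapped to /update/extract (check your own solrconfig.xml, since the path is configurable). The helper names extract_url and send_document are hypothetical and are not part of Solr or Tika.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(base_url, doc_id, commit=True):
    """Build the request URL for the extracting handler.

    Assumes the handler is mapped to /update/extract under base_url.
    literal.id supplies the unique key for the indexed document.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def send_document(base_url, doc_id, path, content_type="application/octet-stream"):
    """POST the raw file bytes to Solr and return the HTTP status code.

    Requires a running Solr instance; Tika extracts the text on the server.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        extract_url(base_url, doc_id),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.status
```

With a server running, something like send_document("http://localhost:8983/solr", "doc1", "report.pdf", "application/pdf") would be the whole client side; the file name and document id here are placeholders.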