From: Apache Wiki
To: solr-commits@lucene.apache.org
Reply-To: solr-dev@lucene.apache.org
Date: Thu, 10 Sep 2009 13:11:04 -0000
Message-ID: <20090910131104.22015.30753@eos.apache.org>
Subject: [Solr Wiki] Update of "ExtractingRequestHandler" by GrantIngersoll

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/ExtractingRequestHandler

------------------------------------------------------------------------------
  = Sending documents to Solr =

- // TODO: discribe the different ways to send the documents to solr (POST body, form encoded, remoteStreaming)
+ // TODO: describe the different ways to send the documents to solr (POST body, form encoded, remoteStreaming)
   * curl http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text --data-binary @tutorial.html -H 'Content-type:text/html'
  NOTE, this literally streams the file, which does not, then, provide info to Solr about the name of the file.
- 
+  * SolrJ: Use the ContentStreamUpdateRequest (see SolrExampleTests.java for full example):{{{
+     ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
+     up.addFile(new File("mailing_lists.pdf"));
+     up.setParam("literal.id", "mailing_lists.pdf");
+     up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
+     result = server.request(up);
+     assertNotNull("Couldn't upload mailing_lists.pdf", result);
+     rsp = server.query( new SolrQuery( "*:*") );
+     Assert.assertEquals( 1, rsp.getResults().getNumFound() );
+ }}}

  == Additional Resources ==
   * [http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#example.source Lucid Imagination article]
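
For readers without curl or SolrJ at hand, the curl invocation in the diff above can be approximated with the Java standard library alone. This is a minimal sketch, not part of the wiki page: the class name `ExtractUrlSketch` and the helpers `buildExtractUrl` and `postFile` are hypothetical, and the parameter names are simply copied from the curl line. As with curl, the file is streamed as the raw POST body, so Solr still receives no file name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; none of these names come from Solr or SolrJ.
public class ExtractUrlSketch {

    // Appends URL-encoded query parameters to the handler URL, in map order.
    static String buildExtractUrl(String handlerUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(handlerUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always supported
        }
    }

    // Streams the file as the raw POST body, like curl --data-binary @file.
    // Returns the HTTP status code reported by the server.
    static int postFile(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Same parameters as the curl example; assumes Solr runs on localhost:8983.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("ext.idx.attr", "true");
        params.put("ext.def.fl", "text");
        String url = buildExtractUrl("http://localhost:8983/solr/update/extract", params);
        System.out.println(url);
        System.out.println(postFile(url, Paths.get("tutorial.html"), "text/html"));
    }
}
```

Running `main` requires a Solr instance listening at the address from the curl example and a `tutorial.html` file in the working directory. The URL-building step has no such requirement and can be checked on its own.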