From: dhamu
To: solr-user@lucene.apache.org
Subject: How to send web pages(urls) to solr cell via solrj?
Date: Thu, 4 Feb 2010 01:47:47 -0800 (PST)
Message-ID: <27450083.post@talk.nabble.com>

Hi,

I am new to Solr and have been exploring it for the last few days. I am using Solr Cell with Tika for parsing, indexing, and searching, and I post rich-text documents via SolrJ.

My actual requirement is to index web pages (URLs such as http://www.apache.org) instead of local documents (pdf, doc, and docx). For example, instead of

    req.addFile(new File("docs/mailing_lists.html"));

I would like to write something like

    req.url(new urlconnection("http://www.apache.org"))

Is there anything like this in SolrJ?

I am currently testing with curl, and this works fine:

    curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "stream.url=http://wiki.apache.org/solr/SolrConfigXml"

But I need to use something other than curl. The code below works fine for indexing and searching a local document, but I want to post URLs instead. Here is my code:

    String url = "http://localhost:8983/solr";
    SolrServer server = new CommonsHttpSolrServer(url);

    // Send the local file to the Solr Cell (extracting) handler
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("docs/mailing_lists.html"));
    req.setParam("literal.id", "index1");
    req.setParam("uprefix", "attr_");
    req.setParam("fmap.content", "attr_content");
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    NamedList<Object> result = server.request(req);
    assertNotNull("Couldn't upload mailing_lists.html", result);

    // Check that exactly one document is now in the index
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    Assert.assertEquals(1, rsp.getResults().getNumFound());

Any suggestion or answer will be appreciated.
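
For reference, the curl command in the post can be approximated in plain Java without curl. The following is only a sketch, not a SolrJ solution: it uses the JDK's HttpURLConnection, and the class and method names (RemoteExtractSketch, extractParams, formEncode, postStreamUrl) are made up for illustration. It assumes remote streaming is enabled on the Solr side (the curl test above suggests it is) and that the handler accepts stream.url as a form-encoded POST parameter in the same way it accepts it from curl's -F option. Note that with stream.url it is the Solr server, not the client, that fetches the page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name and methods are illustrative only.
final class RemoteExtractSketch {

    /** Builds the same parameters as the curl example, plus stream.url. */
    static Map<String, String> extractParams(String docId, String pageUrl) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("literal.id", docId);
        params.put("uprefix", "attr_");
        params.put("fmap.content", "attr_content");
        params.put("commit", "true");
        // Assumption: remote streaming is enabled, so Solr fetches this URL itself.
        params.put("stream.url", pageUrl);
        return params;
    }

    /** Form-encodes the parameters in insertion order, e.g. "a=1&b=2". */
    static String formEncode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(ex);
        }
    }

    /** POSTs the parameters to the extract handler and returns the HTTP status code. */
    static int postStreamUrl(String extractEndpoint, String docId, String pageUrl)
            throws IOException {
        byte[] body = formEncode(extractParams(docId, pageUrl))
                .getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) new URL(extractEndpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            try {
                out.write(body);
            } finally {
                out.close();
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as RemoteExtractSketch.postStreamUrl("http://localhost:8983/solr/update/extract", "doc1", "http://wiki.apache.org/solr/SolrConfigXml") should then behave roughly like the curl command, returning 200 on success. Within SolrJ itself, I believe the closest equivalents are passing a ContentStreamBase.URLStream to req.addContentStream(...) in place of addFile (the client then downloads the page and uploads it), or setting stream.url with req.setParam(...); neither is tested here.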