From: Alexander Cougarman
To: 'user@tika.apache.org'
Date: Fri, 17 Aug 2012 07:37:12 +0000
Subject: RE: Return raw text from document

I'm using this C# code to call the parser directly via its URL (Solr's /update/extract handler); it returns JSON:

    using System.Net;   // WebClient
    using System.Text;  // Encoding

    var url = @"http://localhost:8983/solr/update/extract";
    using var client = new WebClient();
    client.QueryString.Add("extractOnly", "true"); // return the extracted content instead of indexing it
    client.QueryString.Add("wt", "json");          // response writer: JSON
    var data = client.UploadFile(url, "input.txt");
    // Solr responds in UTF-8; decoding as ASCII would turn non-ASCII characters into '?'
    var json = Encoding.UTF8.GetString(data);

Sincerely,
Alex

-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org]
Sent: 16 August 2012 6:36 PM
To: user@tika.apache.org
Subject: Re: Return raw text from document

On Thu, 16 Aug 2012, Alexander Cougarman wrote:
> Is it possible to return just the raw text of the document extracted
> by Tika? In other words, we don't want it in XML or JSON, just the
> text in it.

Yes. Are you using the TikaApp jar, calling the Tika facade class, or calling a parser directly?

Nick
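For the original question (plain text rather than XML), Solr's ExtractingRequestHandler also accepts an extractFormat parameter; setting it to "text" should make Solr return the extracted content as plain text instead of XHTML, although it still arrives wrapped in the JSON response. The sketch below is a hedged example, not a tested recipe: it assumes the extracted content sits under a top-level JSON key named after the uploaded file (e.g. "input.txt"), assumes WebClient sends that bare file name, and the class and method names (SolrTextExtract, GetExtractedText, ExtractPlainText) are hypothetical. It uses System.Text.Json, so it needs .NET Core 3.0 or later.

```csharp
using System.IO;
using System.Net;
using System.Text;
using System.Text.Json;

public static class SolrTextExtract
{
    // Pulls the extracted body out of an extractOnly=true&wt=json response.
    // Assumption: Solr stores the content under a top-level key named after
    // the uploaded stream (e.g. "input.txt"), next to "responseHeader".
    public static string GetExtractedText(string json, string streamName)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty(streamName).GetString() ?? "";
    }

    // Uploads a file to the extract handler and returns the plain text.
    // Assumption: Solr is running at the URL used in the message above.
    public static string ExtractPlainText(string path)
    {
        var url = "http://localhost:8983/solr/update/extract";
        using var client = new WebClient();
        client.QueryString.Add("extractOnly", "true");
        client.QueryString.Add("extractFormat", "text"); // plain text instead of XHTML
        client.QueryString.Add("wt", "json");
        var data = client.UploadFile(url, path);
        var json = Encoding.UTF8.GetString(data);
        // Assumption: the response key matches the file name WebClient sent.
        return GetExtractedText(json, Path.GetFileName(path));
    }
}
```

With a local Solr instance running, Console.WriteLine(SolrTextExtract.ExtractPlainText("input.txt")) would print just the document text. Splitting out GetExtractedText keeps the JSON handling testable without a server.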