tika-user mailing list archives

From: Alexander Cougarman <acoug...@bwc.org>
Subject: RE: Return raw text from document
Date: Fri, 17 Aug 2012 07:37:12 GMT
I'm using this C# code to call the parser directly via its URL (the Solr /update/extract handler); it returns JSON:

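// Requires: using System.Net; using System.Text;
var url = "http://localhost:8983/solr/update/extract";

var client = new WebClient();
client.QueryString.Add("extractOnly", "true");   // extract only, do not index
client.QueryString.Add("wt", "json");            // ask for a JSON response
var data = client.UploadFile(url, "input.txt");  // multipart POST of the file
var json = Encoding.UTF8.GetString(data);        // UTF-8; ASCII decoding would mangle non-ASCII text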
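
One possible way to get closer to plain text from this same endpoint is sketched below. It assumes the Solr version in use supports the extractFormat=text parameter of the extract handler, and that the extracted body comes back as a top-level string value next to "responseHeader" and a "*_metadata" entry. The ExtractText helper and the RawTextSketch class are hypothetical names, the key layout is an assumption to check against your own response, and System.Text.Json needs a recent .NET (WebClient still works there but is flagged obsolete).

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.Json;

static class RawTextSketch
{
    // Hypothetical helper: returns the first top-level string value that is
    // neither the response header nor a "*_metadata" entry. The key naming is
    // an assumption about the extractOnly response, not a documented contract.
    public static string ExtractText(string json)
    {
        using var doc = JsonDocument.Parse(json);
        foreach (var prop in doc.RootElement.EnumerateObject())
        {
            if (prop.Name == "responseHeader" || prop.Name.EndsWith("_metadata"))
                continue;
            if (prop.Value.ValueKind == JsonValueKind.String)
                return prop.Value.GetString() ?? "";
        }
        return ""; // nothing that looks like extracted content
    }

    static void Main()
    {
        var client = new WebClient();
        client.QueryString.Add("extractOnly", "true");
        client.QueryString.Add("extractFormat", "text"); // assumed supported: plain text instead of XHTML
        client.QueryString.Add("wt", "json");
        byte[] data = client.UploadFile("http://localhost:8983/solr/update/extract", "input.txt");
        Console.WriteLine(ExtractText(Encoding.UTF8.GetString(data)));
    }
}
```

The response is still wrapped in JSON, so this only unwraps it on the client; it does not make the endpoint itself return bare text.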




Sincerely,
Alex 


-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: 16 August 2012 6:36 PM
To: user@tika.apache.org
Subject: Re: Return raw text from document

On Thu, 16 Aug 2012, Alexander Cougarman wrote:
> Is it possible to return just the raw text of the document extracted 
> by Tika? In other words, we don't want it in XML or JSON, just the 
> text in it.

Yes. Are you using the TikaApp jar, calling the Tika facade class, or calling a parser directly?

Nick
