lucene-dev mailing list archives

From Chris Hostetter
Subject RE: Solrj/Tika question about content types
Date Wed, 13 Feb 2013 19:53:28 GMT

: questions still apply: since Tika apparently cares deeply about 
: content-type now, what content-type can I supply through SolrJ to tell 
: it 'please discover the document type on your own'?  And how do I do 
: that through SolrJ?

SolrJ sets the Content-Type header based on what is returned by the 
"getContentType()" method of the ContentStream -- the default is 
"application/octet-stream" if getContentType() returns null.
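As a rough illustration of that fallback rule only (this is not SolrJ's actual code): the sketch below uses a hypothetical stand-in interface, SimpleStream, in place of the real ContentStream, and a hypothetical helper, headerFor(), to show how a null content type would turn into "application/octet-stream".

```java
public class ContentTypeFallback {

    /** Hypothetical stand-in for the one ContentStream method that matters here. */
    public interface SimpleStream {
        String getContentType();
    }

    /** Default described in the message above, used when the stream reports no type. */
    public static final String DEFAULT_CONTENT_TYPE = "application/octet-stream";

    /**
     * Hypothetical helper: picks the Content-Type header value for a stream,
     * falling back to the default when getContentType() returns null.
     */
    public static String headerFor(SimpleStream stream) {
        String contentType = stream.getContentType();
        return contentType != null ? contentType : DEFAULT_CONTENT_TYPE;
    }

    public static void main(String[] args) {
        // Stream that does not declare a type: the default is used.
        System.out.println(headerFor(() -> null));
        // Stream that declares a type explicitly: that type is used as-is.
        System.out.println(headerFor(() -> "application/pdf"));
    }
}
```

Under this model, leaving the content type unset on the stream is what produces the generic "application/octet-stream" header.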

: (1) Does the getContentType() method actually even get used on Solrj?  
: When I looked at wire logging, it seemed that Solrj just posts a generic 
: "application/xml; charset=UTF-8" content type, and does not transmit 
: anything else.  It uses standard POST, not multipart/form POST, also.

Even in the case of a single ContentStream (so no multi-part) it still 
uses ContentStream.getContentType() ... can you provide a test case (or 
quick and dirty sample code) that demonstrates what you are seeing, with 
"application/xml; charset=UTF-8" getting sent over the wire even though 
you explicitly provide a different content-type in the ContentStream?

