lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: How to remove control characters in stored value at Solr side
Date Tue, 19 Sep 2017 22:02:42 GMT
On 9/18/2017 12:45 PM, Markus Jelsma wrote:
> But, can you then explain why Apache Nutch with SolrJ had this problem? It seems that
> by default SolrJ does use XML as transport format. We have always used SolrJ, which I assumed
> would default to javabin, but we had this exact problem anyway, and solved it by stripping
> non-character code points.
>
> When we use SolrJ for querying we clearly see wt=javabin in the logs, but updates showed
> the problem. Can we fix it anywhere?

The wt parameter controls the *response*, not the *request*.

The cloud client started using javabin by default for requests in
version 4.6 (SOLR-5223), but the HTTP client used XML for requests by
default up until version 5.5 (SOLR-8595).  The current trunk Nutch code
is using SolrJ 5.4.1 and HttpSolrClient, which means that Nutch is
sending XML to Solr.  The wt parameter on those requests is javabin, so
the response that Solr sends back is binary.

SolrJ should handle translating the input so that it's valid XML, but
maybe there are characters that SolrJ's XML request writer doesn't (or
can't) handle correctly.
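
As an illustration of the client-side workaround mentioned in the quoted message (stripping problem characters before the document is sent), here is a minimal sketch using only the Java standard library. The class and method names are hypothetical and are not part of SolrJ or Nutch. It assumes the goal is to drop every code point that falls outside the XML 1.0 `Char` production, which may be broader or narrower than what Nutch actually strips.

```java
// Hypothetical helper, not part of SolrJ or Nutch.
final class XmlCharFilter {
    private XmlCharFilter() {}

    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point removed.
    // Iterates by code point so valid surrogate pairs are kept intact,
    // while unpaired surrogates are dropped.
    static String stripInvalidXmlChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Contains NUL, U+FFFF and a backspace, none of which XML 1.0 allows.
        String dirty = "caf\u00e9\u0000 bar\uFFFF\u0008baz";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

A filter like this would be applied to each field value before it is added to the document that goes to Solr. It only addresses the XML request path described above; whether it is still needed once requests are sent as javabin is not something this sketch settles.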

Thanks,
Shawn

