manifoldcf-dev mailing list archives

From Julien Massiera
Subject Re: [VOTE] Release Apache ManifoldCF 2.11, RC3
Date Tue, 25 Sep 2018 12:13:26 GMT
This new fix seems to work. Ingestions and deletions are working, and the
image file with huge metadata is indexed!


On 25/09/2018 13:59, Karl Wright wrote:
> I've committed a hack to trunk.  It has been tested for Solr Cell
> documents, deletions, and for tika-connector-extracted documents that don't
> have a lot of metadata.  I'm asking Julien to test it with his specific
> image that has lots of metadata to see if the pathway for that case works
> properly.  If it does, I'll spin another RC.
> Long term, since I'm a Lucene/Solr committer, I think I'm going to have to
> take SolrJ under my wing if we expect it to work for ManifoldCF.  I don't
> have a lot of time to do stuff like this anymore but clearly neither does
> the Solr team.
> Karl
> On Tue, Sep 25, 2018 at 6:14 AM Karl Wright wrote:
>> The back-and-forth is not going well.  Mr. Noble is needing to be
>> convinced that it is a valid use case for Solr to have metadata longer than
>> 4096 characters.  In fact it seems like the Solr folks have deliberately
>> been trying to get rid of support for multipart posts for a while, because
>> they don't see the need for them.  I'm still hoping to convince them
>> otherwise but I'm not getting a positive feel.
>> I'm still trying to figure out if multipart posts have any fundamental
>> conflict with their RequestWriter architecture.  If not I can perhaps
>> override the RequestWriter implementation and add multipart support that
>> way.  But it's not going to be a quick process by any means.
>> On Mon, Sep 24, 2018 at 12:13 PM Karl Wright wrote:
>>> Hi Julien,
>>> This has nothing to do with the new Tika.
>>> It is not normal; it means that UpdateRequests are not being sent as
>>> multipart form posts.  It's going to require work from the Solr team to fix
>>> this problem, however, because everything I do to work around the issue
>>> nonetheless seems to fail. :-(
>>> I'm having a back-and-forth with Paul Noble right now.  I'll update
>>> accordingly when I know more.
>>> Karl
>>> On Mon, Sep 24, 2018 at 11:33 AM Julien Massiera wrote:
>>>> After testing it, it is a +1 for me.
>>>> However, I found an interesting new issue that comes with the new Tika
>>>> version. I had a jpg file for which some metadata were not extracted
>>>> before, such as RedTRC, BlueTRC and GreenTRC, which contain
>>>> approximately 2048 bytes of data each. As the metadata are passed to
>>>> Solr through the URI, I get the following error: URI is too large >8192
>>>> Do we consider it a "normal issue", or is it worth checking the
>>>> metadata length before sending the ingest request?
>>>> On 24/09/2018 16:43, Karl Wright wrote:
>>>>> Please vote on whether to release ManifoldCF 2.11, RC3.  This release
>>>>> contains a number of fixes/improvements/additions, described in the
>>>>> CHANGES.txt file.  In addition, it includes Tika 1.19, which has a number
>>>>> of fixes for classpath issues specifically requested by ManifoldCF.
>>>>> This completely fixes a SolrJ related problem with the Solr Connector found
>>>>> in RC3.  All tests pass.
>>>>> The release artifact can be found at:
>>>>> There is also a tag at:
>>>>> Thanks again,
>>>>> Karl Wright
>>>> --
>>>> Julien MASSIERA
>>>> Director of Product Development
>>>> France Labs – The Search experts
>>>> Join us at the Enterprise Search & Discovery Summit in Washington DC
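
The question raised in the thread, whether to check the metadata length before sending the ingest request, can be illustrated with a minimal sketch. It is not ManifoldCF or SolrJ code: the class MetadataUriCheck and its methods are hypothetical names, and the 8192 limit is simply the figure from the "URI is too large >8192" error quoted above. The sketch estimates how long the query string would be once the metadata is URL-encoded (encoding can expand a value well beyond its raw size) and reports whether the resulting URI would stay within the limit.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of ManifoldCF or SolrJ.
final class MetadataUriCheck {
    // Assumed limit, taken from the "URI is too large >8192" error in the thread.
    static final int MAX_URI_LENGTH = 8192;

    // Length of the query string if every metadata field were sent as a URL parameter.
    static int encodedQueryLength(Map<String, String> metadata) {
        int length = 0;
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (length > 0) {
                length++; // '&' between parameters
            }
            length += URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8).length();
            length++; // '=' between name and value
            length += URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8).length();
        }
        return length;
    }

    // True if base URI + '?' + encoded metadata stays within the assumed limit.
    static boolean fitsInUri(String baseUri, Map<String, String> metadata) {
        return baseUri.length() + 1 + encodedQueryLength(metadata) <= MAX_URI_LENGTH;
    }

    public static void main(String[] args) {
        // Placeholder values: three fields of 2048 characters each.
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < 2048; i++) {
            value.append('a');
        }
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("RedTRC", value.toString());
        metadata.put("GreenTRC", value.toString());
        metadata.put("BlueTRC", value.toString());

        String baseUri = "http://localhost:8983/solr/update/extract"; // assumed endpoint
        System.out.println("encoded query length: " + encodedQueryLength(metadata));
        System.out.println("fits in URI: " + fitsInUri(baseUri, metadata));
    }
}
```

A check like this only detects the problem. As the thread notes, the actual remedy was to get the metadata out of the URI by sending the update as a multipart form post.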

