lucene-solr-user mailing list archives

From "Teague James" <teag...@insystechinc.com>
Subject RE: Tika HTTP 400 Errors with DIH
Date Fri, 05 Dec 2014 17:03:23 GMT
Alex,

Your suggestion might be a solution, but the issue isn't that the resource isn't found. As
Walter said, 400 is a "bad request", which makes me wonder: what is DIH/Tika doing when
trying to access the documents? What is the "request" that is bad? Is there any other way
to suss this out? Placing a network monitor would be extremely difficult in this case.

I know the stored URL is good and that the resource exists, because I copied it out of a Solr
query result and pasted it into the browser. That rules out 404 and 500 errors. Is the format
of the URL correct? Is there some other setting I've missed?

I appreciate the suggestions!

-Teague
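
One way to check the URL format without a network monitor is to inspect the stored value character by character. This is a minimal sketch, not anything DIH or Tika provides. It assumes the problem might be an invisible character (a trailing newline, a space, a byte-order mark) in the CLOB value, which this thread does not confirm. A browser typically trims such characters when a URL is pasted, while a programmatic client may send them as is. The function name suspicious_characters is hypothetical.

```python
import unicodedata


def suspicious_characters(value):
    """Return (index, repr, Unicode name) for characters that are unusual in a URL.

    Hypothetical helper for illustration; not part of Solr, DIH, or Tika.
    """
    findings = []
    for index, char in enumerate(value):
        category = unicodedata.category(char)
        # Zs/Zl/Zp are space-like separators, Cc are control characters
        # (newline, carriage return, tab), Cf are format characters such as
        # a byte-order mark. Anything above code point 126 is also flagged,
        # since it would normally be percent-encoded in a URL.
        if category in {"Zs", "Zl", "Zp", "Cc", "Cf"} or ord(char) > 126:
            findings.append((index, repr(char), unicodedata.name(char, "UNKNOWN")))
    return findings


if __name__ == "__main__":
    # Assumed example: the URL from this thread with a trailing newline added.
    stored = "http://www.someaddress.com/documents/document1.docx\n"
    for finding in suspicious_characters(stored):
        print(finding)
```

To use it, paste in the field value exactly as Solr returns it (for example from a JSON response, where escapes such as \n stay visible). An empty result means the value contains only plain printable ASCII, so the cause lies elsewhere.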


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafalov@gmail.com] 
Sent: Thursday, December 04, 2014 12:22 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH

Right. Resource not found (on server).

The end result is the same. If it works in the browser but not from the application, then either
a different URL is being requested or, somehow, a different server is being contacted.

The solution (watching network traffic) is still the same, right?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/
and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 4 December 2014 at 11:51, Walter Underwood <wunder@wunderwood.org> wrote:
> No, 400 should mean that the request was bad. When the server fails, that is a 500.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>
>> 400 error means something wrong on the server (resource not found).
>> So, it would be useful to see what URL is actually being requested.
>>
>> Can you run some sort of network tracer to see the actual network
>> request (dtrace, Wireshark, etc)? That will cut the problem in
>> half for you.
>>
>> Regards,
>>   Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
>> popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 4 December 2014 at 09:42, Teague James <teaguej@insystechinc.com> wrote:
>>> The database stores the URL as a CLOB. Querying Solr shows that the field value
>>> is "http://www.someaddress.com/documents/document1.docx"
>>> The URL works if I copy and paste it to the browser, but Tika gets a 400 error.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> -Teague
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafalov@gmail.com]
>>> Sent: Tuesday, December 02, 2014 1:45 PM
>>> To: solr-user
>>> Subject: Re: Tika HTTP 400 Errors with DIH
>>>
>>> On 2 December 2014 at 13:19, Teague James <teaguej@insystechinc.com> wrote:
>>>> clob="true"
>>>
>>> What is ClobTransformer doing on the DownloadURL field? Is it possible it
>>> is corrupting the value somehow?
>>>
>>> Regards,
>>>   Alex.
>>>
>>> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
>>> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
>>> popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>
>

