manifoldcf-user mailing list archives

From Andrew Clegg <andrew.cl...@gmail.com>
Subject Re: Diagnosing "REJECTED" documents in job history
Date Mon, 21 Jan 2013 11:33:51 GMT
Close, it's ElasticSearch. Okay, I'll play around with these, thanks.

On 21 January 2013 11:26, Karl Wright <daddywri@gmail.com> wrote:
> Hi Andrew,
>
> The reason for rejection has to do with the criteria you provide for
> the job.  Specifically:
>
>                   if (activities.checkLengthIndexable(fileLength) &&
>                       activities.checkMimeTypeIndexable(contentType))
>                   {
> ...
>
> These are provided by your output connection; in there you may specify
> what mime types and what file length cutoff you want.  From the fact
> that you get these, I am guessing it's a Solr connection.  These
> criteria typically show up on tabs for the job definition.
>
> Karl
>
> On Mon, Jan 21, 2013 at 4:52 AM, Andrew Clegg <andrew.clegg@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to set up a fairly simple crawl where I pull documents from
>> Documentum and push them into ElasticSearch, using the 1.0.1 binary
>> release with all appropriate extras for Documentum added.
>>
>> The repository connection looks fine -- in the job config I can see
>> the paths, document types, content types etc. as expected.
>>
>> Also the ES output connection looks fine, it reports "connection working".
>>
>> However, when I do a crawl, every document it attempts to ingest shows
>> this in the job history:
>>
>> 01-18-2013 17:36:24.279 fetch 0902620580069898 REJECTED 6264431
>>
>> (date, time, activity, identifier, result code, bytes, time)
>>
>> How can I go about diagnosing what's causing this?
>>
>> I can't see anything suspect in the ManifoldCF stdout or log, and
>> there's nothing in the Documentum server process or registry process
>> output or logs either.
>>
>> Any ideas how I'd go about diagnosing this?
>>
>> The Documentum server is on a remote machine administered by a
>> different team, that I don't have direct access to, so any tips for
>> things I could try at my end before escalating it to them would be
>> particularly useful.
>>
>> Thanks,
>>
>> Andrew.
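
The check Karl quotes above can be sketched in isolation to show which of the two conditions is responsible for a rejection. This is only a minimal illustration, not ManifoldCF code: the `OutputChecks` interface is a hypothetical stand-in that borrows the two method names from the quoted snippet, and the length cutoff and mime type list are made-up values standing in for whatever the output connection or job is actually configured with.

```java
import java.util.Locale;
import java.util.Set;

public class RejectionSketch {

    /** Hypothetical stand-in for the two checks named in the quoted snippet. */
    interface OutputChecks {
        boolean checkLengthIndexable(long fileLength);
        boolean checkMimeTypeIndexable(String contentType);
    }

    /** Builds checks from illustrative limits; real values come from the configuration. */
    static OutputChecks limits(long maxLength, Set<String> allowedMimeTypes) {
        return new OutputChecks() {
            @Override
            public boolean checkLengthIndexable(long fileLength) {
                return fileLength <= maxLength;
            }

            @Override
            public boolean checkMimeTypeIndexable(String contentType) {
                return contentType != null
                    && allowedMimeTypes.contains(contentType.toLowerCase(Locale.ROOT));
            }
        };
    }

    /** Evaluates both conditions separately so the failing one can be reported. */
    static String diagnose(OutputChecks checks, long fileLength, String contentType) {
        boolean lengthOk = checks.checkLengthIndexable(fileLength);
        boolean mimeOk = checks.checkMimeTypeIndexable(contentType);
        if (lengthOk && mimeOk) {
            return "ACCEPTED";
        }
        if (!lengthOk && !mimeOk) {
            return "REJECTED: length and mime type";
        }
        return lengthOk ? "REJECTED: mime type" : "REJECTED: length";
    }

    public static void main(String[] args) {
        // Assumed limits for illustration: 5 MB cutoff, two permitted mime types.
        OutputChecks checks = limits(5_000_000L, Set.of("application/pdf", "text/plain"));

        System.out.println(diagnose(checks, 1_000L, "text/plain"));
        System.out.println(diagnose(checks, 10_000_000L, "application/pdf"));
        System.out.println(diagnose(checks, 1_000L, "application/msword"));
    }
}
```

Because both conditions must hold, a job in which every document is rejected suggests that one of the two settings excludes everything, for example a mime type list that does not include the types the repository reports.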



-- 

http://tinyurl.com/andrew-clegg-linkedin | http://twitter.com/andrew_clegg
