manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1077) Add activity logging for decision and exception events across all connectors
Date Fri, 17 Oct 2014 12:39:34 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175009#comment-14175009 ]

Karl Wright commented on CONNECTORS-1077:
-----------------------------------------

The plan is to go through our connectors one by one and identify places where we log
useful information but do not also record an activity.

For example, for the Document Filter connector, this is the code:

{code}
    // Hard filtering (in case connectors don't call check methods above)
    SpecPacker sp = new SpecPacker(outputDescription.getSpecification());
    if (!checkURLIndexable(sp, outputDescription, documentURI, activities))
    {
      activities.noDocument();
      activities.recordActivity(null, ACTIVITY_FILTER, null, documentURI, "FILTEREDURL",
        "Rejected due to URL ('"+documentURI+"')");
      if (Logging.ingest.isDebugEnabled())
        Logging.ingest.debug("Document filter: Rejected document "+documentURI+
          " due to URL ('"+documentURI+"')");
      return DOCUMENTSTATUS_REJECTED;
    }

    if (!checkLengthIndexable(sp, outputDescription, document.getBinaryLength(), activities))
    {
      activities.noDocument();
      activities.recordActivity(null, ACTIVITY_FILTER, null, documentURI, "FILTEREDLENGTH",
        "Rejected due to length ("+document.getBinaryLength()+")");
      if (Logging.ingest.isDebugEnabled())
        Logging.ingest.debug("Document filter: Rejected document "+documentURI+
          " due to length ("+document.getBinaryLength()+")");
      return DOCUMENTSTATUS_REJECTED;
    }
    
    if (!checkMimeTypeIndexable(sp, outputDescription, document.getMimeType(), activities))
    {
      activities.noDocument();
      activities.recordActivity(null, ACTIVITY_FILTER, null, documentURI, "FILTEREDMIMETYPE",
        "Rejected due to mime type ('"+document.getMimeType()+"')");
      if (Logging.ingest.isDebugEnabled())
        Logging.ingest.debug("Document filter: Rejected document "+documentURI+
          " due to mime type ('"+document.getMimeType()+"')");
      return DOCUMENTSTATUS_REJECTED;
    }
    
    if (!checkDateIndexable(sp, outputDescription, document.getModifiedDate(), activities))
    {
      activities.noDocument();
      activities.recordActivity(null, ACTIVITY_FILTER, null, documentURI, "FILTEREDDATE",
        "Rejected due to date ('"+document.getModifiedDate()+"')");
      if (Logging.ingest.isDebugEnabled())
        Logging.ingest.debug("Document filter: Rejected document "+documentURI+
          " due to date ('"+document.getModifiedDate()+"')");
      return DOCUMENTSTATUS_REJECTED;
    }
    
    return activities.sendDocument(documentURI, document);
{code}

Every decision that rejects the document is recorded as an activity, and a reason is provided.
This needs to be the standard for all connectors.
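
One way to keep the history event and the debug line from drifting apart might be to build
both from the same reason string.  The following is only a sketch under stated assumptions:
HistorySink, InMemoryHistory, recordRejection and the "filter" activity name are hypothetical
stand-ins so that the example runs on its own (the argument order of recordActivity is copied
from the calls above); none of it is existing ManifoldCF code.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: every type here is a hypothetical stand-in, not ManifoldCF code. */
final class RejectionRecorder {

  /** Hypothetical stand-in for the activities object; argument order copied from the snippet above. */
  public interface HistorySink {
    void recordActivity(Long startTime, String activityType, Long dataSize,
                        String entityURI, String resultCode, String resultDescription);
  }

  /** In-memory sink so the sketch runs without a connector framework on the classpath. */
  public static final class InMemoryHistory implements HistorySink {
    public final List<String> events = new ArrayList<>();

    @Override
    public void recordActivity(Long startTime, String activityType, Long dataSize,
                               String entityURI, String resultCode, String resultDescription) {
      events.add(activityType + "|" + entityURI + "|" + resultCode + "|" + resultDescription);
    }
  }

  /**
   * Records the rejection as a history event and returns the matching text
   * for the debug log, both built from the same reason string.
   */
  public static String recordRejection(HistorySink sink, String activityType,
                                       String documentURI, String resultCode, String reason) {
    sink.recordActivity(null, activityType, null, documentURI, resultCode,
        "Rejected due to " + reason);
    return "Rejected document " + documentURI + " due to " + reason;
  }

  public static void main(String[] args) {
    InMemoryHistory history = new InMemoryHistory();
    // "filter" is an assumed activity name used for illustration only.
    String logLine = recordRejection(history, "filter", "http://example.com/a.pdf",
        "FILTEREDLENGTH", "length (1048577)");
    System.out.println(logLine);
    System.out.println(history.events.get(0));
  }
}
```

In a real connector the returned string would go to the existing debug call, still guarded by
the isDebugEnabled() check, so the only behavioral change is that the activity is always recorded.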

Priority-wise, I think we should start with output and transformation connectors, since they
are less complicated.  The Tika connector already logs the transformation part, but it can
still reject a document due to length without recording anything.  I will fix that, and also
the metadata adjuster.  So if you select an output connector and recommend fixes for it, that
would be the right place to start.

Thanks!


> Add activity logging for decision and exception events across all connectors
> ----------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1077
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1077
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Alfresco connector
>    Affects Versions: ManifoldCF 2.0
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.0
>
>
> Many document skip decisions or transient exceptions are only logged, and are not recorded
> as history events.  This makes it necessary upon occasion to refer to the manifoldcf log for
> basic diagnosis.  We should record activity events for most decisions and exceptions in the
> history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
