manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1532) Moving a file outside of the job's Paths is not the same as deleting it
Date Wed, 19 Sep 2018 18:37:03 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621019#comment-16621019 ]

Karl Wright commented on CONNECTORS-1532:
-----------------------------------------

As I suspected, the framework code makes no distinction between MODEL_ADD and MODEL_ADD_CHANGE:

{code}
./build/crawler-ui/java/org/apache/jsp/editjob_jsp.java:  int model = IRepositoryConnector.MODEL_ADD_CHANGE_DELETE;
./build/crawler-ui/java/org/apache/jsp/editjob_jsp.java:    if (model != -1 && model != IRepositoryConnector.MODEL_ADD_CHANGE_DELETE && model != IRepositoryConnector.MODEL_CHAINED_ADD_CHANGE_DELETE)
./build/crawler-ui/java/org/apache/jsp/viewjob_jsp.java:    if (model != -1 && model != IRepositoryConnector.MODEL_ADD_CHANGE_DELETE)
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:* is the most restrictive that is still accurate.  For example, if MODEL_ADD_CHANGE_DELETE applies, you would
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:* return that value rather than MODEL_ADD.
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  public static final int MODEL_ADD = 1;
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  public static final int MODEL_ADD_CHANGE = 2;
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  public static final int MODEL_ADD_CHANGE_DELETE = 3;
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  /** Like MODEL_ADD, except considering document discovery */
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  /** Like MODEL_ADD_CHANGE, except considering document discovery */
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/interfaces/IRepositoryConnector.java:  /** Like MODEL_ADD_CHANGE_DELETE, except considering document discovery */
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/jobs/JobManager.java:    // (1) If the connector has MODEL_ADD_CHANGE_DELETE, then
./pull-agent/src/main/java/org/apache/manifoldcf/crawler/jobs/JobManager.java:    if (connectorModel == IRepositoryConnector.MODEL_ADD_CHANGE_DELETE)
{code}
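To make the point concrete, here is a minimal, hypothetical Java sketch that copies only the three constant values visible in the grep output. The class name and the `isAddChangeDelete` helper are illustrative and are not ManifoldCF API; the helper mirrors the only kind of test the grep shows (`connectorModel == IRepositoryConnector.MODEL_ADD_CHANGE_DELETE`), under which MODEL_ADD and MODEL_ADD_CHANGE fall into the same branch.

{code:java}
/**
 * Hypothetical sketch, not ManifoldCF code: reproduces only the constant
 * values and the equality test that appear in the grep output.
 */
class ConnectorModelSketch {
    // Values as listed for IRepositoryConnector.java in the grep output.
    static final int MODEL_ADD = 1;
    static final int MODEL_ADD_CHANGE = 2;
    static final int MODEL_ADD_CHANGE_DELETE = 3;

    // Illustrative helper mirroring JobManager's
    // "connectorModel == IRepositoryConnector.MODEL_ADD_CHANGE_DELETE" test.
    static boolean isAddChangeDelete(int connectorModel) {
        return connectorModel == MODEL_ADD_CHANGE_DELETE;
    }

    public static void main(String[] args) {
        int[] models = {MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE};
        for (int model : models) {
            // MODEL_ADD and MODEL_ADD_CHANGE both print false: same branch.
            System.out.println(model + " -> " + isAddChangeDelete(model));
        }
    }
}
{code}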

I may have found the reason you see this behavior, though.  If the folder affinity is versioned
information, and I believe it is, then the seeding query will pick up the last version of
the document that was in the right folder.  That's because the seeding query uses the chronicle_id,
which is really a specific document version:

{code}
      String strDQLstart = "select for READ distinct i_chronicle_id from ";
{code}

Unfortunately, I don't know what DQL would check that this particular version of the document
is the latest one.
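A minimal in-memory Java sketch of the versioning hypothesis, under loudly stated assumptions: each document version is modelled as a row carrying a chronicle ID, its own object ID, the folder it is linked to, and a flag marking the current version; and the moved document is assumed to have an older version still linked to F while its current version lives elsewhere. The class, fields, and methods are hypothetical and are not DFC or Documentum connector code, and this is not DQL. `seedAllVersions` approximates a "distinct chronicle ID" seed over every matching version; `seedCurrentOnly` adds the missing check that the matched version is the current one.

{code:java}
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the hypothesis; not DFC or connector code. */
class ChronicleSeedingSketch {
    // One row per document version (assumed shape, for illustration only).
    static final class Version {
        final String chronicleId; // shared by all versions of one document
        final String objectId;    // identifies this particular version
        final String folder;      // folder this version is linked to
        final boolean current;    // true only for the latest version

        Version(String chronicleId, String objectId, String folder, boolean current) {
            this.chronicleId = chronicleId;
            this.objectId = objectId;
            this.folder = folder;
            this.current = current;
        }
    }

    // Distinct chronicle IDs of ANY version linked to the folder.
    static Set<String> seedAllVersions(List<Version> versions, String folder) {
        Set<String> seeds = new TreeSet<>();
        for (Version v : versions) {
            if (v.folder.equals(folder)) {
                seeds.add(v.chronicleId);
            }
        }
        return seeds;
    }

    // Same, but a match only counts when the matching version is current.
    static Set<String> seedCurrentOnly(List<Version> versions, String folder) {
        Set<String> seeds = new TreeSet<>();
        for (Version v : versions) {
            if (v.current && v.folder.equals(folder)) {
                seeds.add(v.chronicleId);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        List<Version> versions = List.of(
            // doc-A stays in /F.
            new Version("doc-A", "A.1", "/F", true),
            // doc-B was moved: assume its older version is still linked to /F.
            new Version("doc-B", "B.1", "/F", false),
            new Version("doc-B", "B.2", "/Other", true));
        System.out.println(seedAllVersions(versions, "/F"));  // [doc-A, doc-B]
        System.out.println(seedCurrentOnly(versions, "/F"));  // [doc-A]
    }
}
{code}

Under these assumptions, the moved document keeps being seeded by the first query, which would explain why it is never treated as deleted.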


> Moving a file outside of the job's Paths is not the same as deleting it
> -----------------------------------------------------------------------
>
>                 Key: CONNECTORS-1532
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1532
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Documentum connector
>    Affects Versions: ManifoldCF 2.10
>         Environment: Manifold 2.10 patched for #1512, #1517
>            Reporter: James Thomas
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: 2018-09-19_1758.png
>
>
> If I have an MF job that connects a specific folder, F, in Documentum to a File System output, then:
> 1. deleting files in Documentum shows them as zero size in the file system
> 2. moving files out of F does not remove them or zero them in the file system
> Note that moving a file from another folder (which the job is not looking at) to F has the same effect as adding it to F by, e.g., importing it in DM or POSTing it to DM via the REST interface.
> Intuitively, I expect that moving a file out of the "view" of the Documentum connector would have the same effect on the File System as deleting it. (My model here is of MF synchronising content between the Paths (DM) and the Output Path (File System) that I have specified in the job.)
> Starting point: I have run the MF job to fetch a bunch of files from a folder, call it F, in DM (i.e. I have configured Paths in the job to be F). This is what 'ls -l' on the file system looks like:
> {code:java}
> -rw-r--r--. 1 root i2e  12541 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e     26 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e  85772 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c7
> -rw-r--r--. 1 root i2e   8790 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e 101888 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c3
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c4
> -rw-r--r--. 1 root i2e  23040 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e  26112 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7bf{code}
> In DM, I delete one of the files in F; on the file system it shows as zero size, and its modification date has changed:
> {code:java}
> -rw-r--r--. 1 root i2e  12541 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e     26 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e   8790 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e 101888 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c3
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c4
> -rw-r--r--. 1 root i2e  23040 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e  26112 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7bf
> -rw-r--r--. 1 root i2e      0 Sep 19 07:23 drl?versionLabel=CURRENT&objectId=090000018000f7c7{code}
> In DM, I move a file from F to another folder (right-click, add to clipboard, go to the new folder, Edit > Move here).
> The file shows as modified (07:25) but is apparently still in F (i.e. in the Path my MF job is looking at):
> {code:java}
> -rw-r--r--. 1 root i2e  12541 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e     26 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e   8790 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e 101888 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c3
> -rw-r--r--. 1 root i2e  23040 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e  26112 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7bf
> -rw-r--r--. 1 root i2e      0 Sep 19 07:23 drl?versionLabel=CURRENT&objectId=090000018000f7c7
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:25 drl?versionLabel=CURRENT&objectId=090000018000f7c4{code}
> In DM, I move a file from another folder to F, and it shows up with the timestamp of the move (07:28):
> {code:java}
> -rw-r--r--. 1 root i2e  12541 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e     26 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e   8790 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e 101888 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c3
> -rw-r--r--. 1 root i2e  23040 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e  26112 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7bf
> -rw-r--r--. 1 root i2e      0 Sep 19 07:23 drl?versionLabel=CURRENT&objectId=090000018000f7c7
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:25 drl?versionLabel=CURRENT&objectId=090000018000f7c4
> -rw-r--r--. 1 root i2e 191513 Sep 19 07:28 drl?versionLabel=CURRENT&objectId=09000001800045b9{code}
> But if I immediately move it out again in DM, the timestamp (07:30) changes but the file apparently remains:
> {code:java}
> -rw-r--r--. 1 root i2e  12541 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e     26 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e   8790 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e 101888 Sep 19 07:21 drl?versionLabel=CURRENT&objectId=090000018000f7c3
> -rw-r--r--. 1 root i2e  23040 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e  26112 Sep 19 07:22 drl?versionLabel=CURRENT&objectId=090000018000f7bf
> -rw-r--r--. 1 root i2e      0 Sep 19 07:23 drl?versionLabel=CURRENT&objectId=090000018000f7c7
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:25 drl?versionLabel=CURRENT&objectId=090000018000f7c4
> -rw-r--r--. 1 root i2e 191513 Sep 19 07:30 drl?versionLabel=CURRENT&objectId=09000001800045b9{code}
> In DM, I now delete all visible content in F. The files that were moved out of F, and are not visible in F in DM, remain on the file system:
> {code:java}
> -rw-r--r--. 1 root i2e      0 Sep 19 07:23 drl?versionLabel=CURRENT&objectId=090000018000f7c7
> -rw-r--r--. 1 root i2e  32783 Sep 19 07:25 drl?versionLabel=CURRENT&objectId=090000018000f7c4
> -rw-r--r--. 1 root i2e 191513 Sep 19 07:30 drl?versionLabel=CURRENT&objectId=09000001800045b9
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7c2
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7be
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7c0
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7c1
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7bf
> -rw-r--r--. 1 root i2e      0 Sep 19 07:31 drl?versionLabel=CURRENT&objectId=090000018000f7c3{code}
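The reporter's expectation (the Output Path mirrors whatever the job's Paths currently contain) amounts to a set reconciliation between what was synced last time and what is seeded now. The Java sketch below is a hypothetical illustration of that expected behaviour only, not of what ManifoldCF or the File System output connector actually do; the class, method names, and document IDs are made up. Under this model, a document that was deleted in DM and one that was merely moved out of F are treated identically.

{code:java}
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the expected sync semantics; not ManifoldCF code. */
class ExpectedSyncSketch {
    // Synced on the previous run but no longer seeded: remove from output.
    static Set<String> toRemove(Set<String> previouslySynced, Set<String> currentlySeeded) {
        Set<String> removed = new TreeSet<>(previouslySynced);
        removed.removeAll(currentlySeeded);
        return removed;
    }

    // Seeded now but not synced before: add to output.
    static Set<String> toAdd(Set<String> previouslySynced, Set<String> currentlySeeded) {
        Set<String> added = new TreeSet<>(currentlySeeded);
        added.removeAll(previouslySynced);
        return added;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("doc-A", "doc-B", "doc-C");
        // doc-B was deleted in DM, doc-C was moved out of F, doc-D was moved into F.
        Set<String> now = Set.of("doc-A", "doc-D");
        System.out.println("remove: " + toRemove(before, now)); // remove: [doc-B, doc-C]
        System.out.println("add: " + toAdd(before, now));       // add: [doc-D]
    }
}
{code}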



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
