manifoldcf-user mailing list archives

From msaunier <msaun...@citya.com>
Subject RE: Modify job to add excludes files and directory
Date Wed, 14 Mar 2018 11:04:51 GMT
OK. It works, with the correct behavior.

 

Thanks Karl,

Maxence

 

From: msaunier [mailto:msaunier@citya.com] 
Sent: Wednesday, March 14, 2018 10:58
To: user@manifoldcf.apache.org
Cc: fharrang@citya.com
Subject: RE: Modify job to add excludes files and directory

 

I have modified my script to display the result on screen; it is attached to this email:



 

 

I will try it and get back to you.

 

Thanks,

Maxence 

 

From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: Tuesday, March 13, 2018 21:57
To: user@manifoldcf.apache.org
Subject: Re: Modify job to add excludes files and directory

 

I created a ticket (CONNECTORS-1499) and attached a patch that uses the more detailed format
in all situations where hash order could affect things.  If you apply the patch, you should
definitely see a difference in the JSON output when you dump a job in JSON format.  You will
still need to learn to use the order-preserving format when generating your own JSON.

 

Thanks,

Karl

 

 

On Tue, Mar 13, 2018 at 4:33 PM, Karl Wright <daddywri@gmail.com> wrote:

Right, so the new org.json.simple JSON parser uses hash order for keys.  That scrambles their
order on reading.  So unless you intermingle includes and excludes within a start point, you
are currently at risk of getting the order switched on you.
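To see why a hash-backed JSON object can swap `include` and `exclude`, here is a minimal sketch using plain `java.util` maps rather than the JSON library itself (an assumption made for illustration): a `HashMap` iterates in an order derived from key hashes, while a `LinkedHashMap` keeps insertion order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeyOrderDemo {
    // Insert keys in the order a script would write them, then report the iteration order.
    static List<String> iterationOrder(Map<String, String> map) {
        map.put("exclude", "*.mov");
        map.put("include", "*");
        map.put("_attribute_path", "c:\\path_to_files");
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap: iteration order depends on hash codes, not on insertion order.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>()));
        // LinkedHashMap: iteration order is exactly the insertion order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>()));
    }
}
```

Only the `LinkedHashMap` result is guaranteed; whatever order the `HashMap` prints is an accident of hashing, which is the situation described above.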

 

There's a clean-room implementation of the old JSON parser available now; I'll have to look
into going back to it.  But for now I'm going to change how output is done so that it only
uses arrays if there's a single child type possible.

 

Karl

 

 

On Tue, Mar 13, 2018 at 4:19 PM, Karl Wright <daddywri@gmail.com> wrote:

The code has two ways of representing the same thing in JSON.  One way collapses similar child
types into arrays.  The other way (which is used when it's determined that the first way won't
maintain order) is quite different.  Please see the following code:

 

>>>>>> 

  /** Get as JSON.
  *@return the json corresponding to this Configuration.
  */
  public String toJSON()
    throws ManifoldCFException
  {
    JSONWriter writer = new JSONWriter();
    writer.startObject();
    // We do NOT use the root node label, unlike XML.

    // Now, do children.  To get the arrays right, we need to glue together all children with the
    // same type, which requires us to do an appropriate pass to gather that stuff together.
    // Since we also need to maintain order, it is essential that we detect the out-of-order condition
    // properly, and use an alternate representation if we should find it.
    Map<String,List<ConfigurationNode>> childMap = new HashMap<String,List<ConfigurationNode>>();
    List<String> childList = new ArrayList<String>();
    String lastChildType = null;
    boolean needAlternate = false;
    int i = 0;
    while (i < getChildCount())
    {
      ConfigurationNode child = findChild(i++);
      String key = child.getType();
      List<ConfigurationNode> list = childMap.get(key);
      if (list == null)
      {
        list = new ArrayList<ConfigurationNode>();
        childMap.put(key,list);
        childList.add(key);
      }
      else
      {
        if (!lastChildType.equals(key))
        {
          needAlternate = true;
          break;
        }
      }
      list.add(child);
      lastChildType = key;
    }

    if (needAlternate)
    {
      // Can't use the array representation.  We'll need to start a _children_ object, and enumerate
      // each child.  So, the JSON will look like:
      // <key>:{_attribute_<attr>:xxx,_children_:[{_type_:<child_key>, ...},{_type_:<child_key_2>, ...}, ...]}
      writer.key(JSON_CHILDREN);
      writer.startArray();
      i = 0;
      while (i < getChildCount())
      {
        ConfigurationNode child = findChild(i++);
        writeNode(writer,child,false,true);
      }
      writer.endArray();
    }
    else
    {
      // We can collapse child nodes to arrays and still maintain order.
      // The JSON will look like this:
      // <key>:{_attribute_<attr>:xxx,<child_key>:[stuff],<child_key_2>:[more_stuff] ...}
      int q = 0;
      while (q < childList.size())
      {
        String key = childList.get(q++);
        List<ConfigurationNode> list = childMap.get(key);
        if (list.size() > 1)
        {
          // Write it as an array
          writer.key(key);
          writer.startArray();
          i = 0;
          while (i < list.size())
          {
            ConfigurationNode child = list.get(i++);
            writeNode(writer,child,false,false);
          }
          writer.endArray();
        }
        else
        {
          // Write it as a singleton
          writeNode(writer,list.get(0),true,false);
        }
      }
    }
    writer.endObject();

    // Convert to a string.
    return writer.toString();
  }

<<<<<< 
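The `needAlternate` test in `toJSON()` boils down to one question: does each child type occur in a single contiguous run? Here is a standalone sketch of that check over a plain list of type names; it is a simplification (the real method walks `ConfigurationNode` children), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollapseCheck {
    // True if every type appears in one contiguous run, so grouping the children
    // into per-type arrays would not reorder them.
    static boolean canCollapseToArrays(List<String> childTypes) {
        Set<String> seen = new HashSet<>();
        String last = null;
        for (String type : childTypes) {
            // Seen before, but not immediately before: the types are interleaved.
            if (!seen.add(type) && !type.equals(last)) {
                return false;
            }
            last = type;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canCollapseToArrays(Arrays.asList("exclude", "exclude", "include"))); // true
        System.out.println(canCollapseToArrays(Arrays.asList("exclude", "include", "exclude"))); // false
    }
}
```

A `true` result is the case described as risky in this thread: the array form is chosen, and the relative order of the `exclude` and `include` keys then depends on how the parser orders object keys.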

 

*IF* the specification from your UI-ordered rules cannot be output as the array-style JSON,
*THEN* the alternate representation will be used.  That is why I suggested that you hand-order
your example job and then output the JSON, because you will see the format that will definitely
preserve the order.  I strongly suggest using that format to guarantee the order.
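If you generate the job JSON yourself, here is a sketch of emitting the order-preserving form, following the `_children_` / `_type_` layout described in the `toJSON()` comments. It assumes a File System connector start point whose rules carry `_attribute_type` and `_attribute_match`, as in the documentation example; the exact nesting is an assumption, so compare it with a job you have ordered by hand in the UI and exported, as suggested above. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedSpecSketch {
    // Quote a value as a JSON string, escaping backslashes and double quotes.
    static String q(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One rule in the alternate form: {"_type_":<kind>,"_attribute_type":...,"_attribute_match":...}
    static String rule(String kind, String type, String match) {
        return "{" + q("_type_") + ":" + q(kind)
            + "," + q("_attribute_type") + ":" + q(type)
            + "," + q("_attribute_match") + ":" + q(match) + "}";
    }

    // A start point whose rules all sit in one _children_ array, so their order is explicit.
    static String buildSpec(String path, List<String> rules) {
        return "{" + q("startpoint") + ":{"
            + q("_attribute_path") + ":" + q(path)
            + "," + q("_children_") + ":[" + String.join(",", rules) + "]}}";
    }

    public static void main(String[] args) {
        List<String> rules = new ArrayList<>();
        rules.add(rule("exclude", "file", "*.mov"));   // evaluated first
        rules.add(rule("include", "file", "*"));
        rules.add(rule("include", "directory", "*"));
        System.out.println(buildSpec("c:\\path_to_files", rules));
    }
}
```

Because the rules live in a JSON array rather than under separate `include` / `exclude` keys, a parser that reorders object keys has nothing to reorder.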

There is a possibility that we have a bug where the ordering within types is preserved, but
the ordering between types is not properly preserved.  This is what I suspect is happening.
 If true, it is because we migrated to a different JSON implementation as a result of legal
issues a year or two back.  That's what I'm going to look at next.  But in any case you should
be able to use the order-guaranteed JSON format to get past your problems.

 

Thanks,

Karl

 

 

On Tue, Mar 13, 2018 at 4:02 PM, Karl Wright <daddywri@gmail.com> wrote:

The issue is due to the mapping from XML to JSON.  Order is preserved, but only within each
level.  So the includes are all in order but all includes go before all excludes, etc.  I'll
have to consider how best to resolve that.
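A small sketch of the effect described here: grouping rules by type, as the array-style JSON does, keeps the order inside each group but drops the interleaving between groups. Rules are plain `kind:pattern` strings purely for illustration; this is not ManifoldCF's data model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupingDemo {
    // Group "kind:pattern" rules by kind, like {"exclude":[...],"include":[...]}.
    static Map<String, List<String>> groupByKind(List<String> rules) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String rule : rules) {
            String kind = rule.substring(0, rule.indexOf(':'));
            grouped.computeIfAbsent(kind, k -> new ArrayList<>()).add(rule);
        }
        return grouped;
    }

    // Read the groups back one kind after another, as a reader walking the keys would.
    static List<String> flatten(Map<String, List<String>> grouped) {
        List<String> out = new ArrayList<>();
        for (List<String> group : grouped.values()) {
            out.addAll(group);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("exclude:*.mov", "include:*.txt", "exclude:*.tmp");
        // Prints [exclude:*.mov, exclude:*.tmp, include:*.txt]
        System.out.println(flatten(groupByKind(rules)));
    }
}
```

Even with an insertion-ordered map, the second exclude moves ahead of the include; with a hash-ordered map the whole include group may also land first.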

 

Karl

 

On Tue, Mar 13, 2018 at 3:50 PM, Karl Wright <daddywri@gmail.com> wrote:

Hi Maxence,

 

If you EXPORT a job that works in JSON, and then IMPORT the exported JSON into a new job,
is that job broken?

 

Karl

 

 

On Tue, Mar 13, 2018 at 1:50 PM, msaunier <msaunier@citya.com> wrote:

Hello Karl,

 

I have created three scenarios:

 

1. Job created manually (1_job_manually.json | 1_job_manually.png)

2. Job created with the script, then reordered manually (2_job_mixte.json | 2_job_mixte.png)

3. Job created with the script only (3_job_script.json | 3_job_script.png)

 

I do not see the difference.

 

So 1 and 2 work correctly, with the right order, but 3 puts the included files and directories
first.

 

Thanks,

Maxence

 

From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: Monday, March 12, 2018 21:29
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <dreuillon@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Here is an idea.  Define your job in the UI and use the API to fetch the JSON for it.
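A sketch of fetching a hand-built job's JSON with the JDK's HTTP client (Java 11+). The base URL assumes a default local install with the API service at `http://localhost:8345/mcf-api-service/json` and no authentication, and the job ID is a hypothetical placeholder; adjust both for your deployment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class FetchJobJson {
    // Assumed API base URL for a default local install.
    static final String DEFAULT_BASE = "http://localhost:8345/mcf-api-service/json";

    // Build the URL of one job resource, e.g. <base>/jobs/<job_id>.
    static String jobUrl(String base, String jobId) {
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "/jobs/" + jobId;
    }

    // GET the job definition and return the raw JSON body.
    static String fetchJob(String base, String jobId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(jobUrl(base, jobId))).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Pass the ID of the job you ordered by hand in the UI (the default here is hypothetical).
        String jobId = args.length > 0 ? args[0] : "1234567890";
        System.out.println(fetchJob(DEFAULT_BASE, jobId));
    }
}
```

The returned document specification shows the exact format that ManifoldCF itself chooses for your rule order, which you can then reproduce in a generating script.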

 

Karl

 

On Mon, Mar 12, 2018, 12:51 PM Karl Wright <daddywri@gmail.com> wrote:

I will need to look at this later tonight before I can respond in detail.

The document specification part of the API uses EXACTLY the same data as is stored for the
job.  The only difference is that the job specification is stored in XML, not JSON.  The
converters between the two do preserve ordering, however.
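XML, unlike a hash-backed JSON object, records the order of sibling elements. A minimal sketch with the JDK's DOM parser shows the children coming back in document order, interleaving included. The element and attribute names are illustrative assumptions, not the exact schema ManifoldCF stores.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlOrderDemo {
    // Return the element names of the root's children, in document order.
    static List<String> childOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical layout mirroring the JSON attribute names.
        String xml = "<startpoint path=\"c:\\path_to_files\">"
            + "<exclude type=\"file\" match=\"*.mov\"/>"
            + "<include type=\"file\" match=\"*\"/>"
            + "<exclude type=\"directory\" match=\"tmp\"/>"
            + "</startpoint>";
        System.out.println(childOrder(xml)); // [exclude, include, exclude]
    }
}
```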

 

Karl

 

 

On Mon, Mar 12, 2018 at 12:38 PM, msaunier <msaunier@citya.com> wrote:

1:

I think I have found a problem in the File System connector section of this page: https://manifoldcf.apache.org/release/release-2.9.1/en_US/programmatic-operation.html

 

It shows this JSON:

 

{"startpoint":[{"_attribute_path":"c:\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"],"exclude":["*.mov"]]}

 

I think the JSON syntax is wrong. I think the correct JSON is:

 

{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc","_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}

 

List of corrections:

{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"\,"_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}

 

However, this configuration does not work with the Windows Share connector: there is a syntax
error on the exclude.

 

2:

As for my problem, the JSON format is not the issue; it works. I have attached the JSON,
generated with my Python script from my database (srvics33.json).

 

If I open the UI after PUTting the configuration, the included files come first and the
excluded ones second (image1.png). In my JSON I put the excludes first, but they end up
second.

I have to go into the UI and change the order manually to get the correct result
(image2.png).

 

Can I specify an order parameter [1-*] to place excluded files and directories first?

 

Thanks.

 

Maxence

 

From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: Monday, March 12, 2018 14:38
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <DREUILLON@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Hi Maxence,

 

You can have as many clauses in your JSON rule list as you like.  You do not need to have
both include and exclude rules in each clause.  So you can precisely do in the JSON what you
do in the UI.

 

Thanks,

Karl

 

 

On Mon, Mar 12, 2018 at 9:07 AM, msaunier <msaunier@citya.com> wrote:

OK. I have read this in the documentation:

 

 Rules are evaluated from top to bottom, and the first rule that matches the file name is
the one that is chosen. 
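To make the quoted rule concrete, here is a sketch of top-to-bottom, first-match evaluation. It is not the connector's code: glob support is limited to `*` and `?`, and treating a name that matches no rule as excluded is an assumption of the sketch. It shows why placing an exclude ahead of `include *` changes the outcome.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FirstMatchRules {
    // One rule: include or exclude names of the given type ("file" or "directory") matching a glob.
    static final class Rule {
        final boolean include;
        final String type;
        final String glob;
        Rule(boolean include, String type, String glob) {
            this.include = include;
            this.type = type;
            this.glob = glob;
        }
    }

    // Translate a simple glob (* and ? only) into a regex matched against the whole name.
    static boolean globMatches(String glob, String name) {
        StringBuilder re = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return name.matches(re.toString());
    }

    // Walk the rules top to bottom; the first matching rule decides.
    static boolean isIncluded(List<Rule> rules, String type, String name) {
        for (Rule r : rules) {
            if (r.type.equals(type) && globMatches(r.glob, name)) {
                return r.include;
            }
        }
        return false; // assumption: no matching rule means excluded
    }

    public static void main(String[] args) {
        List<Rule> excludeFirst = Arrays.asList(new Rule(false, "file", "*.mov"), new Rule(true, "file", "*"));
        List<Rule> includeFirst = Arrays.asList(new Rule(true, "file", "*"), new Rule(false, "file", "*.mov"));
        System.out.println(isIncluded(excludeFirst, "file", "clip.mov")); // false
        System.out.println(isIncluded(includeFirst, "file", "clip.mov")); // true
    }
}
```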

 

But with the API, if I PUT a new job definition in the correct order, ManifoldCF always puts
the included documents first. If I need to exclude first, I cannot do it with an API definition.
I have attached the JSON to this email.

 

Does the API have an order parameter for the start point and the included and excluded files/directories?

 

(PS: I prefer to exclude first and then include *, so that I have total control over the GED
(document management system) and can keep an eye on the documents.)

(PS2: I generate this JSON and send it with a Python script, and that works well.)

 

Thanks

 

From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: Friday, March 9, 2018 12:53
To: user@manifoldcf.apache.org
Cc: Fabien Harrang <FHARRANG@citya.com>; REUILLON Dominique <DREUILLON@citya.com>
Subject: Re: Modify job to add excludes files and directory

 

Hi Maxence,

 

In the middle of a job run, if you change the specification of which documents are included
and excluded, the connector's implementation determines how it will behave.  There is no
guarantee that excluded documents will be removed, for example if the connector filters
documents only when they are queued.  You may need to run the job a second time to be sure
everything is removed.

So the official answer is that "it depends". 

 

Karl

 

 

On Fri, Mar 9, 2018 at 5:38 AM, msaunier <msaunier@citya.com> wrote:

Hello Karl,

 

If I add new files and directories to exclude on a live job, will ManifoldCF delete previously
indexed files that match these exclusions? Or do I need to reseed all of my documents?

 

Thank you.

 

Maxence SAUNIER

 

 

 

 

 

 

 

 

 

 

 

