chemistry-dev mailing list archives

From Ron DiFrango <>
Subject Re: AW: document 'uniqueness'
Date Sun, 22 Jun 2014 14:23:20 GMT

The suggestion below from Sascha is a good one.  The other approach I've
taken before is to search the repo for a given document and insert it only
if it does not already exist; otherwise I perform an update or just log it
as an "error".
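
A rough sketch of that check-then-insert flow, in Java. The InMemoryRepo
class is a hypothetical stand-in for the target repository (it is not a
CMIS or FileNet API), and keying documents by legacy ID is an assumption
taken from Tim's original message; against a real repository the exists()
check would presumably be a query on a custom property.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckThenInsert {

    /** Outcome of one migration attempt. */
    enum Result { INSERTED, UPDATED }

    /** Hypothetical in-memory stand-in for the target repository. */
    static final class InMemoryRepo {
        private final Map<String, byte[]> docsByLegacyId = new HashMap<>();

        boolean exists(String legacyId) {
            return docsByLegacyId.containsKey(legacyId);
        }

        void insert(String legacyId, byte[] content) {
            docsByLegacyId.put(legacyId, content);
        }

        void update(String legacyId, byte[] content) {
            docsByLegacyId.put(legacyId, content);
        }

        int size() {
            return docsByLegacyId.size();
        }
    }

    /** Search first; insert only if the document is not there yet. */
    static Result migrate(InMemoryRepo repo, String legacyId, byte[] content) {
        if (repo.exists(legacyId)) {
            // Already migrated: update it (or log it as an "error" instead).
            repo.update(legacyId, content);
            return Result.UPDATED;
        }
        repo.insert(legacyId, content);
        return Result.INSERTED;
    }

    public static void main(String[] args) {
        InMemoryRepo repo = new InMemoryRepo();
        System.out.println(migrate(repo, "legacy-1", new byte[] {1}));
        // Re-running the same item, e.g. after a restart, does not duplicate it.
        System.out.println(migrate(repo, "legacy-1", new byte[] {1}));
        System.out.println("documents stored: " + repo.size());
    }
}
```

Note that the check and the insert are two separate steps, so this only
protects a single migration process re-running items; it is not a
server-side uniqueness constraint.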


Ron DiFrango       
Director / Architect  |  CapTech

On 6/22/14, 5:37 AM, "Sascha Homeier" <> wrote:

>Hi Tim,
>you said you need to migrate the documents from FileNet to a CMIS
>compliant server.
>Is the CMIS compliant server your implementation?
>If so, you could calculate a hash such as MD5 over the content stream and
>set it as the object ID.
>Per the CMIS spec, this object ID needs to be unique. So it must be
>ensured that no two objects with the same object ID exist in the same
>CMIS repository, which is equivalent to ensuring that no two objects have
>the same content stream.
>This approach would also prevent identical documents from being added in
>the future, after the migration is done.
>Nevertheless, here you also need to find a performant way of determining
>if an object with a given ID already exists (and find a solution if the
>hash is changed only by a timestamp inside the content stream etc.)
>With about two million objects you may need to extend the RAM on the
>migration machine to keep that many objects in memory and compare them
>using HashMaps and Hashtables with your own implementations of equals()
>and hashCode() ;)
>Anyway a stimulating task. I'm curious about the ideas of others here to
>solve it in a performant way ;)
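
For illustration, the content hash described above could be computed
roughly as follows. This is only a sketch using the JDK's
java.security.MessageDigest; the class and method names (ContentHash,
md5Hex) are made up for the example, and how a server would actually
accept such a value as the object ID is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentHash {

    /** Returns the MD5 digest of the content as a lowercase hex string. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name on the Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        // Identical content always yields the same candidate ID.
        System.out.println(md5Hex(content));
    }
}
```

For large content streams the bytes would be fed to the digest in chunks
via update() rather than loaded into one array. As noted above, any
byte-level difference (such as an embedded timestamp) changes the hash.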
>-----Original Message-----
>From: Tim Webster []
>Sent: Saturday, 21 June 2014 17:55
>Subject: Re: document 'uniqueness'
>yes thanks for the suggestion - it sort of does that already with the
>Spring Batch progress tracking, but it still won't prevent another
>document being added to the repository that is identical to a previous
>one if it somehow failed - like a JVM crash or power failure.  Because
>there is no transaction management for the CMIS part, you can't really
>ensure this, except for a constraint in the repository itself.
>Anyway, yeah I think you're right and I need to look at FileNet
>specifically.  I just wasn't sure if I missed something and there was
>something in the CMIS spec that I could use (e.g. some property or
>On Fri, Jun 20, 2014 at 10:24 PM, Lucas, Mike <> wrote:
>> I'm sure you've already thought of this, but couldn't your migration
>> process just persist the legacy ids in a separate location (e.g.
>> database table, possibly cached in memory for performance)? Then you
>> would just need to check that for each document being migrated, to
>> make sure that the same doc hasn't been seen previously.
>> Not a CMIS related solution, but seems like it would work fine...
>> The other option, as you suggest, is to see if FileNet supports a
>> 'uniqueness' constraint for custom metadata properties. I believe
>> Sharepoint does but not sure about FileNet.
>> Thanks
>> michael lucas  |  Senior Software Developer  |  Great-West Life
>> -----Original Message-----
>> From: Tim Webster []
>> Sent: June 20, 2014 8:15 AM
>> To:
>> Subject: document 'uniqueness'
>> Hi,
>> I am developing a migration process (using Spring Batch) to migrate
>> documents from a legacy CMS into a CMIS-compliant system, and I need
>> to ensure that duplicate documents are not created accidentally.
>> However, our CMIS system (IBM FileNet) allows the addition of
>> documents with the same name.  Documents with identical values for
>> cmis:name or cmis:contentStreamFilename are allowed.  Even if this
>> could be disabled (I don't know if it can or cannot), it is a business
>> requirement and I wouldn't be able to.
>> The only thing I can think of to prevent this is to save the 'legacy'
>> ID of the document in a new CMIS property and somehow check that it
>> doesn't already exist when adding a new document. However this will be
>> very inefficient and slow down the migration (we're talking about up
>> to 2 million documents).
>> Ideally the 'uniqueness constraint' would be checked on the server and
>> would throw an exception, which I could then deal with.
>> Does anyone know of an easier way to do this, or is there anything I
>> can make use of in the CMIS spec to help?
>> Thanks,
