jackrabbit-oak-issues mailing list archives

From "Amit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7389) Mongo/FileBlobStore does not update timestamp for already existing blobs
Date Thu, 05 Apr 2018 13:43:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426941#comment-16426941 ]

Amit Jain commented on OAK-7389:
--------------------------------

[~catholicon] Thanks, using MongoBlob might be the problem. As for the id in the update
blob, it was not set initially; I added it later during my experimentation.

[~tmueller] We don't use the old vs new paradigm in Oak as we did in Jackrabbit (I think that
was the plan initially). Since we use the MarkSweepGarbageCollector to trigger GC for all BlobStores
and DataStores, and use the GarbageCollectableBlobStore interface only to get all chunk ids and to
perform deletes, the timestamp update is required.
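
To make the required behaviour concrete for the FileBlobStore case quoted below, here is a minimal sketch, not the actual patch. The class name TouchOnStoreSketch, the flat directory layout, and the explicit `now` parameter are assumptions made only for illustration; the one point it shows is that an already existing block gets its last-modified time refreshed instead of being skipped.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical stand-in for a file-backed blob store; this is NOT the Oak FileBlobStore.
class TouchOnStoreSketch {

    /**
     * Stores a block as dir/id. If the file already exists, its content is left
     * alone and only the last-modified time is refreshed, so an age-based sweep
     * sees the block as recently used.
     *
     * @return true if a new file was written, false if an existing one was touched
     */
    public static boolean storeBlock(File dir, String id, byte[] data, long now)
            throws IOException {
        File f = new File(dir, id);
        boolean created = !f.exists();
        if (created) {
            Files.write(f.toPath(), data);
        }
        // Refresh the timestamp in both cases; setLastModified reports failure
        // through its return value rather than an exception.
        if (!f.setLastModified(now)) {
            throw new IOException("Could not update timestamp of " + f);
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("blobs").toFile();
        byte[] data = {1, 2, 3};
        long dayAgo = System.currentTimeMillis() - 24L * 60 * 60 * 1000;

        storeBlock(dir, "abc123", data, dayAgo);                     // first store
        storeBlock(dir, "abc123", data, System.currentTimeMillis()); // stored again

        // The second call must have moved the timestamp forward.
        System.out.println(new File(dir, "abc123").lastModified() > dayAgo);
    }
}
```

For MongoBlobStore the analogous change would be to update the last-modified field when the insert hits a duplicate key, rather than ignoring the exception.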

> Mongo/FileBlobStore does not update timestamp for already existing blobs
> ------------------------------------------------------------------------
>
>                 Key: OAK-7389
>                 URL: https://issues.apache.org/jira/browse/OAK-7389
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>    Affects Versions: 1.2.14, 1.4.20, 1.8.2, 1.6.11
>            Reporter: Amit Jain
>            Assignee: Amit Jain
>            Priority: Critical
>             Fix For: 1.2.30
>
>         Attachments: OAK-7389-v1.patch
>
>
> MongoBlobStore uses the {{insert}} call and ignores any exceptions, which means any
> existing value won't be updated.
> {code:java}
>     @Override
>     protected void storeBlock(byte[] digest, int level, byte[] data) throws IOException {
>         String id = StringUtils.convertBytesToHex(digest);
>         cache.put(id, data);
>         // Check if it already exists?
>         MongoBlob mongoBlob = new MongoBlob();
>         mongoBlob.setId(id);
>         mongoBlob.setData(data);
>         mongoBlob.setLevel(level);
>         mongoBlob.setLastMod(System.currentTimeMillis());
>         // TODO check the return value
>         // TODO verify insert is fast if the entry already exists
>         try {
>             getBlobCollection().insertOne(mongoBlob);
>         } catch (DuplicateKeyException e) {
>             // the same block was already stored before: ignore
>         } catch (MongoException e) {
>             if (e.getCode() == DUPLICATE_KEY_ERROR_CODE) {
>                 // the same block was already stored before: ignore
>             } else {
>                 throw new IOException(e.getMessage(), e);
>             }
>         }
>     }
> {code}
> FileBlobStore also returns early if the file already exists, without updating the
> timestamp:
> {code:java}
>     @Override
>     protected synchronized void storeBlock(byte[] digest, int level, byte[] data) throws IOException {
>         File f = getFile(digest, false);
>         if (f.exists()) {
>             return;
>         }
>         .........
> {code}
> The above would cause data loss in DSGC if blob blocks are resurrected (stored again
> at the time of DSGC), because their timestamp would never have been modified.
>  
> cc/ [~tmueller], [~mreutegg], [~chetanm], [~catholicon]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
