manifoldcf-user mailing list archives

From Jeff Potts <>
Subject Reading and posting plain text, rather than encoded files
Date Thu, 27 Aug 2015 19:47:08 GMT
I've spent a very short time playing with ManifoldCF. Cool project, thank
you for contributing it.

I can read binary files from a source repo like Alfresco 5.0.d and post
them to Elasticsearch 1.7.2 successfully.

Now I'm wondering whether the rest of my use cases can be achieved with ManifoldCF.

Use case 1: Read JSON from a file system, post to Elasticsearch as-is

When I tried the file system repository connector with the Elasticsearch
output connector, I noticed that the file is encoded and stored in ES in the
_content property. What I'd rather do is have the file posted to ES as-is,
for example when the file is already a JSON document in the format my ES
type mapping expects. These files are 15k to 30k of nested-object JSON.
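To make the difference concrete, here is a rough sketch of the two request
bodies. It assumes the current behaviour is a base64-encoded copy of the whole
file in _content (my reading of "encoded", not confirmed against the
connector's source). The function names are hypothetical and nothing here is a
ManifoldCF API; it only builds the bodies and does not talk to ES.

```python
import base64
import json


def encoded_body(raw: bytes) -> dict:
    # Assumed current behaviour: the whole file, base64-encoded,
    # stored as one string in the _content property.
    return {"_content": base64.b64encode(raw).decode("ascii")}


def as_is_body(raw: bytes) -> dict:
    # Desired behaviour: the file's own JSON becomes the document body,
    # so ES can apply the existing type mapping to its fields.
    doc = json.loads(raw.decode("utf-8"))
    if not isinstance(doc, dict):
        # An Elasticsearch document source must be a JSON object.
        raise ValueError("top-level JSON value must be an object")
    return doc


if __name__ == "__main__":
    raw = b'{"title": "Report", "sections": [{"name": "intro", "pages": 3}]}'
    print(json.dumps(encoded_body(raw)))
    print(json.dumps(as_is_body(raw)))
```

With the first body, the nested objects are invisible to the mapping. With
the second, fields like sections.pages are indexed as ordinary fields.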

Use case 2: Read JSON from Alfresco, post it to Elasticsearch along with
object metadata

In a slight twist on the first, I'd like to store JSON documents in a
repository such as Alfresco, read the metadata from each Alfresco object,
merge it with the JSON stored in the content, and post the result to
Elasticsearch as a JSON string rather than as an encoded blob.
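Roughly, the merge I have in mind looks like the sketch below. The field name
"metadata" and the sample property values are placeholders I made up, not
anything ManifoldCF or Alfresco produces today. Nesting the metadata under
its own key is one design choice that keeps it from overwriting fields
already present in the stored JSON.

```python
import json


def merge_metadata(content: bytes, metadata: dict, key: str = "metadata") -> str:
    """Merge repository metadata into a stored JSON document.

    Returns a JSON string ready to post as the ES document body.
    `key` is a hypothetical field name for the nested metadata.
    """
    doc = json.loads(content.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("stored content must be a JSON object")
    if key in doc:
        # Refuse rather than silently overwrite a field in the content.
        raise ValueError(f"content already has a {key!r} field")
    merged = dict(doc)
    merged[key] = metadata
    return json.dumps(merged)


if __name__ == "__main__":
    content = b'{"title": "Report"}'
    # Example values only; real metadata would come from the Alfresco object.
    props = {"cm:creator": "admin", "cm:modified": "2015-08-27T19:47:08Z"}
    print(merge_metadata(content, props))
```

A flat merge (doc.update(metadata)) would also work if the property names are
guaranteed not to collide with the JSON's own fields.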

I didn't see anything covering these in the docs, but I may have missed it.
