manifoldcf-dev mailing list archives

From Cédric Ulmer <>
Subject Architecture options for truncating large documents
Date Fri, 14 Oct 2016 15:02:35 GMT
Hi all,


We are currently looking into the possibility of truncating large documents
before indexing them, at the MCF level. We face an architecture dilemma
here, and we would welcome the wisdom of the community:


* What we want to achieve: whenever a document is too large, instead of
dropping it completely, we want to be able to index its metadata.


* How we can achieve that:

Option 1: We create a transformation connector that empties the stream and
keeps only the metadata. Pros: we don't modify the MCF code. Cons: any time
we install MCF somewhere, we need to reconfigure the transformation
connector manually, as there is no way to automatically import
transformation connector configurations.


Option 2: We modify the standard behavior of the original repository
connector (say, the file connector). Instead of offering the option to drop
a document if it is larger than size X, we change it so that it offers to
drop the content but keep the metadata when the document is larger than
size X. Pros: it lives in the MCF code once and for all, so it is available
whenever we install MCF somewhere new. Cons: it may not be in line with the
spirit of transformation connectors, and it has to be done for every
original connector that we are using.
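Whichever option is chosen, the core check is the same. Below is a minimal, self-contained Java sketch of that check, assuming a simplified document model: the `Doc` class, the `OversizePolicy` enum, the `applySizeLimit` method and the `content_truncated` metadata flag are all hypothetical stand-ins, not ManifoldCF's actual connector API. In Option 1 this logic would sit in a transformation stage; in Option 2 it would replace the existing "drop if larger than X" test inside each repository connector.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class MetadataOnlySketch {

    /** Hypothetical stand-in for a crawled document (NOT the real MCF document class). */
    static final class Doc {
        final String uri;
        final Map<String, String> metadata;
        final InputStream content;
        final long contentLength;

        Doc(String uri, Map<String, String> metadata, InputStream content, long contentLength) {
            this.uri = uri;
            this.metadata = Collections.unmodifiableMap(new LinkedHashMap<>(metadata));
            this.content = content;
            this.contentLength = contentLength;
        }
    }

    /** Hypothetical setting: what to do with documents above the size limit. */
    enum OversizePolicy { DROP_DOCUMENT, KEEP_METADATA_ONLY }

    /** Hypothetical metadata flag so the index can tell that the body was removed. */
    static final String TRUNCATED_FLAG = "content_truncated";

    /** Returns the document to index, or an empty Optional meaning "do not index". */
    static Optional<Doc> applySizeLimit(Doc doc, long maxBytes, OversizePolicy policy) {
        if (doc.contentLength <= maxBytes) {
            return Optional.of(doc); // small enough: pass through unchanged
        }
        if (policy == OversizePolicy.DROP_DOCUMENT) {
            return Optional.empty(); // current behavior: skip the whole document
        }
        // Keep every metadata field, flag the truncation, and send an empty body.
        Map<String, String> meta = new LinkedHashMap<>(doc.metadata);
        meta.put(TRUNCATED_FLAG, "true");
        return Optional.of(new Doc(doc.uri, meta, new ByteArrayInputStream(new byte[0]), 0L));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Quarterly report");
        byte[] body = new byte[5_000];
        Doc big = new Doc("file:///share/report.pdf", meta, new ByteArrayInputStream(body), body.length);

        // prints "false": the oversize document is dropped entirely
        System.out.println(applySizeLimit(big, 1_000, OversizePolicy.DROP_DOCUMENT).isPresent());

        // prints "0 {title=Quarterly report, content_truncated=true}"
        Doc kept = applySizeLimit(big, 1_000, OversizePolicy.KEEP_METADATA_ONLY).get();
        System.out.println(kept.contentLength + " " + kept.metadata);
    }
}
```

Returning an empty `Optional` rather than `null` keeps the "drop" case explicit for callers, and the flag field lets search-side code distinguish "empty document" from "content deliberately removed".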


Can you share your thoughts on that?







France Labs – The Search experts

Winner of the EY Internal Search challenge at
<> Viva Technologies 2016

