manifoldcf-dev mailing list archives

From Olivier Tavard <>
Subject MCF transformation connector contribution
Date Thu, 15 Mar 2018 10:35:16 GMT
Hello MCF community,

I developed a transformation connector based on Jsoup. The goal of this code is simply to
choose an encompassing tag in an HTML document for text extraction. Inside this tag, the
connector lets you remove the subparts that you do not want: for example, all the tags of
declared types, or tags with specific attribute names.
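As a rough illustration of the idea only (this is not the connector's code, which is based on Jsoup in Java), here is a minimal sketch in Python using just the standard library's html.parser. The names ScopedTextExtractor, extract_text, keep_id, drop_tags and drop_classes are made up for this example. It assumes the encompassing tag is selected by its id attribute and that the markup has balanced tags; a real DOM parser such as Jsoup copes with malformed HTML, which this sketch does not.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ScopedTextExtractor(HTMLParser):
    """Collect text inside one enclosing element, skipping excluded subtrees.

    Hypothetical helper for illustration; assumes balanced tags.
    """

    def __init__(self, keep_id, drop_tags=(), drop_classes=()):
        super().__init__(convert_charrefs=True)
        self.keep_id = keep_id
        self.drop_tags = set(drop_tags)
        self.drop_classes = set(drop_classes)
        self.keep_depth = 0   # > 0 while inside the enclosing element
        self.drop_depth = 0   # > 0 while inside an excluded subtree
        self.chunks = []

    def _excluded(self, tag, attrs):
        # Excluded if the tag type is listed or any of its classes is listed.
        classes = (dict(attrs).get("class") or "").split()
        return tag in self.drop_tags or bool(self.drop_classes.intersection(classes))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.keep_depth == 0:
            # Still outside: only the element with the requested id opens the scope.
            if dict(attrs).get("id") == self.keep_id:
                self.keep_depth = 1
            return
        self.keep_depth += 1
        if self.drop_depth or self._excluded(tag, attrs):
            self.drop_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.keep_depth == 0:
            return
        self.keep_depth -= 1
        if self.drop_depth:
            self.drop_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside the scope and outside any excluded subtree.
        if self.keep_depth and not self.drop_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, keep_id, drop_tags=(), drop_classes=()):
    parser = ScopedTextExtractor(keep_id, drop_tags, drop_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """
<html><body>
  <div id="header">Site menu</div>
  <div id="content">
    <h1>Title</h1>
    <p>Body text.<br>More text.</p>
    <script>var x = 1;</script>
    <div class="ad banner">Buy now <span>today</span></div>
    <p>Closing paragraph.</p>
  </div>
  <div id="footer">Copyright</div>
</body></html>
"""

result = extract_text(page, "content", drop_tags={"script"}, drop_classes={"ad"})
print(result)  # prints: Title Body text. More text. Closing paragraph.
```

Only the text inside the element with id "content" is kept, and the script element and the block carrying the "ad" class are dropped together with everything nested inside them.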
I would like to know whether this could be of interest to you. The code is under the Apache License 2.0, and I have integrated
it into our enterprise search solution (Datafari). This morning I added the code to a fork of the
MCF project on GitHub. It obviously still needs some work, including code refactoring, class renaming, and
unit tests, which I will be able to do if you are interested in the code.
The code is here: <>
And the documentation is here:

Best regards,

Olivier TAVARD
