manifoldcf-dev mailing list archives

From Karl Wright
Subject Re: MCF transformation connector contribution
Date Thu, 15 Mar 2018 10:51:29 GMT
Hi Olivier,

Thank you very much for your contribution!

To have a legal trail, I usually prefer the following approach --

(1) Create a ticket
(2) Attach a diff to the ticket

We'll then integrate the diff into a branch, and then finally into trunk.

Can you also let us know what kinds of dependent jars the contribution
has?  We'd need to know about not only direct dependencies, but also any
downstream dependencies that may be incompatible with the Apache License.
Usually we can figure this out, but it saves time to know in advance if
there are LGPL dependencies (for instance).


On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard wrote:

> Hello MCF community,
> I developed a transformation connector based on Jsoup. The goal of this
> code is simply to choose an encompassing tag in an HTML document for text
> extraction. Inside this tag, the connector lets you remove the subparts
> that you do not want: all the tags of declared types, or tags matching
> specific attribute names, for example.
> I would like to know whether it could interest you. The code is under the
> Apache License, Version 2.0, and I integrated it in our enterprise search
> solution (Datafari).
> This morning I integrated the code in a fork of the MCF project on GitHub.
> Obviously it needs some work, including code refactoring, renaming classes,
> and unit tests, which I will be able to do if you are interested in the code.
> The code is here :
> htmlextractorconnector <
> htmlextractorconnector>
> And the documentation here:
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> Best regards,
> Olivier TAVARD
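
For readers of the archive, the extraction idea described in the quoted
message (pick one encompassing tag, then drop unwanted subparts inside it)
can be sketched roughly as below. This is not the contributed connector and
does not use Jsoup: it is a minimal illustration built on Python's standard
html.parser so that it runs without extra dependencies. The names
EnclosedTextExtractor and extract_text, the choice of the class attribute
for exclusion, and the assumption of well-formed HTML (every non-void tag is
closed) are all hypothetical choices made for this sketch.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EnclosedTextExtractor(HTMLParser):
    """Collect text inside the first `root_tag` element, skipping excluded subtrees.

    Hypothetical helper for illustration; assumes well-formed HTML.
    """

    def __init__(self, root_tag, excluded_tags=(), excluded_classes=()):
        super().__init__(convert_charrefs=True)
        self.root_tag = root_tag
        self.excluded_tags = set(excluded_tags)
        self.excluded_classes = set(excluded_classes)
        self.depth = 0        # nesting depth inside the root element (0 = outside)
        self.skip_depth = 0   # nesting depth inside an excluded subtree (0 = keep text)
        self.done = False     # True once the first root element has been closed
        self.chunks = []

    def _is_excluded(self, tag, attrs):
        # Exclusion by tag name, or by any matching value of the class attribute
        # (the class-based rule is an assumption of this sketch).
        classes = set((dict(attrs).get("class") or "").split())
        return tag in self.excluded_tags or bool(classes & self.excluded_classes)

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth == 0:
            if tag == self.root_tag:
                self.depth = 1
            return
        self.depth += 1
        if self.skip_depth or self._is_excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.done or self.depth == 0 or tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.skip_depth and not self.done:
            self.chunks.append(data)


def extract_text(html, root_tag, excluded_tags=(), excluded_classes=()):
    """Return whitespace-normalized text of the first `root_tag` element."""
    parser = EnclosedTextExtractor(root_tag, excluded_tags, excluded_classes)
    parser.feed(html)
    parser.close()
    # Joining chunks with a space keeps block elements apart, at the cost of
    # also inserting a space at inline tag boundaries.
    return " ".join(" ".join(parser.chunks).split())
```

Calling extract_text(page, "div", excluded_tags=["script"],
excluded_classes=["ad"]) would keep only the text of the first div while
dropping script elements and anything carrying the class "ad". A Jsoup
version would more likely select the element and remove the unwanted nodes
before reading its text, but that is not shown in the thread.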
