manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: MCF transformation connector contribution
Date Thu, 15 Mar 2018 14:58:33 GMT
Excellent!!

Thank you again.  I'll try to set up the branch this weekend.

Karl


On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <olivier.tavard@francelabs.com> wrote:

> Hi Karl,
>
> Sure thing. I created a ticket,
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500,
> with the code attached.
> No specific libraries are used, just the Jsoup library, which is already
> in the MCF core project.
>
> Best regards,
>
> Olivier
>
>
> > On 15 March 2018 at 11:51, Karl Wright <daddywri@gmail.com> wrote:
> >
> > Hi Olivier,
> >
> > Thank you very much for your contribution!
> >
> > To keep a legal trail, I usually prefer the following approach:
> >
> > (1) Create a ticket
> > (2) Attach a diff to the ticket
> >
> > We'll then integrate the diff into a branch, and then finally into trunk.
> >
> > Can you also let us know what kinds of dependent jars the contribution
> > has?  We'd need to know about not only direct dependencies, but also any
> > downstream dependencies that may be incompatible with the Apache License.
> > Usually we can figure this out but it saves time to know in advance if
> > there are LGPL dependencies (for instance).
> >
> > Karl
> >
> >
> > On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <olivier.tavard@francelabs.com> wrote:
> >
> >> Hello MCF community,
> >>
> >> I developed a transformation connector based on Jsoup. The goal of this
> >> code is simply to choose an enclosing tag in an HTML document for text
> >> extraction. Inside this tag, the connector then lets you remove the
> >> subparts that you do not want: for example, all tags of declared types,
> >> or tags with specific attribute names.
> >> I would like to know whether it could interest you. The code is under the
> >> Apache License v2, and I have integrated it into our enterprise search
> >> solution (Datafari).
> >> This morning I integrated the code into a fork of the MCF project on
> >> GitHub. Obviously it still needs some work, including code refactoring,
> >> class renaming and unit tests, which I can do if you are interested in
> >> the code.
> >> The code is here:
> >> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >> And the documentation here:
> >> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>
> >> Best regards,
> >>
> >> Olivier TAVARD
> >>
> >>
> >>
>
>
