manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: MCF transformation connector contribution
Date Fri, 04 May 2018 20:28:06 GMT
Yes, please do update the patch.  I'm sorry I did not get to this; many
other things intruded.  I created the branch but did not apply the original
patch onto it, so please supply a whole new patch.

Karl


On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <
olivier.tavard@francelabs.com> wrote:

> Hi,
>
> I wanted to know whether the code is still of interest to the MCF community.
> I have updated it since the initial release, so please tell me whether I
> should submit a new patch to the existing issue:
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>
> Thanks,
> Best regards,
>
> Olivier TAVARD
>
>
> > On 15 March 2018 at 15:58, Karl Wright <daddywri@gmail.com> wrote:
> >
> > Excellent!!
> >
> > Thank you again.  I'll try to set up the branch this weekend.
> >
> > Karl
> >
> >
> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <
> > olivier.tavard@francelabs.com> wrote:
> >
> >> Hi Karl,
> >>
> >> Sure thing, I created a ticket:
> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >> with the code attached.
> >> No special libraries are used, only the JSOUP library, which is already in
> >> the MCF core project.
> >>
> >> Best regards,
> >>
> >> Olivier
> >>
> >>
> >>> On 15 March 2018 at 11:51, Karl Wright <daddywri@gmail.com> wrote:
> >>>
> >>> Hi Olivier,
> >>>
> >>> Thank you very much for your contribution!
> >>>
> >>> To have a legal trail, I usually prefer the following approach --
> >>>
> >>> (1) Create a ticket
> >>> (2) Attach a diff to the ticket
> >>>
> >>> We'll then integrate the diff into a branch, and then finally into trunk.
> >>>
> >>> Can you also let us know what kinds of dependent jars the contribution
> >>> has?  We'd need to know about not only direct dependencies, but also any
> >>> downstream dependencies that may be incompatible with the Apache License.
> >>> Usually we can figure this out but it saves time to know in advance if
> >>> there are LGPL dependencies (for instance).
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
> >>> olivier.tavard@francelabs.com> wrote:
> >>>
> >>>> Hello MCF community,
> >>>>
> >>>> I developed a transformation connector based on Jsoup. The goal of this
> >>>> code is simply to choose an encompassing tag in an HTML document for text
> >>>> extraction. Inside this tag, the connector lets you remove subparts that
> >>>> you do not want: for example, all tags of the declared types, or tags
> >>>> matching specific attribute names.
> >>>> I would like to know whether it would interest you. The code is under the
> >>>> Apache License V2, and I have integrated it into our enterprise search
> >>>> solution (Datafari).
> >>>> This morning I integrated the code into a fork of the MCF project on GitHub.
> >>>> Obviously it needs some work, including code refactoring, renaming classes,
> >>>> and unit tests, which I will be able to do if you are interested in the code.
> >>>> The code is here:
> >>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >>>> And the documentation is here:
> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Olivier TAVARD
> >>>>
> >>>>
> >>>>
> >>
> >>
>
>
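
The extraction idea described in the original message (pick one enclosing tag, then drop unwanted sub-elements by tag name or attribute) can be sketched roughly as below. This is not the connector's code: the contribution is written in Java on top of Jsoup, whereas this sketch swaps in Python's standard-library html.parser so that it runs without dependencies. The names EnclosedTextExtractor, extract_text, excluded_tags and excluded_classes are hypothetical, and the sketch assumes well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth counting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EnclosedTextExtractor(HTMLParser):
    """Collect text inside one enclosing tag, skipping excluded sub-elements.

    Hypothetical illustration only; assumes every non-void tag is closed.
    """

    def __init__(self, enclosing_tag, excluded_tags=(), excluded_classes=()):
        super().__init__(convert_charrefs=True)
        self.enclosing_tag = enclosing_tag
        self.excluded_tags = set(excluded_tags)
        self.excluded_classes = set(excluded_classes)
        self.depth = 0       # nesting depth inside the enclosing tag (0 = outside)
        self.skip_depth = 0  # nesting depth inside an excluded element (0 = keep text)
        self.chunks = []

    def _is_excluded(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        return tag in self.excluded_tags or bool(classes & self.excluded_classes)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth == 0:
            # Outside the region of interest: only the enclosing tag matters.
            if tag == self.enclosing_tag:
                self.depth = 1
            return
        self.depth += 1
        # Once inside an excluded element, every descendant is skipped too.
        if self.skip_depth or self._is_excluded(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, enclosing_tag, excluded_tags=(), excluded_classes=()):
    """Return the kept text fragments joined by single spaces."""
    parser = EnclosedTextExtractor(enclosing_tag, excluded_tags, excluded_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = """
<html><body>
  <header>Site banner</header>
  <div>
    <h1>Title</h1>
    <nav>Menu</nav>
    <p>Body text.<br>More text.</p>
    <p class="ad promo">Buy now</p>
  </div>
  <footer>Copyright</footer>
</body></html>
"""
# prints: Title Body text. More text.
print(extract_text(html, "div", excluded_tags=["nav"], excluded_classes=["ad"]))
```

Counting nesting depth instead of matching tag names on close keeps the sketch short, at the cost of misbehaving on HTML with omitted end tags; a tolerant DOM parser such as Jsoup avoids that limitation.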
