manifoldcf-dev mailing list archives

From Piergiorgio Lucidi <piergior...@apache.org>
Subject Re: MCF transformation connector contribution
Date Sat, 05 May 2018 12:02:49 GMT
Hi,

I have just updated CHANGES.txt to add CONNECTORS-1500, which is included in
the 2.10 release, with a mention of Olivier.

Olivier, thank you so much for your contribution.

We should find a good way to also create a test suite for this new
connector.

Cheers,
PJ

2018-05-05 11:57 GMT+02:00 Karl Wright <daddywri@gmail.com>:

> Hi Olivier,
>
> This was actually already committed.  But it was renamed as the
> html-extractor connector, not "datafari", which didn't mean anything to me.
>
> Any changes you want to make should therefore be supplied as a diff against
> the html-extractor connector.
>
> Sorry for the confusion!!
>
> Karl
>
>
> On Fri, May 4, 2018 at 4:28 PM Karl Wright <daddywri@gmail.com> wrote:
>
> > Yes, please do update the patch.  I'm sorry I did not get to this; many
> > other things intruded.  I created the branch but did not apply the
> > original patch onto it, so please supply a whole new patch.
> >
> > Karl
> >
> >
> > On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <
> > olivier.tavard@francelabs.com> wrote:
> >
> >> Hi,
> >>
> >> I wanted to know if the code is still of interest to the MCF community.
> >> I have updated it since the initial release, so please tell me if I need
> >> to submit a new patch to the issue already created:
> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >>
> >> Thanks,
> >> Best regards,
> >>
> >> Olivier TAVARD
> >>
> >>
> >> > On 15 March 2018 at 15:58, Karl Wright <daddywri@gmail.com> wrote:
> >> >
> >> > Excellent!!
> >> >
> >> > Thank you again.  I'll try to set up the branch this weekend.
> >> >
> >> > Karl
> >> >
> >> >
> >> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <
> >> > olivier.tavard@francelabs.com> wrote:
> >> >
> >> >> Hi Karl,
> >> >>
> >> >> Sure thing, I created a ticket:
> >> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
> >> >> with the code attached.
> >> >> No specific libraries are used, just the Jsoup library that is already
> >> >> in the MCF core project.
> >> >>
> >> >> Best regards,
> >> >>
> >> >> Olivier
> >> >>
> >> >>
> >> >>> On 15 March 2018 at 11:51, Karl Wright <daddywri@gmail.com> wrote:
> >> >>>
> >> >>> Hi Olivier,
> >> >>>
> >> >>> Thank you very much for your contribution!
> >> >>>
> >> >>> To have a legal trail, I usually prefer the following approach --
> >> >>>
> >> >>> (1) Create a ticket
> >> >>> (2) Attach a diff to the ticket
> >> >>>
> >> >>> We'll then integrate the diff into a branch, and then finally into
> >> >>> trunk.
> >> >>>
> >> >>> Can you also let us know what kinds of dependent jars the contribution
> >> >>> has?  We'd need to know about not only direct dependencies, but also
> >> >>> any downstream dependencies that may be incompatible with the Apache
> >> >>> License.  Usually we can figure this out but it saves time to know in
> >> >>> advance if there are LGPL dependencies (for instance).
> >> >>>
> >> >>> Karl
> >> >>>
> >> >>>
> >> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
> >> >>> olivier.tavard@francelabs.com> wrote:
> >> >>>
> >> >>>> Hello MCF community,
> >> >>>>
> >> >>>> I developed a transformation connector based on Jsoup. The goal of
> >> >>>> this code is simply to choose an encompassing tag in an HTML document
> >> >>>> for text extraction. Inside this tag, the connector allows you to
> >> >>>> remove subparts that you do not want: for example, all the tags
> >> >>>> corresponding to declared types or specific attribute tag names.
> >> >>>> I would like to know if it could interest you. The code is under the
> >> >>>> Apache License 2.0 and I integrated it in our enterprise search
> >> >>>> solution (Datafari).
> >> >>>> This morning I integrated the code in a fork of the MCF project on
> >> >>>> GitHub. Obviously it needs some work, including code refactoring,
> >> >>>> renaming classes, and unit tests, which I will be able to do if you
> >> >>>> are interested in the code.
> >> >>>> The code is here:
> >> >>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
> >> >>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
> >> >>>> And the documentation is here:
> >> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
> >> >>>>
> >> >>>> Best regards,
> >> >>>>
> >> >>>> Olivier TAVARD
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>
> >> >>
> >>
> >>
>



-- 
Piergiorgio Lucidi
https://www.open4dev.com
