manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: MCF transformation connector contribution
Date Sat, 05 May 2018 09:57:44 GMT
Hi Olivier,

This was actually already committed.  But it was renamed as the
html-extractor connector, not "datafari", which didn't mean anything to me.

Any changes you want to make should therefore be supplied as a diff against
the html-extractor connector.

Sorry for the confusion!!

Karl


On Fri, May 4, 2018 at 4:28 PM Karl Wright <daddywri@gmail.com> wrote:

> Yes, please do update the patch.  I'm sorry I did not get to this; many
> other things intruded.  I created the branch but did not apply the original
> patch onto it, so please supply a whole new patch.
>
> Karl
>
>
> On Fri, May 4, 2018 at 11:28 AM Olivier Tavard <
> olivier.tavard@francelabs.com> wrote:
>
>> Hi,
>>
>> I wanted to know whether the code is still of interest to the MCF community.
>> I have updated it since the initial release, so please tell me if I should
>> submit a new patch to the existing issue:
>> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>>
>> Thanks,
>> Best regards,
>>
>> Olivier TAVARD
>>
>>
>> > On 15 March 2018 at 15:58, Karl Wright <daddywri@gmail.com> wrote:
>> >
>> > Excellent!!
>> >
>> > Thank you again.  I'll try to set up the branch this weekend.
>> >
>> > Karl
>> >
>> >
>> > On Thu, Mar 15, 2018 at 10:52 AM, Olivier Tavard <
>> > olivier.tavard@francelabs.com> wrote:
>> >
>> >> Hi Karl,
>> >>
>> >> Sure thing, I created a ticket:
>> >> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
>> >> with the code attached.
>> >> No special libraries are used, just the Jsoup library, which is already
>> >> in the MCF core project.
>> >>
>> >> Best regards,
>> >>
>> >> Olivier
>> >>
>> >>
>> >>> On 15 March 2018 at 11:51, Karl Wright <daddywri@gmail.com> wrote:
>> >>>
>> >>> Hi Olivier,
>> >>>
>> >>> Thank you very much for your contribution!
>> >>>
>> >>> To have a legal trail, I usually prefer the following approach --
>> >>>
>> >>> (1) Create a ticket
>> >>> (2) Attach a diff to the ticket
>> >>>
>> >>> We'll then integrate the diff into a branch, and then finally into trunk.
>> >>>
>> >>> Can you also let us know what kinds of dependent jars the contribution
>> >>> has?  We'd need to know about not only direct dependencies, but also any
>> >>> downstream dependencies that may be incompatible with the Apache License.
>> >>> Usually we can figure this out but it saves time to know in advance if
>> >>> there are LGPL dependencies (for instance).
>> >>>
>> >>> Karl
>> >>>
>> >>>
>> >>> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
>> >>> olivier.tavard@francelabs.com> wrote:
>> >>>
>> >>>> Hello MCF community,
>> >>>>
>> >>>> I developed a transformation connector based on Jsoup. The goal of this
>> >>>> code is simply to choose an encompassing tag in an HTML document for
>> >>>> text extraction. Inside this tag, the connector lets you remove
>> >>>> subparts that you do not want: all the tags matching declared types or
>> >>>> specific attribute names, for example.
>> >>>> I would like to know if this could interest you. The code is under the
>> >>>> Apache License 2.0, and I integrated it into our enterprise search
>> >>>> solution (Datafari).
>> >>>> This morning I integrated the code into a fork of the MCF project on
>> >>>> GitHub. Obviously it needs some work, including code refactoring,
>> >>>> renaming classes, and unit tests, which I will be able to do if you are
>> >>>> interested in the code.
>> >>>> The code is here:
>> >>>> https://github.com/otavard/manifoldcf/tree/htmlextractorconnector
>> >>>> <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
>> >>>> And the documentation is here:
>> >>>> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> Olivier TAVARD
>> >>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>
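
Olivier's first message describes the connector as choosing one encompassing
tag and then removing unwanted subparts inside it before extracting text. The
sketch below illustrates that idea only; it is not the connector's code. The
connector is written in Java on top of Jsoup, whereas this sketch uses
Python's standard-library html.parser so that it runs without extra
dependencies. Every name here (RegionTextExtractor, extract_text, keep_tag,
drop_tags, drop_classes) is hypothetical, matching on class values is an
assumed reading of "specific attribute names", and the depth counting assumes
well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RegionTextExtractor(HTMLParser):
    """Collect text inside the first <keep_tag> element, skipping excluded subtrees.

    Hypothetical helper; assumes every non-void element is explicitly closed.
    """

    def __init__(self, keep_tag, drop_tags=(), drop_classes=()):
        super().__init__(convert_charrefs=True)
        self.keep_tag = keep_tag
        self.drop_tags = set(drop_tags)
        self.drop_classes = set(drop_classes)
        self.keep_depth = 0   # nesting depth inside the encompassing element
        self.drop_depth = 0   # nesting depth inside an excluded subtree
        self.done = False     # True once the encompassing element has closed
        self.chunks = []

    def _is_excluded(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return tag in self.drop_tags or any(c in self.drop_classes for c in classes)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.keep_depth == 0:
            # Still outside the region: only the encompassing tag matters.
            if tag == self.keep_tag:
                self.keep_depth = 1
            return
        self.keep_depth += 1
        if self.drop_depth or self._is_excluded(tag, attrs):
            self.drop_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.done or self.keep_depth == 0:
            return
        if self.drop_depth:
            self.drop_depth -= 1
        self.keep_depth -= 1
        if self.keep_depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.keep_depth and not self.drop_depth and not self.done:
            self.chunks.append(data)


def extract_text(html, keep_tag, drop_tags=(), drop_classes=()):
    """Return whitespace-normalised text of the kept region."""
    parser = RegionTextExtractor(keep_tag, drop_tags, drop_classes)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


sample = """
<html><body>
  <nav>Home | About</nav>
  <article>
    <h1>Title</h1>
    <p>Body text.</p>
    <aside>Related links</aside>
    <div class="ad banner">Buy now</div>
    <p>More text.<br>Last line.</p>
  </article>
  <footer>Copyright</footer>
</body></html>
"""

result = extract_text(sample, "article", drop_tags=["aside"], drop_classes=["ad"])
print(result)  # Title Body text. More text. Last line.
```

With Jsoup the same effect would more naturally come from selecting the
encompassing element and removing the unwanted children before reading its
text; the html-extractor connector source shows how it is actually done.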
