manifoldcf-dev mailing list archives

From Olivier Tavard <olivier.tav...@francelabs.com>
Subject Re: MCF transformation connector contribution
Date Thu, 15 Mar 2018 14:52:43 GMT
Hi Karl,

Sure thing, I created a ticket: https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1500
with the code attached.
No specific libraries are used, just the Jsoup library, which is already in the MCF core project.

Best regards,

Olivier


> On 15 March 2018 at 11:51, Karl Wright <daddywri@gmail.com> wrote:
> 
> Hi Olivier,
> 
> Thank you very much for your contribution!
> 
> To have a legal trail, I usually prefer the following approach --
> 
> (1) Create a ticket
> (2) Attach a diff to the ticket
> 
> We'll then integrate the diff into a branch, and then finally into trunk.
> 
> Can you also let us know what kinds of dependent jars the contribution
> has?  We'd need to know about not only direct dependencies, but also any
> downstream dependencies that may be incompatible with the Apache License.
> Usually we can figure this out but it saves time to know in advance if
> there are LGPL dependencies (for instance).
> 
> Karl
> 
> 
> On Thu, Mar 15, 2018 at 6:35 AM, Olivier Tavard <
> olivier.tavard@francelabs.com> wrote:
> 
>> Hello MCF community,
>> 
>> I developed a transformation connector based on Jsoup. The goal of this
>> code is simply to choose an encompassing tag in an HTML document for text
>> extraction. Inside this tag, the connector lets you remove the
>> subparts that you do not want: all the tags of declared types
>> or tags with specific attribute names, for example.
>> I would like to know if it could interest you. The code is under the Apache V2
>> license and I integrated it into our enterprise search solution (Datafari).
>> This morning I integrated the code into a fork of the MCF project on GitHub.
>> Obviously it needs some work, including code refactoring, renaming classes, and
>> unit tests, which I will be able to do if you are interested in the code.
>> The code is here: https://github.com/otavard/manifoldcf/tree/htmlextractorconnector <https://github.com/otavard/manifoldcf/commits/htmlextractorconnector>
>> And the documentation is here: https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/237240321/HTML+Extractor+Transformation+connector
>> 
>> Best regards,
>> 
>> Olivier TAVARD
>> 
>> 
>> 
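The behaviour described in the original message (pick an encompassing tag, then drop unwanted subparts inside it before extracting text) can be illustrated with a minimal sketch. This is not the contributed connector, which is written in Java on top of Jsoup; it swaps in Python's standard-library html.parser so that it runs without dependencies. The names EnclosedTextExtractor, extract_text, skip_tags and skip_classes are hypothetical, and the sketch assumes well-formed HTML with lowercase tag names passed in.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class EnclosedTextExtractor(HTMLParser):
    """Hypothetical helper: collects text found inside one encompassing tag,
    ignoring subtrees whose tag name or class is blacklisted."""

    def __init__(self, root_tag, skip_tags=(), skip_classes=()):
        super().__init__(convert_charrefs=True)
        self.root_tag = root_tag
        self.skip_tags = set(skip_tags)
        self.skip_classes = set(skip_classes)
        self.root_depth = 0  # > 0 while inside the encompassing tag
        self.skip_depth = 0  # > 0 while inside an excluded subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.root_depth == 0:
            # Still outside: only the encompassing tag opens the region.
            if tag == self.root_tag:
                self.root_depth = 1
            return
        self.root_depth += 1
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in self.skip_tags or classes & self.skip_classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.root_depth == 0:
            return
        self.root_depth -= 1
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside the region and outside excluded subtrees.
        if self.root_depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, root_tag, skip_tags=(), skip_classes=()):
    parser = EnclosedTextExtractor(root_tag, skip_tags, skip_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


SAMPLE = """<html><body>
<nav>Menu</nav>
<article>
  <h1>Title</h1>
  <script>var x = 1;</script>
  <div class="ads banner">Buy now</div>
  <p>Main <b>text</b> here.<br>More.</p>
</article>
<footer>Footer</footer>
</body></html>"""

print(extract_text(SAMPLE, "article", skip_tags=["script"], skip_classes=["ads"]))
```

In this sample only the text inside the article element is kept, and the script element and the div carrying the ads class are dropped. A Jsoup-based version would more likely select the encompassing element and remove the unwanted children from the parsed tree before reading its text.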

