manifoldcf-dev mailing list archives

From Alessandro Benedetti <abenede...@apache.org>
Subject Re: Contributing OpenNLP connector
Date Wed, 18 Nov 2015 14:20:08 GMT
Hey Chal,
First of all, thank you very much for the contribution!
I have some observations:

*Model Downloading*

Looking at the way you provide the user with the models, I can see
there is a shell script that downloads very specific English models.
It would be great to be able to configure which model to use in
the connector config UI.
In particular I see two possibilities:
1) you provide a select list for each required model, and the connector then
downloads and installs the selected model automatically
2) you let the user upload the model he/she wants to use (more flexible,
but the user will need to obtain the model on his/her own)
In my opinion it is really important to keep the transformation connector
flexible, able to work with different languages and models.
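
To make the idea concrete, here is a minimal sketch of resolving the model
file from two configured values instead of a hard-coded English model. The
class name, the method and the "<language>-ner-<entity>.bin" naming pattern
are assumptions for illustration (the pattern mirrors file names such as
en-ner-person.bin); none of this is taken from the connector code.

```java
import java.io.File;

public class ModelPathResolver {

    /**
     * Builds the expected model file for a language and entity type.
     * Hypothetical helper: the naming pattern is an assumption, not
     * something the connector currently does.
     */
    public static File resolve(File modelDir, String language, String entityType) {
        // e.g. language "es", entityType "person" -> es-ner-person.bin
        return new File(modelDir, language + "-ner-" + entityType + ".bin");
    }

    public static void main(String[] args) {
        File model = resolve(new File("models"), "es", "person");
        System.out.println(model.getName());
    }
}
```

With something like this, the values chosen in the config UI (option 1) or
the name of an uploaded file (option 2) decide which model gets loaded.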

*Text enrichment*
Taking a look at the code, I see a really strong assumption here:

String textContent = new String(bytes);

This means you assume the only possible input is plain text (and, since no
charset is given, text encoded in the platform default charset).
Actually, as we know, what we have there is the binary, not necessarily a
plain string.
I think we need to make the Tika transformer a requirement for
this connector.
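
As an illustration of the problem, here is a minimal sketch that decodes the
bytes strictly as UTF-8 and reports failure instead of silently producing
garbage. The class and method names are hypothetical, and choosing UTF-8 is
an assumption; the real fix is still to run Tika first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictTextDecoder {

    /**
     * Returns the decoded text, or null when the bytes are not valid UTF-8
     * (for example a binary document that has not been through Tika).
     */
    public static String decodeUtf8OrNull(byte[] bytes) {
        // REPORT makes the decoder throw instead of inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

By contrast, new String(bytes) never fails: it replaces undecodable input
with a replacement character, so binary content would reach the name finder
as noise.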
Furthermore, I would suggest letting the user select the list of input
fields to be used as the source of the extraction.

e.g.
I can configure my extraction to happen from title,text and description.

Of course, a transformation connector has to run before the
OpenNLP one to provide those fields.
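
A minimal sketch of that field selection, assuming the metadata is available
as a plain map from field name to values. The class, the method and the map
shape are hypothetical and do not use the ManifoldCF API.

```java
import java.util.List;
import java.util.Map;

public class FieldTextSelector {

    /**
     * Joins the values of the configured fields, in configuration order,
     * into one text to run the extraction on. Fields that are missing from
     * the document are skipped.
     */
    public static String collectText(Map<String, List<String>> fields,
                                     List<String> selected) {
        StringBuilder sb = new StringBuilder();
        for (String name : selected) {
            List<String> values = fields.get(name);
            if (values == null) {
                continue; // field not provided by the upstream connector
            }
            for (String value : values) {
                if (value == null || value.isEmpty()) {
                    continue;
                }
                if (sb.length() > 0) {
                    sb.append('\n'); // keep field values on separate lines
                }
                sb.append(value);
            }
        }
        return sb.toString();
    }
}
```

Configured with title, text and description, a document that only carries
title and text would simply be processed on those two fields.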
These are quick considerations after a first look at the code; happy to
discuss and help further :)

Cheers




On 18 November 2015 at 13:47, Karl Wright <daddywri@gmail.com> wrote:

> Thanks, Chalitha, for contributing this!
>
> I hope to have a look at the code also, but it won't happen until next week
> I'm afraid.
>
> Karl
>
>
> On Wed, Nov 18, 2015 at 7:44 AM, Rafa Haro <rharoapache@gmail.com> wrote:
>
> > Hi Chalitha!
> >
> >
> >
> >
> > Awesome! I will take a look at this as soon as possible.
> >
> >
> >
> >
> > Cheers,
> >
> > Rafa
> >
> > On Wed, Nov 18, 2015 at 1:22 PM, chalitha udara Perera
> > <chalithaudara@gmail.com> wrote:
> >
> > > Hi All,
> > > I have worked on an OpenNLP-based transformation connector for some
> > > requirement. Given a document, it extracts named entities such as
> > > people, locations and organisations and adds those as metadata to the
> > > repository document.
> > > If you think this will be useful for the community, I would like to
> > > contribute it to manifoldcf.
> > > Connector code is available here [1].
> > > [1] https://github.com/ChalithaUdara/OpenNLP-Manifold-Connector
> > > Thanks,
> > > Chalitha
> > > --
> > > J.M Chalitha Udara Perera
> > > *Department of Computer Science and Engineering,*
> > > *University of Moratuwa,*
> > > *Sri Lanka*
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
