opennlp-dev mailing list archives

From Damiano Porta <damianopo...@gmail.com>
Subject Re: Document categorization
Date Sat, 24 Sep 2016 14:05:05 GMT
Hi Cohan,
You are right! My apologies. I have just sent a new email to
users@opennlp.apache.org

Damiano

2016-09-24 16:00 GMT+02:00 Cohan Sujay Carlos <cohan@aiaioo.com>:

> Damiano,
>
> I just wanted to point out that perhaps you didn't realize that you're
> posting these questions on the developers' mailing list which is "for
> development discussions, patch suggestions, and current issues posted to
> the issue tracker for the project."
>
> There's an OpenNLP users mailing list where you might get better answers
> from a larger community of practitioners (the first mailing list in
> https://opennlp.apache.org/mail-lists.html).
>
> Cohan
>
>
> On Sat, Sep 24, 2016 at 7:12 PM, Damiano Porta <damianoporta@gmail.com> wrote:
>
> > Hello,
> > we need to categorize our documents into 80 sectors. These documents are
> > resumes/CVs.
> >
> > We have many documents (more than 30k) but there is a problem.
> > Should we try to extract the job positions from each resume and
> > categorize those, or can we just use the entire document and assign it
> > to one or more categories (at most 3)?
> >
> > I think there is a lot of noisy data that could give us many false
> > positives if we use the entire document. For example: personal data,
> > hobbies, etc.
> >
> > BUT
> >
> > I also know that extracting every job position from all the documents
> > would take years!
> >
> > Can anyone suggest a workaround?
> >
> > Thank you so much!
> > Damiano
> >
>
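Whichever input is fed to the classifier (the whole resume or only the extracted job positions), the "one or more categories (max 3)" requirement can be made concrete as a selection rule over per-sector scores. This is a minimal sketch that assumes the categorizer returns one score per sector, for example the double[] of outcome probabilities returned by OpenNLP's DocumentCategorizerME.categorize. It keeps the best sector unconditionally and adds up to two more only if their score clears a threshold. The class TopCategories, the method select, the threshold value and the example scores are all hypothetical and are not part of the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not an OpenNLP class: picks at most maxLabels sectors
// from a score array aligned with the sector names.
public class TopCategories {

    /**
     * Returns up to maxLabels category names, highest score first.
     * The best category is always kept; further ones must reach minScore.
     */
    public static List<String> select(double[] scores, String[] names,
                                      int maxLabels, double minScore) {
        if (scores.length == 0 || scores.length != names.length) {
            throw new IllegalArgumentException("scores and names must be non-empty and aligned");
        }
        // Sort category indices by descending score.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        List<String> picked = new ArrayList<>();
        for (int rank = 0; rank < order.length && picked.size() < maxLabels; rank++) {
            int idx = order[rank];
            if (rank == 0 || scores[idx] >= minScore) {
                picked.add(names[idx]);
            } else {
                // Scores are sorted, so no later category can reach the threshold.
                break;
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Hypothetical scores for four of the 80 sectors.
        String[] sectors = {"IT", "Finance", "Healthcare", "Retail"};
        double[] scores = {0.55, 0.25, 0.12, 0.08};
        System.out.println(select(scores, sectors, 3, 0.15)); // prints [IT, Finance]
    }
}
```

The threshold trades recall for precision: a higher minScore attaches fewer secondary sectors, which may help limit the false positives that noisy sections such as personal data or hobbies can cause.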
