opennlp-dev mailing list archives

From Anastasija Mensikova <mensikova.anastas...@gmail.com>
Subject Re: GSoC 2016: OpenNLP Sentiment Analysis
Date Wed, 25 May 2016 04:30:32 GMT
Hi everyone,

Hope you are having a great week! Here are some updates on what we have
been working on in Sentiment Analysis.

As you might know, we have started developing a model, and during the
past week or so I've been developing a simple training mechanism for it,
which you can now see on GitHub. Next, I need to add Apache License v2
headers and update the README.md file to describe how to build the code
and train the model.

I will also start looking at GeoTopicParser more closely to see how a
SentimentParser could be built for Tika; this means looking at its
configuration properties and options and thinking about the design.

After we are done with some of the documentation and the framework, we are
planning to start working on more advanced Sentiment Analysis.

Thank you,
Anastasija

On 23 May 2016 at 21:04, Madhawa Kasun Gunasekara <madhawa30@gmail.com>
wrote:

> Hi Anastasija,
>
> I hope these datasets will be useful for your project [1]. The page also
> lists datasets for multilingual sentiment analysis [2].
>
> [1] https://www.w3.org/community/sentiment/wiki/Datasets
> [2] https://www.w3.org/community/sentiment/wiki/Datasets#Multilingual
>
> Thanks,
> Madhawa
>
> On Wed, May 18, 2016 at 9:48 PM, Mattmann, Chris A (3980) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hi Team,
>>
>> I’ve created a GitHub repo in my USCDataScience organization.
>> We’re going to put our model training and other code there,
>> before packaging it for Apache OpenNLP and/or for Apache Tika.
>>
>> You should all have received a GitHub invite; if not, let me know.
>>
>> Next step is for me to provide access to FisherCallHome, but
>> we’re starting out with the easier Movie dataset for now.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Director, Information Retrieval and Data Science Group (IRDS)
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> WWW: http://irds.usc.edu/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> On 5/17/16, 8:37 AM, "Anastasija Mensikova" <
>> mensikova.anastasija@gmail.com> wrote:
>>
>> >Hi Chris,
>> >
>> >
>> >I just sent you a Hangout invitation. I can definitely talk tomorrow,
>> >and would like to. I'm back home (in Latvia) now, so I'm free at any
>> >time of day here; with the time difference, that is from around 7am ET
>> >until 3pm or 4pm ET at the latest.
>> >
>> >
>> >Let me know!
>> >
>> >
>> >Thank you,
>> >Anastasija
>> >
>> >
>> >On 17 May 2016 at 07:41, Mattmann, Chris A (3980)
>> ><chris.a.mattmann@jpl.nasa.gov> wrote:
>> >
>> >Dear Anastasija,
>> >
>> >I’m reconnecting here since it’s been a bit. Do you have time for
>> >a Google Hangout tomorrow? Would you like to discuss your progress
>> >to date on the project?
>> >
>> >Thanks and please ping me on Google Hangout so we can chat.
>> >
>> >Cheers,
>> >Chris
>> >
>> >On 5/3/16, 7:21 AM, "Anastasija Mensikova" <
>> mensikova.anastasija@gmail.com> wrote:
>> >
>> >>Hi everyone,
>> >>
>> >>
>> >>I joined the hangout at 9:40am ET just like last time, but nobody was
>> >>there (my fault; I should have checked beforehand that it was still
>> >>happening). I waited for about 25 minutes and then left because I had
>> >>to run to class.
>> >>
>> >>
>> >>So, reporting on what I have done this week.
>> >>
>> >>
>> >>I'm in the middle of final exams and projects right now, and have been
>> >>pulling all-nighters to keep up with schoolwork, so I couldn't do as
>> >>much on this project as I wanted to. School is over in 2 weeks, and I
>> >>promise I will devote all of my time to this project right after.
>> >>
>> >>I was trying to download GeoTopicParser, but for that I had to download
>> >>and install Maven to be able to use the mvn command. Even though it's a
>> >>simple task, my computer just wouldn't let me use it: it throws an
>> >>exception. I spent three hours trying to figure out why, made sure my
>> >>Java version matched, and even had a professional look at it, but still
>> >>couldn't fix it. I will do that as soon as school is over. Nevertheless,
>> >>I went through the Gazetteer code to understand the logic behind it, and
>> >>then looked through OpenNLP, using the lecture notes from the Coursera
>> >>course I told you about as my guide. It makes more sense now how it
>> >>works and how the model is trained.
>> >>
>> >>I just have one quick question. I noticed OpenNLP uses maximum entropy
>> >>(MaxEnt). In our case, are we going to use it as well, or are we going
>> >>to use logistic regression for data classification instead?
>> >>
>> >>
>> >>I also have one small problem: I have a final exam at this time next
>> >>week (for my Theory of Computation class), so I can't do the hangout
>> >>then.
>> >>
>> >>
>> >>Sorry for all the scheduling confusion. I realise how hard it is to
>> >>find the perfect time to talk, given the time differences.
>> >>
>> >>
>> >>Thank you very much,
>> >>Anastasija
>> >>
>> >>
>> >>On 26 April 2016 at 09:56, Mattmann, Chris A (3980)
>> >><chris.a.mattmann@jpl.nasa.gov> wrote:
>> >>
>> >>Hi,
>> >>
>> >>Sure here is the link:
>> >>
>> >>https://hangouts.google.com/call/a2w5cgdtirf6jgfb4ww5l2l64ee
>> >>
>> >>Sorry for the delay.
>> >>
>> >>Cheers,
>> >>Chris
>> >>
>> >>On 4/26/16, 6:48 AM, "Anastasija Mensikova" <
>> mensikova.anastasija@gmail.com> wrote:
>> >>
>> >>>Hi everyone,
>> >>>
>> >>>
>> >>>Is the 9:40 ET hangout still happening? I just have to leave soon to
>> >>>go to class.
>> >>>
>> >>>
>> >>>Thank you,
>> >>>Anastasija
>> >>>
>> >>>
>> >>>On 25 April 2016 at 23:39, Anastasija Mensikova
>> >>><mensikova.anastasija@gmail.com> wrote:
>> >>>
>> >>>Hi Chris,
>> >>>
>> >>>
>> >>>Yes, that's perfect. I'll be ready by 9:40am.
>> >>>
>> >>>
>> >>>Thank you,
>> >>>Anastasija
>> >>>
>> >>>
>> >>>On 25 April 2016 at 23:28, Mattmann, Chris A (3980)
>> >>><chris.a.mattmann@jpl.nasa.gov> wrote:
>> >>>
>> >>>Hey Anastasija,
>> >>>
>> >>>To be honest, 9am EST is a little aggressive; I will likely be able
>> >>>to do 6:40am PT (I am traveling back from DC as I type this), which
>> >>>is 9:40am ET.
>> >>>
>> >>>My GChat handle is chris.mattmann@gmail.com. I will create a hangout
>> >>>and send it to the list; please contact me at 6:40am PT.
>> >>>
>> >>>Cheers,
>> >>>Chris
>> >>>
>> >>>On 4/25/16, 11:07 PM, "Anastasija Mensikova" <
>> mensikova.anastasija@gmail.com> wrote:
>> >>>
>> >>>>Hi everyone,
>> >>>>
>> >>>>
>> >>>>So is the hangout session tomorrow (Tuesday) at 6:30pm IST (9am EST)
>> >>>>confirmed or not?
>> >>>>
>> >>>>
>> >>>>Thank you,
>> >>>>Anastasija
>> >>>>
>> >>>>
>> >>>>On 25 April 2016 at 15:23, Madhawa Kasun Gunasekara
>> >>>><madhawa30@gmail.com> wrote:
>> >>>>
>> >>>>Hi all,
>> >>>>
>> >>>>
>> >>>>Shall we have the hangout session tomorrow (Tuesday) at around
>> >>>>18:30 IST?
>> >>>>
>> >>>>
>> >>>>Thanks,
>> >>>>
>> >>>>Madhawa
>> >>>>
>> >>>>On Sun, Apr 24, 2016 at 10:33 PM, Mondher Bouazizi
>> >>>><mondher.bouazizi@gmail.com> wrote:
>> >>>>
>> >>>>Hi,
>> >>>>
>> >>>>I am sorry for my late reply.
>> >>>>
>> >>>>Given the time difference between Japan and the USA, I don't think I
>> >>>>will be available on weekdays. I will be available only on
>> >>>>Friday/Saturday morning (9-10am EST).
>> >>>>
>> >>>>I am not sure if Chris is OK with that; we had our previous meetings
>> >>>>on Saturday mornings.
>> >>>>
>> >>>>Otherwise, please go ahead. I will join as soon as I can.
>> >>>>
>> >>>>Thanks.
>> >>>>
>> >>>>@Chris: my github ID is mondher-bouazizi
>> >>>>
>> >>>>Best regards,
>> >>>>
>> >>>>Mondher
>> >>>>
>> >>>>On Mon, Apr 25, 2016 at 1:44 AM, Anastasija Mensikova <
>> >>>>mensikova.anastasija@gmail.com> wrote:
>> >>>>
>> >>>>> Hi Anthony,
>> >>>>>
>> >>>>> I can make Madhawa's proposed time too, after 6pm IST on Tuesday
>> >>>>> (after 8:30am EST). Let me know exactly when!
>> >>>>>
>> >>>>> Thank you,
>> >>>>> Anastasija
>> >>>>>
>> >>>>> On 24 April 2016 at 03:02, Anthony Beylerian <
>> >>>>> anthony.beylerian@gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi Anastasija,
>> >>>>>>
>> >>>>>> I'm not available at those times (00-07 JST). I could make it at
>> >>>>>> Madhawa's proposed time, but otherwise please go ahead; we can
>> >>>>>> discuss some other time.
>> >>>>>>
>> >>>>>> @Chris: github ID : beylerian
>> >>>>>>
>> >>>>>> Best,
>> >>>>>>
>> >>>>>> Anthony
>> >>>>>>
>> >>>>>>
>> >>>>>> Please find my GitHub profile:
>> >>>>>> https://github.com/madhawa-gunasekara
>> >>>>>>
>> >>>>>> Madhawa
>> >>>>>>
>> >>>>>> On Sun, Apr 24, 2016 at 12:13 AM, Madhawa Kasun Gunasekara <
>> >>>>>> madhawa30@gmail.com> wrote:
>> >>>>>>
>> >>>>>> > Hi Chris,
>> >>>>>> >
>> >>>>>> > I'm available on Tuesday & Wednesday after 6:00pm IST.
>> >>>>>> >
>> >>>>>> > Thanks,
>> >>>>>> > Madhawa
>> >>>>>> >
>> >>>>>> > On Sat, Apr 23, 2016 at 11:38 PM, Anastasija Mensikova <
>> >>>>>> > mensikova.anastasija@gmail.com> wrote:
>> >>>>>> >
>> >>>>>> >> Hi Chris,
>> >>>>>> >>
>> >>>>>> >> Thank you very much for your email. I'm so excited to work
>> >>>>>> >> with you!
>> >>>>>> >>
>> >>>>>> >> My Github name is amensiko.
>> >>>>>> >>
>> >>>>>> >> And yes, next week sounds good! I'm available on Tuesday at
>> >>>>>> >> 4:20pm EST; Thursday 11am-2:30pm and 4:20-6pm EST; and Friday
>> >>>>>> >> 11am-3pm EST.
>> >>>>>> >>
>> >>>>>> >> Thank you,
>> >>>>>> >> Anastasija
>> >>>>>> >>
>> >>>>>> >> On 23 April 2016 at 10:21, Mattmann, Chris A (3980) <
>> >>>>>> >> chris.a.mattmann@jpl.nasa.gov> wrote:
>> >>>>>> >>
>> >>>>>> >>> Hi Anastasija,
>> >>>>>> >>>
>> >>>>>> >>> Hope you are well. It’s now time to get started on the project.
>> >>>>>> >>> Mondher, Anthony, Madhawa and I have been discussing ideas about
>> >>>>>> >>> how to proceed with the project and have even been developing a
>> >>>>>> >>> task list. Let’s get your tasks into that list, and also
>> >>>>>> >>> coordinate.
>> >>>>>> >>>
>> >>>>>> >>> I also have an action item to share some Spanish/English data to
>> >>>>>> >>> try cross-lingual sentiment analysis.
>> >>>>>> >>>
>> >>>>>> >>> Are you available to chat this week?
>> >>>>>> >>>
>> >>>>>> >>> Cheers,
>> >>>>>> >>> Chris
>> >>>>>> >>>
>> >>>>>> >>> On 4/23/16, 4:49 AM, "Anthony Beylerian" <
>> >>>>>> >>> anthony.beylerian@gmail.com> wrote:
>> >>>>>> >>>
>> >>>>>> >>> >Hello,
>> >>>>>> >>> >
>> >>>>>> >>> >Congratulations on being accepted for this year's GSoC.
>> >>>>>> >>> >Although Mondher and I will not participate as students this
>> >>>>>> >>> >year, we will do our best to help.
>> >>>>>> >>> >We are currently busy with academic research, but will join the
>> >>>>>> >>> >effort when possible.
>> >>>>>> >>> >Otherwise, for any discussion of the proposed approaches, please
>> >>>>>> >>> >let us know.
>> >>>>>> >>> >
>> >>>>>> >>> >Best,
>> >>>>>> >>> >
>> >>>>>> >>> >On Sat, Apr 23, 2016 at 6:02 PM, Madhawa Kasun Gunasekara <
>> >>>>>> >>> >madhawa30@gmail.com> wrote:
>> >>>>>> >>> >
>> >>>>>> >>> >> Sure, we will start working on this.
>> >>>>>> >>> >>
>> >>>>>> >>> >> Thanks,
>> >>>>>> >>> >> Madhawa
>> >>>>>> >>> >>
>> >>>>>> >>> >> On Sat, Apr 23, 2016 at 1:38 AM, Chris Mattmann <
>> >>>>>> >>> >> mattmann@apache.org> wrote:
>> >>>>>> >>> >>
>> >>>>>> >>> >>> Congrats!
>> >>>>>> >>> >>>
>> >>>>>> >>> >>> Time to get started, team.
>> >>>>>> >>> >>>