opennlp-dev mailing list archives

From nikhil jain <nikhil_jain1...@yahoo.com.INVALID>
Subject Re: Need to speed up the model creation process of OpenNLP
Date Wed, 19 Nov 2014 20:09:43 GMT
Hi Samik,
Thank you so much for the quick feedback.
1. You can possibly have smaller training sets and see if the models deteriorate substantially:
Yes, I have 4 training sets, each containing 1 million records, but I don't understand how that
would help. When I create a single model out of these 4 training sets, I still have to pass all
the records at once to build the model, so it would take just as much time, right?
2. Another strategy is to incrementally introduce training sets containing a specific class
of Token Names - that would provide a quicker turnaround:
Right, I am already doing what you describe. I have 4 different classes and each class contains
1 million records. Initially I created a model on 1 million records, which took less time and
worked properly. Then I added another class, so the corpus grew to 2 million records, and I
created a model again on those 2 million records, and so on. The problem is that as I add more
records to the corpus, the model creation process takes longer and longer. Is it possible to
reuse the model with a new training set? For example, if I have a model based on 2 million
records, can I reuse that old model and just adjust it based on the new records? If that is
possible, then small training sets would be useful, right?
As I mentioned, I am new to OpenNLP and machine learning, so please explain with an example if
I am missing something.
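
For context, here is a minimal sketch of the kind of training call I am describing, assuming a
recent OpenNLP release (the file name, language code, and entity type are placeholders, and the
exact constructors and the NameFinderME.train(...) signature differ between OpenNLP 1.5.x and
later versions):

import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class NameFinderTraining {

    public static void main(String[] args) throws Exception {
        // Hypothetical training file in the name-finder format (one annotated sentence per line).
        File trainFile = new File("corpus-all-classes.train");

        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(trainFile), StandardCharsets.UTF_8);
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        // The two settings mentioned in this thread: cutoff 5 and 100 iterations.
        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.ITERATIONS_PARAM, "100");
        params.put(TrainingParameters.CUTOFF_PARAM, "5");

        TokenNameFinderModel model =
                NameFinderME.train("en", "default", samples, params, new TokenNameFinderFactory());

        samples.close();
        // model.serialize(...) would then write the model to disk.
    }
}

Lowering ITERATIONS_PARAM (or raising CUTOFF_PARAM) shortens a single run, but every change
still means retraining over the whole corpus, which is exactly what I am trying to avoid.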

Thanks,
Nikhil
      From: Samik Raychaudhuri <samikr@gmail.com>
 To: dev@opennlp.apache.org 
 Sent: Wednesday, November 19, 2014 6:00 AM
 Subject: Re: Need to speed up the model creation process of OpenNLP
   
Hi,
This is essentially a machine learning problem, nothing to do with 
OpenNLP. If you have such a large corpus, it would take a substantial 
amount of time to train models. You can possibly have smaller training 
sets and see if the models deteriorate substantially. Another strategy 
is to incrementally introduce training sets containing specific class of 
Token Names - that would provide a quicker turnaround.
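For example, one quick way to try a smaller training set is to carve out just the first N lines
of the full corpus into a separate file. A rough plain-Java sketch follows; the file names and
the line count are placeholders:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SampleTrainingData {

    public static void main(String[] args) throws Exception {
        int maxLines = 100_000; // size of the reduced training set

        try (BufferedReader in = Files.newBufferedReader(
                 Paths.get("corpus-all-classes.train"), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(
                 Paths.get("corpus-sample.train"), StandardCharsets.UTF_8)) {
            String line;
            int written = 0;
            // Each line is one training sentence in the name-finder format,
            // so truncating by line keeps every sample intact.
            while (written < maxLines && (line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                written++;
            }
        }
    }
}

Training on such a sample first gives a rough idea of accuracy and of how the run time scales
before committing to the full 4 million records.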
Hope this helps.
Best,
-Samik




On 18/11/2014 8:46 AM, nikhil jain wrote:
> Hi,
> I asked the question below yesterday; did anyone get a chance to look at it?
> I am new to OpenNLP and really need some help. Please provide a clue, a link, or an example.
> Thanks,
> Nikhil
>        From: nikhil jain <nikhil_jain1234@yahoo.com.INVALID>
>  To: "users@opennlp.apache.org" <users@opennlp.apache.org>; Dev at Opennlp Apache <dev@opennlp.apache.org>
>  Sent: Tuesday, November 18, 2014 12:02 AM
>  Subject: Need to speed up the model creation process of OpenNLP
>    
> Hi,
> I am using the OpenNLP Token Name Finder to parse unstructured data. I have created a corpus
> of about 4 million records. When I create a model from the training set using the OpenNLP
> APIs in Eclipse with the default settings (cutoff 5 and 100 iterations), the process takes a
> good amount of time, around 2-3 hours.
> Can someone suggest how I can reduce this time? I want to experiment with different iteration
> counts, but because the model creation process takes so long, I am not able to experiment
> with it. This is a really time-consuming process.
> Please provide some feedback.
> Thanks in advance.
> Nikhil Jain
>
>    



  