opennlp-dev mailing list archives

From Rodrigo Agerri <rage...@apache.org>
Subject Re: Need to speed up the model creation process of OpenNLP
Date Wed, 26 Nov 2014 09:59:04 GMT
Hi,

Yes, you are right, although I guess you can just pass a null if you
do not need resources.

Is the multi-threading working for you?

R

On Mon, Nov 24, 2014 at 6:26 PM, nikhil jain <nikhil_jain1234@yahoo.com> wrote:
> Hi Rodrigo,
>
> I was trying to call the train method without resources, but I was getting
> errors. I did not find any train method that does not take resources.
>
> I found these train methods in the NameFinderME class:
>
> 1. train(String languageCode, String type, ObjectStream<NameSample> samples,
>    TrainingParameters trainParams, byte[] featureGeneratorBytes,
>    Map<String,Object> resources)
> 2. train(String languageCode, String type, ObjectStream<NameSample> samples,
>    TrainingParameters trainParams, AdaptiveFeatureGenerator generator,
>    Map<String,Object> resources)
> 3. train(String languageCode, String type, ObjectStream<NameSample> samples,
>    Map<String,Object> resources)
> 4. train(String languageCode, String type, ObjectStream<NameSample> samples,
>    AdaptiveFeatureGenerator generator, Map<String,Object> resources,
>    int iterations, int cutoff)
>
> Am I missing something? Could you please tell me how I can do so?
>
> Thanks
> Nikhil
>
> ________________________________
> From: Rodrigo Agerri <ragerri@apache.org>
> To: nikhil jain <nikhil_jain1234@yahoo.com>
> Sent: Friday, November 21, 2014 12:12 AM
>
> Subject: Re: Need to speed up the model creation process of OpenNLP
>
> Hi Nikhil,
>
> It looks good, but you do not seem to need the resources, so why don't you
> use the train method without the resources?
>
> Also, do you have 50 threads?
>
> Rodrigo
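Rodrigo's question about 50 threads matters because trainer threads can only run truly in parallel on as many processors as the machine has; requesting more usually does not speed training up further. A minimal sketch, assuming you simply want to cap the requested value at the processor count the JVM reports through `Runtime.availableProcessors()`; the class and method names are hypothetical.

```java
public class ThreadCount {

    /**
     * Returns the number of training threads to request: the requested
     * value, capped at the processors the JVM reports, and never below 1.
     */
    static int effectiveThreads(int requested) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) {
        int threads = effectiveThreads(50);
        // This value would then be passed to the training parameters as a
        // string, e.g. tp.put("Threads", Integer.toString(threads));
        System.out.println("Using " + threads + " of "
                + Runtime.getRuntime().availableProcessors()
                + " available processors");
    }
}
```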
>
>
>
> On Thu, Nov 20, 2014 at 5:57 PM, nikhil jain <nikhil_jain1234@yahoo.com>
> wrote:
>> Thanks for the feedback, Rodrigo.
>> Yes, I am trying to create a model based on maximum entropy. As I am using
>> the API to build the model, I tried adding a Threads param to the
>> TrainingParameters object, but I am not sure whether I am adding the param
>> correctly. I haven't found any clue in the documentation either.
>>
>> Here is my code, developed with the help of the OpenNLP documentation. Is
>> it the correct way to create a maxent model using multiple threads?
>>
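>> // maxent training settings; "Threads" is the number of trainer threads
>> TrainingParameters tp = new TrainingParameters();
>> tp.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
>> tp.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(100));
>> tp.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(5));
>> tp.put("Threads", "50");
>>
>> // sampleStream (ObjectStream<NameSample>) and generator
>> // (AdaptiveFeatureGenerator) are created elsewhere in the program;
>> // model is a TokenNameFinderModel
>> Map<String, Object> resources = new HashMap<String, Object>();
>> model = NameFinderME.train("en", "sample", sampleStream, tp, generator,
>>     resources);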
>> Thanks
>> Nikhil
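On what the Threads parameter parallelizes: as far as I know, OpenNLP's GIS (maxent) trainer uses it to split the per-iteration work of computing expected feature values over the training events into contiguous chunks handled by a fixed thread pool, and then merges the partial results. The following is a simplified, stand-alone illustration of that data-parallel pattern, not OpenNLP's code; `ParallelExpectation`, `parallelSum` and the placeholder per-event computation are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelExpectation {

    /**
     * Adds up a per-event value over all events. The events are split into
     * contiguous chunks, one per thread; each thread sums its own chunk and
     * the partial sums are merged at the end. events[i] * weight stands in
     * for the real per-event work a trainer would do in one iteration.
     */
    static double parallelSum(double[] events, double weight, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = events.length / threads;
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int start = t * chunk;
                // The last thread also takes the leftover events.
                final int end = (t == threads - 1) ? events.length : start + chunk;
                parts.add(pool.submit(() -> {
                    double partial = 0.0;
                    for (int i = start; i < end; i++) {
                        partial += events[i] * weight;
                    }
                    return partial;
                }));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get(); // wait for each chunk, then merge its result
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] events = new double[1_000_000];
        Arrays.fill(events, 1.0);
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(events, 0.5, threads));
    }
}
```

Because every chunk still has to be processed in every iteration, the achievable speedup from this pattern is bounded by the number of available cores.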
>>
>>
>> ________________________________
>> From: Rodrigo Agerri <ragerri@apache.org>
>> To: nikhil jain <nikhil_jain1234@yahoo.com>
>> Sent: Thursday, November 20, 2014 11:35 AM
>>
>> Subject: Re: Need to speed up the model creation process of OpenNLP
>>
>> Hi Nikhil
>> The maxent trainer already supports multi-threaded training. If you are
>> using the CLI, specify Threads in your training parameters file. Check the
>> sample parameters file distributed with OpenNLP.
>> If you are using the API, perhaps the easiest way is to create a
>> TrainingParameters object with the Threads param specified.
>> HTH
>> R
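For the CLI route, the training parameters file is a plain key=value text file. A minimal sketch, assuming the Java properties format and the keys Algorithm, Iterations, Cutoff and Threads (the names behind the TrainingParameters constants and the "Threads" key used in the code quoted above); it only parses the file contents with `java.util.Properties` to show their shape. `ParamsFileDemo` and the Threads value of 4 are illustrative, and loading the file into OpenNLP itself is not shown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ParamsFileDemo {

    // Example contents of a training parameters file with the same settings
    // as the TrainingParameters object built in code, but 4 threads.
    static final String PARAMS_FILE =
            "# maxent training settings\n"
          + "Algorithm=MAXENT\n"
          + "Iterations=100\n"
          + "Cutoff=5\n"
          + "Threads=4\n";

    /** Parses key=value text; lines starting with '#' are comments. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does no real I/O, so this is not expected.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(PARAMS_FILE);
        int threads = Integer.parseInt(props.getProperty("Threads", "1"));
        System.out.println("Algorithm=" + props.getProperty("Algorithm")
                + ", Threads=" + threads);
    }
}
```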
>>
>>
>> On 19 Nov 2014 21:19, "nikhil jain" <nikhil_jain1234@yahoo.com> wrote:
>>
>> Hi Rodrigo,
>>
>> No, I am not using multi-threading; it's a simple Java program written with
>> help from the OpenNLP documentation. It is worth mentioning, though, that
>> because the corpus contains 4 million records, my program running in
>> Eclipse frequently failed with a Java heap space error (OutOfMemoryError).
>> I investigated a bit and found that the process needed around 10 GB of
>> memory to build the model, so I increased the heap to 10 GB with the -Xmx
>> parameter. It then worked properly but took 3 hours.
>>
>> Thanks
>> -Nikhil
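Since the JVM heap size came up, one way to confirm from inside the program that the -Xmx setting actually took effect is to read the JVM's maximum heap. A minimal sketch using `Runtime.maxMemory()`; the class name `HeapCheck` and the 10 GB threshold (taken from the figure reported above) are illustrative.

```java
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in whole mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        long maxMiB = maxHeapMiB();
        // Rough threshold based on the ~10 GB observed for a 4-million-record corpus.
        long neededMiB = 10L * 1024;
        System.out.println("Max heap: " + maxMiB + " MiB");
        if (maxMiB < neededMiB) {
            System.out.println("Heap may be too small; restart the JVM with e.g. -Xmx10g");
        }
    }
}
```

When running from Eclipse, -Xmx belongs in the VM arguments of the run configuration, not in the program arguments.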
>>
>> ________________________________
>> From: Rodrigo Agerri <ragerri@apache.org>
>> To: "dev@opennlp.apache.org" <dev@opennlp.apache.org>; nikhil jain
>> <nikhil_jain1234@yahoo.com>
>> Cc: "users@opennlp.apache.org" <users@opennlp.apache.org>
>> Sent: Wednesday, November 19, 2014 2:17 AM
>> Subject: Re: Need to speed up the model creation process of OpenNLP
>>
>> Hi,
>>
>> Are you using multithreading? How many threads and how much RAM do you have?
>>
>> R
>>
>>
>>
>>
>> On Tue, Nov 18, 2014 at 5:46 PM, nikhil jain
>> <nikhil_jain1234@yahoo.com.invalid> wrote:
>>> Hi,
>>> I asked the question below yesterday; did anyone get a chance to look at
>>> it? I am new to OpenNLP and really need some help. Please provide a clue,
>>> a link, or an example.
>>> Thanks,
>>> Nikhil
>>> From: nikhil jain <nikhil_jain1234@yahoo.com.INVALID>
>>> To: "users@opennlp.apache.org" <users@opennlp.apache.org>; Dev at Opennlp
>>> Apache <dev@opennlp.apache.org>
>>> Sent: Tuesday, November 18, 2014 12:02 AM
>>> Subject: Need to speed up the model creation process of OpenNLP
>>>
>>>
>>> Hi,
>>> I am using the OpenNLP Token Name Finder to parse unstructured data. I
>>> have created a corpus of about 4 million records. When I create a model
>>> from the training set using the OpenNLP API in Eclipse with the default
>>> settings (cutoff 5 and 100 iterations), the process takes a long time,
>>> around 2-3 hours.
>>> Can someone suggest how I can reduce this time? I want to experiment with
>>> different numbers of iterations, but because model creation takes so
>>> long, I am not able to.
>>> Please provide some feedback.
>>> Thanks in advance,
>>> Nikhil Jain
>>>
>>>
>>
>>
>>
>>
>
>
