lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Solr 7.1.0 - concurrent.ExecutionException building model
Date Fri, 06 Apr 2018 00:27:49 GMT
Hi Joe,

Currently you will eventually run into memory problems if the training set
gets too large. Under the covers on each node it is creating a matrix with
a row for each document and a column for each feature. This can get large
quite quickly. By choosing fewer features you can make this matrix much
smaller.
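
As a rough illustration of how quickly that matrix grows, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: it assumes a dense matrix of 8-byte values and documents spread evenly over the shards, neither of which is confirmed in this thread, and it ignores JVM object overhead. The document count (about 1.2 million), numTerms=1000 and the 20 shards are taken from the messages quoted below; the helper name and the 250-term comparison are made up for the example.

```python
def dense_matrix_bytes(num_docs, num_features, bytes_per_value=8):
    """Size of a dense docs x features matrix (assumes 8-byte values)."""
    return num_docs * num_features * bytes_per_value


docs = 1_200_000   # approximate training set size from the thread
shards = 20        # numShards used when the collection was created

for num_terms in (1000, 250):
    total = dense_matrix_bytes(docs, num_terms)
    # Assumes documents are distributed evenly across shards.
    per_shard = dense_matrix_bytes(docs // shards, num_terms)
    print(f"numTerms={num_terms}: "
          f"{total / 1e9:.2f} GB total, {per_shard / 1e9:.2f} GB per shard")
```

The size scales linearly with the number of features, so under these assumptions cutting numTerms to a quarter cuts the matrix to a quarter as well.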

It's fairly easy to make the train function work on a random sample of the
training set on each iteration rather than the entire training set, but
currently this is not how it's implemented. Feel free to create a ticket
requesting the sampling approach.
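
To make the sampling idea concrete, here is a minimal sketch of logistic regression trained by gradient descent where each iteration only looks at a random sample of the rows. This is not Solr's train function and not how it is implemented today; the function names, parameters and toy data are all hypothetical and only show the general technique.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_sampled(rows, labels, sample_size, iterations=200, alpha=0.1, seed=0):
    """Gradient descent that uses a fresh random sample on every iteration."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    k = min(sample_size, len(rows))
    for _ in range(iterations):
        # Only the sampled rows are touched in this iteration.
        batch = rng.sample(range(len(rows)), k)
        grad = [0.0] * n_features
        for i in batch:
            score = sum(w * x for w, x in zip(weights, rows[i]))
            err = sigmoid(score) - labels[i]
            for j, x in enumerate(rows[i]):
                grad[j] += err * x
        weights = [w - alpha * g / k for w, g in zip(weights, grad)]
    return weights


# Toy data: a constant bias feature plus one signal feature.
xs = (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)
rows = [[1.0, x] for x in xs]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(train_sampled(rows, labels, sample_size=4))
```

With this approach the working set per iteration is bounded by sample_size rather than by the size of the full training set, at the cost of noisier updates.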

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Apr 5, 2018 at 5:32 PM, Joe Obernberger <
joseph.obernberger@gmail.com> wrote:

> I tried to build a large model based on about 1.2 million documents.  One
> of the nodes ran out of memory and killed itself. Is this much data not
> reasonable to use?  The nodes have 16g of heap.  Happy to increase it, but
> not sure if this is possible?
>
> Thank you!
>
> -Joe
>
>
>
> On 4/5/2018 10:24 AM, Joe Obernberger wrote:
>
>> Thank you Shawn - sorry so long to respond, been playing around with this
>> a good bit.  It is an amazing capability.  It looks like it could be
>> related to certain nodes in the cluster not responding quickly enough.  In
>> one case, I got the concurrent.ExecutionException, but it looks like the
>> root cause was a SocketTimeoutException.  I'm using HDFS for the index and
>> it gets hit pretty hard by other processes running, and I'm wondering if
>> that's causing this.
>>
>> java.io.IOException: java.util.concurrent.ExecutionException:
>> java.io.IOException: params expr=update(models,+batchSize%
>> 3D"50",train(MODEL1033_1522883727011,features(MODEL1033_
>> 1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_
>> 1522883727011",field%3D"Text",outcome%3D"out_i",
>> positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL10
>> 33",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000")
>> )&qt=/stream&explain=true&q=*:*&fl=id&sort=id+asc&distrib=false
>>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openS
>> treams(CloudSolrStream.java:405)
>>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(
>> CloudSolrStream.java:275)
>>         at com.ngc.bigdata.ie_solrmodelbuilder.SolrModelBuilderProcesso
>> r.doWork(SolrModelBuilderProcessor.java:114)
>>         at com.ngc.intelenterprise.intelentutil.utils.Processor.run(
>> Processor.java:140)
>>         at com.ngc.intelenterprise.intelentutil.jms.IntelEntQueueProc.
>> process(IntelEntQueueProc.java:208)
>>         at org.apache.camel.processor.DelegateSyncProcessor.process(Del
>> egateSyncProcessor.java:63)
>>         at org.apache.camel.management.InstrumentationProcessor.process
>> (InstrumentationProcessor.java:77)
>>         at org.apache.camel.processor.RedeliveryErrorHandler.process(Re
>> deliveryErrorHandler.java:460)
>>         at org.apache.camel.processor.CamelInternalProcessor.process(Ca
>> melInternalProcessor.java:190)
>>         at org.apache.camel.processor.CamelInternalProcessor.process(Ca
>> melInternalProcessor.java:190)
>>         at org.apache.camel.component.direct.DirectProducer.process(Dir
>> ectProducer.java:62)
>>         at org.apache.camel.processor.SendProcessor.process(SendProcess
>> or.java:141)
>>         at org.apache.camel.management.InstrumentationProcessor.process
>> (InstrumentationProcessor.java:77)
>>         at org.apache.camel.processor.RedeliveryErrorHandler.process(Re
>> deliveryErrorHandler.java:460)
>>         at org.apache.camel.processor.CamelInternalProcessor.process(Ca
>> melInternalProcessor.java:190)
>>         at org.apache.camel.processor.CamelInternalProcessor.process(Ca
>> melInternalProcessor.java:190)
>>         at org.apache.camel.component.jms.EndpointMessageListener.onMes
>> sage(EndpointMessageListener.java:114)
>>         at org.springframework.jms.listener.AbstractMessageListenerCont
>> ainer.doInvokeListener(AbstractMessageListenerContainer.java:699)
>>         at org.springframework.jms.listener.AbstractMessageListenerCont
>> ainer.invokeListener(AbstractMessageListenerContainer.java:637)
>>         at org.springframework.jms.listener.AbstractMessageListenerCont
>> ainer.doExecuteListener(AbstractMessageListenerContainer.java:605)
>>         at org.springframework.jms.listener.AbstractPollingMessageListe
>> nerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.
>> java:308)
>>         at org.springframework.jms.listener.AbstractPollingMessageListe
>> nerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.
>> java:246)
>>         at org.springframework.jms.listener.DefaultMessageListenerConta
>> iner$AsyncMessageListenerInvoker.invokeListener(DefaultMessageLis
>> tenerContainer.java:1144)
>>         at org.springframework.jms.listener.DefaultMessageListenerConta
>> iner$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessag
>> eListenerContainer.java:1136)
>>         at org.springframework.jms.listener.DefaultMessageListenerConta
>> iner$AsyncMessageListenerInvoker.run(DefaultMessageListenerContai
>> ner.java:1033)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1149)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:624)
>>         at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
>> params expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883
>> 727011,features(MODEL1033_1522883727011,q%3D"*:*",
>> featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",
>> outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1
>> 000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"
>> out_i",maxIterations%3D"1000"))&qt=/stream&explain=true&q=*:
>> *&fl=id&sort=id+asc&distrib=false
>>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openS
>> treams(CloudSolrStream.java:399)
>>         ... 27 more
>> Caused by: java.io.IOException: params expr=update(models,+batchSize%
>> 3D"50",train(MODEL1033_1522883727011,features(MODEL1033_
>> 1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_
>> 1522883727011",field%3D"Text",outcome%3D"out_i",
>> positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL10
>> 33",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000")
>> )&qt=/stream&explain=true&q=*:*&fl=id&sort=id+asc&distrib=false
>>         at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrS
>> tream.java:115)
>>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$Strea
>> mOpener.call(CloudSolrStream.java:510)
>>         at org.apache.solr.client.solrj.io.stream.CloudSolrStream$Strea
>> mOpener.call(CloudSolrStream.java:499)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolE
>> xecutor.lambda$execute$0(ExecutorUtil.java:188)
>>         ... 3 more
>> Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout
>> occured while waiting response from server at:
>> http://leda:9100/solr/MODEL1033_1522883727011_shard20_replica_n74
>>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMeth
>> od(HttpSolrClient.java:637)
>>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(Htt
>> pSolrClient.java:253)
>>         at org.apache.solr.client.solrj.impl.HttpSolrClient.request(Htt
>> pSolrClient.java:242)
>>         at org.apache.solr.client.solrj.SolrClient.request(SolrClient.j
>> ava:1219)
>>         at org.apache.solr.client.solrj.io.stream.SolrStream.constructP
>> arser(SolrStream.java:269)
>>         at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrS
>> tream.java:113)
>>         ... 7 more
>> Caused by: java.net.SocketTimeoutException: Read timed out
>>         at java.net.SocketInputStream.socketRead0(Native Method)
>>         at java.net.SocketInputStream.socketRead(SocketInputStream.java
>> :116)
>>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>>         at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(Se
>> ssionInputBufferImpl.java:139)
>>         at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(Se
>> ssionInputBufferImpl.java:155)
>>         at org.apache.http.impl.io.SessionInputBufferImpl.readLine(Sess
>> ionInputBufferImpl.java:284)
>>         at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHea
>> d(DefaultHttpResponseParser.java:138)
>>         at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHea
>> d(DefaultHttpResponseParser.java:56)
>>         at org.apache.http.impl.io.AbstractMessageParser.parse(Abstract
>> MessageParser.java:261)
>>         at org.apache.http.impl.DefaultBHttpClientConnection.receiveRes
>> ponseHeader(DefaultBHttpClientConnection.java:165)
>>         at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(C
>> PoolProxy.java:165)
>>         at org.apache.http.protocol.HttpRequestExecutor.doReceiveRespon
>> se(HttpRequestExecutor.java:272)
>>         at org.apache.http.protocol.HttpRequestExecutor.execute(HttpReq
>> uestExecutor.java:124)
>>         at org.apache.http.impl.execchain.MainClientExec.execute(
>> MainClientExec.java:272)
>>         at org.apache.http.impl.execchain.ProtocolExec.execute(
>> ProtocolExec.java:185)
>>         at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.
>> java:89)
>>         at org.apache.http.impl.execchain.RedirectExec.execute(
>> RedirectExec.java:111)
>>         at org.apache.http.impl.client.InternalHttpClient.doExecute(Int
>> ernalHttpClient.java:185)
>>         at org.apache.http.impl.client.CloseableHttpClient.execute(Clos
>> eableHttpClient.java:83)
>>         at org.apache.http.impl.client.CloseableHttpClient.execute(Clos
>> eableHttpClient.java:56)
>>         at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMeth
>> od(HttpSolrClient.java:525)
>>
>> -Joe
>>
>>
>> On 4/2/2018 7:09 PM, Shawn Heisey wrote:
>>
>>> On 4/2/2018 1:55 PM, Joe Obernberger wrote:
>>>
>>>> The training data was split across 20 shards - specifically created
>>>> with:
>>>> http://icarus.querymasters.com:9100/solr/admin/collections?
>>>> action=CREATE&name=MODEL1024_1522696624083&numShards=20&rep
>>>> licationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING
>>>>
>>>> Any ideas?  The complete error is:
>>>>
>>> <snip>
>>>
>>>> <body><h2>HTTP ERROR 404</h2>
>>>> <p>Problem accessing
>>>> /solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason:
>>>> <pre>    Not Found</pre></p>
>>>> </body>
>>>>
>>> I'll warn you in advance that I know nothing at all about the learning
>>> to rank functionality.  I'm replying about the underlying error you're
>>> getting, independent of what your query is trying to accomplish.
>>>
>>> It's a 404 error, trying to access the URL mentioned above.
>>>
>>> The error doesn't indicate exactly WHAT wasn't found.  It could either
>>> be the core named "MODEL1024_1522696624083_shard20_replica_n75" or the
>>> "/select" handler on that core.  That's something you need to figure
>>> out.  It could be that the core *does* exist, but for some reason, Solr
>>> on that machine was unable to start it.
>>>
>>> The solr.log file on the Solr instance that returned the error (which
>>> seems to be on the machine named vesta, answering to port 9100) may have
>>> more detail for the error, or some additional error messages.
>>>
>>> Normally SolrCloud is good at making sure that requests aren't sent to
>>> resources that aren't working.  So I'm not sure why this happened.
>>>
>>> Are there other errors or warnings in the solr.log file, either on the
>>> instance where you sent your request, or the instance that returned the
>>> 404 error?
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
>>>
>>
>
