lucene-solr-user mailing list archives

From neilb <nb...@hotmail.com>
Subject Solr indexing with Tika DIH local vs network share
Date Tue, 26 Mar 2019 15:27:48 GMT
Hi, I am trying to set up Solr for our project so that it can run full-text
searches on PDF documents. I am able to run the sample Tika DIH example
locally on my Windows server machine. It indexes all PDF documents
recursively under the "baseDir" set in the config XML. At present "baseDir"
points to a local folder on the same machine that holds around 10K PDF files.
This whole setup works as expected.

The next step is to import PDF documents located on a network share. I created
another core with very similar configuration files, except that this time
baseDir points to the network share ("\\myserver\pdfshare"). I have had no
success indexing these documents on the newly created core. I also tried
mapping the network share to a local drive letter and updating the config
accordingly, but still no success.
As a test, I copied all the PDF files from the network share to the local
folder that the example core with the sample Tika DIH points to, and I am able
to index all of them there.

So I am not sure why the Tika config with the network path is not able to
index the files. Looking at the log I can see the following entries, but they
don't explain anything. Can someone guide me in resolving the issue?

2019-03-26 13:58:37.250 DEBUG (Scheduler-1147580192) [   ] o.e.j.i.FillInterest onFail FillInterest@419eacc8{AC.ReadCB@1ad637ed{HttpConnection@1ad637ed::SocketChannelEndPoint@6190d407{/10.206.11.68:51486<->/10.205.53.163:8983,OPEN,fill=FI,flush=-,to=120010/120000}{io=1/1,kio=1,kro=1}->HttpConnection@1ad637ed[p=HttpParser{s=START,0 of -1},g=HttpGenerator@7d81e85c{s=START}]=>HttpChannelOverHttp@10e588cc{r=2,c=false,a=IDLE,uri=null,age=0}}}
java.util.concurrent.TimeoutException: Idle timeout expired: 120010/120000 ms
	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166) [jetty-io-9.4.14.v20181114.jar:9.4.14.v20181114]
	at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50) [jetty-io-9.4.14.v20181114.jar:9.4.14.v20181114]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:1.8.0_201]
	at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_201]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_201]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_201]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_201]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_201]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_201]


Is it possible that Solr is not able to access the network share? Is there
any way to run solr.cmd under a different user (one who has access to the
network share) in a Windows environment?
Please let me know if you need any more details about the issue.
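
One way the access question above could be checked is a small script run
under the same Windows account that starts Solr. This is only a minimal
sketch: the function name check_share is hypothetical, the default path is the
share named in this message, and it assumes a Python interpreter is available
on the server. It only reports whether the current user can list the folder
and how many PDF files it can see; it says nothing about Solr or Tika
themselves.

```python
import os
import sys


def check_share(base_dir):
    """Return (readable, pdf_count) for base_dir as seen by the current user.

    check_share is a hypothetical helper for illustration only.
    """
    # isdir() is False when the path does not exist or cannot be reached.
    if not os.path.isdir(base_dir):
        return False, 0
    try:
        # Listing the top level fails with OSError if access is denied.
        os.listdir(base_dir)
    except OSError:
        return False, 0
    pdf_count = 0
    # Walk recursively, as the DIH config is expected to do under baseDir.
    for _root, _dirs, files in os.walk(base_dir):
        pdf_count += sum(1 for name in files if name.lower().endswith(".pdf"))
    return True, pdf_count


if __name__ == "__main__":
    # Default is the UNC path from this message; pass another path as argv[1].
    base_dir = sys.argv[1] if len(sys.argv) > 1 else r"\\myserver\pdfshare"
    readable, count = check_share(base_dir)
    # USERNAME is the usual Windows environment variable for the account name.
    user = os.environ.get("USERNAME", "unknown")
    print(f"user={user} path={base_dir} readable={readable} pdfs={count}")
```

If this reports readable=False for the account Solr runs under but
readable=True for my own login, that would point to a permissions problem
rather than a Tika one. Note that a mapped drive letter belongs to the logon
session that created it, so a process started under another account may not
see it even when the UNC path is reachable.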


Thanks in advance




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
