manifoldcf-user mailing list archives

From Olivier Tavard <olivier.tav...@francelabs.com>
Subject Re: Job error during WindowsShare repository connector indexation
Date Wed, 11 Oct 2017 15:16:20 GMT
Hi,

Thanks for your answers.
OK, I will definitely use Zookeeper rather than file-based synchronization and let you know.

For information, the syncharea folder was not accessed by any other process during our crawl.
The server is dedicated to MCF. The OS is Debian 8 and the files are on a standard Linux
filesystem (ext3). We did not increase the max open files limit on this server (only on the
Solr servers); that is a good thing to investigate, thanks.
Regardless of the change to ZK, is it possible to change this behavior in MCF, for example
by automatically stopping the job when this exception occurs?

Thanks,

Olivier TAVARD


> On 11 Oct 2017, at 14:15, Karl Wright <daddywri@gmail.com> wrote:
> 
> In this case it's the *directory* that it doesn't find, so it can't create the file.
> If the syncharea is in an NFS-mounted filesystem, then you can get problems of this kind,
> which is why we strongly advise using Zookeeper instead of playing those kinds of games.
> 
> Karl
> 
> 
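The point about the missing directory can be reproduced outside ManifoldCF. The following is a minimal Python sketch, not ManifoldCF code: it assumes that the `File.createNewFile` call in the stack trace behaves like an exclusive create (`O_CREAT | O_EXCL`), and the function name `try_create_lock` and the file name `example.lock` are made up for illustration. It shows that an exclusive create fails with "No such file or directory" when the parent directory is gone, which is a different failure from the lock file already existing.

```python
import os
import tempfile


def try_create_lock(lock_path):
    """Try to create lock_path exclusively and report what happened.

    Hypothetical helper: it mimics an exclusive file create, it is not
    the ManifoldCF lock manager.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # The lock file is already there: someone else holds the lock.
        return "held"
    except FileNotFoundError:
        # The parent directory does not exist, so the file cannot be created.
        return "missing-directory"
    os.close(fd)
    return "acquired"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        lock_dir = os.path.join(base, "551", "442")
        lock_path = os.path.join(lock_dir, "example.lock")

        print(try_create_lock(lock_path))  # missing-directory
        os.makedirs(lock_dir)
        print(try_create_lock(lock_path))  # acquired
        print(try_create_lock(lock_path))  # held
```

Retrying cannot fix the "missing-directory" case until the directory exists again, which would be consistent with the warning repeating in the log.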
> On Wed, Oct 11, 2017 at 7:20 AM, Luis Cabaceira <cabaceira@gmail.com> wrote:
> I've seen similar errors (which make it seem as if the file is not there or has been deleted,
> while in fact it exists) for the reasons I described before.
> 
> On 11 October 2017 at 15:12, Karl Wright <daddywri@gmail.com> wrote:
> This error:
> 
> >>>>>>
> WARN 2017-10-09 08:23:56,284 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.lock|Attempt to set file lock 'mcf/mcf_home/./syncharea/551/442/lock-_POOLTARGET__REPOSITORYCONNECTORPOOL_SmbFileShare.lock' failed: No such file or directory
> java.io.IOException: No such file or directory
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.createNewFile(File.java:1012)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.grabFileLock(FileLockObject.java:223)
> at org.apache.manifoldcf.core.lockmanager.FileLockObject.obtainGlobalWriteLockNoWait(FileLockObject.java:78)
> at org.apache.manifoldcf.core.lockmanager.LockObject.obtainGlobalWriteLock(LockObject.java:121)
> at org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:74)
> at org.apache.manifoldcf.core.lockmanager.LockGate.enterWriteLock(LockGate.java:177)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWrite(BaseLockManager.java:1120)
> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWriteLock(BaseLockManager.java:757)
> at org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteLock(LockManager.java:302)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:585)
> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
> at org.apache.manifoldcf.crawlerui.IdleCleanupThread.run(IdleCleanupThread.java:69)
> And the error was repeated indefinitely in the log.
> <<<<<<
> 
> is due to somebody erasing the file-based syncharea while ManifoldCF processes were active.
> We strongly suggest using Zookeeper rather than file-based synch, in any case.
> 
> Thanks,
> 
> Karl
> 
> 
> On Wed, Oct 11, 2017 at 6:05 AM, Luis Cabaceira <cabaceira@gmail.com> wrote:
> From the look of it, this could be coming from a limit on the number of file handles.
> Your process may be creating too many file handles and not closing them in time, eventually
> preventing further file operations.
> 
> I suggest you check this. On Linux, run: cat /proc/sys/fs/file-max
> 
> 
> To see the hard and soft values:
> 
> # ulimit -Hn
> # ulimit -Sn
> 
> P.S. - Switch to the user that is running Manifold first
> 
> 
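The same checks can be scripted. This is a minimal Python sketch, assuming Linux with /proc mounted; the function names are illustrative and not part of ManifoldCF. It reads the system-wide maximum from /proc/sys/fs/file-max, the soft and hard per-process limits that `ulimit -Sn` and `ulimit -Hn` report, and the number of descriptors a process currently has open. The limits apply to the process that queries them, so run it as the user that runs ManifoldCF.

```python
import os
import resource


def system_file_max():
    """System-wide maximum number of file handles (Linux only)."""
    with open("/proc/sys/fs/file-max") as f:
        return int(f.read().strip())


def fd_limits():
    """Soft and hard open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def open_fd_count(pid="self"):
    """Number of file descriptors currently open in process `pid`.

    Reading another user's /proc/<pid>/fd needs matching permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("system file-max:", system_file_max())
    print("soft limit:", soft, "hard limit:", hard)
    print("open descriptors (this process):", open_fd_count())
```

To inspect the running crawler rather than this script, pass the JVM's process id, for example `open_fd_count(12345)`, where 12345 is a placeholder. A count that keeps climbing toward the soft limit during a crawl would support the file-handle explanation.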
> On 11 October 2017 at 13:54, Olivier Tavard <olivier.tavard@francelabs.com> wrote:
> Hi,
> 
> Thanks for your answer.
> Yes, I could reach the Samba server from the MCF server. Indeed, in the first hours after
> the MCF job was launched, thousands of documents were correctly accessed and processed by
> MCF. The errors mentioned appeared only after a few hours. Before that, indexing worked
> correctly.
> 
> Best regards,
> Olivier TAVARD
> 
> 
>> On 11 Oct 2017, at 11:21, Cihad Guzel <cguzelg@gmail.com> wrote:
>> 
>> Hi Olivier,
>> 
>> Did you try to connect to the Samba server with a Samba client app? Check iptables
>> on your server. Can you stop iptables on the Ubuntu server? Alternatively, you can configure iptables.
>> 
>> Regards,
>> Cihad Guzel
>> 
>> 
>> 2017-10-11 12:02 GMT+03:00 Olivier Tavard <olivier.tavard@francelabs.com>:
>> Hi,
>> 
>> I got this error while crawling a Samba share hosted on Ubuntu Server:
>> ERROR 2017-10-05 00:00:14,109 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.crawlerthreads|Exception tossed: Service '_ANON_0' of type '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Service '_ANON_0' of type '_REPOSITORYCONNECTORPOOL_SmbFileShare' is not active
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.updateServiceData(BaseLockManager.java:273)
>> at org.apache.manifoldcf.core.lockmanager.LockManager.updateServiceData(LockManager.java:108)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:654)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
>> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
>> at org.apache.manifoldcf.crawler.system.IdleCleanupThread.run(IdleCleanupThread.java:68)
>> 
>> I used MCF 2.8.1 on Debian 8 with Postgresql 9.5.3 and the Windows Share repository connector.
>> The job was configured to process about 2 million files (600 GB).
>> For text extraction I used a Tika server (on the same server as MCF) and added the
>> Tika external content extractor transformation connector to the job configuration.
>> The error appeared 9 hours after the job was launched. The job status still indicated
>> that the job was running, but there was only 1 document in the active column and the error
>> above was repeated in the MCF log.
>> 
>> Then I tried to launch the clean-lock.sh script and I obtained this error:
>> WARN 2017-10-09 08:23:56,284 (Idle cleanup thread) - MCF|MCF-agent|apache.manifoldcf.lock|Attempt to set file lock 'mcf/mcf_home/./syncharea/551/442/lock-_POOLTARGET__REPOSITORYCONNECTORPOOL_SmbFileShare.lock' failed: No such file or directory
>> java.io.IOException: No such file or directory
>> at java.io.UnixFileSystem.createFileExclusively(Native Method)
>> at java.io.File.createNewFile(File.java:1012)
>> at org.apache.manifoldcf.core.lockmanager.FileLockObject.grabFileLock(FileLockObject.java:223)
>> at org.apache.manifoldcf.core.lockmanager.FileLockObject.obtainGlobalWriteLockNoWait(FileLockObject.java:78)
>> at org.apache.manifoldcf.core.lockmanager.LockObject.obtainGlobalWriteLock(LockObject.java:121)
>> at org.apache.manifoldcf.core.lockmanager.LockObject.enterWriteLock(LockObject.java:74)
>> at org.apache.manifoldcf.core.lockmanager.LockGate.enterWriteLock(LockGate.java:177)
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWrite(BaseLockManager.java:1120)
>> at org.apache.manifoldcf.core.lockmanager.BaseLockManager.enterWriteLock(BaseLockManager.java:757)
>> at org.apache.manifoldcf.core.lockmanager.LockManager.enterWriteLock(LockManager.java:302)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:585)
>> at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
>> at org.apache.manifoldcf.crawler.repositoryconnectorpool.RepositoryConnectorPool.pollAllConnectors(RepositoryConnectorPool.java:124)
>> at org.apache.manifoldcf.crawlerui.IdleCleanupThread.run(IdleCleanupThread.java:69)
>> And the error was repeated indefinitely in the log.
>> 
>> Does this mean that there was a problem with the syncharea folder at some point?
>> 
>> Thanks,
>> Best regards,
>> 
>> Olivier TAVARD
>> 
>> 
>> 
>> -- 
>> Cihad Güzel
> 
> 
> 
> 
> -- 
> Luis Cabaceira
> 

