manifoldcf-dev mailing list archives

From "balaji (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CONNECTORS-1574) Performance tuning of manifold
Date Mon, 28 Jan 2019 08:04:00 GMT
balaji created CONNECTORS-1574:
----------------------------------

             Summary: Performance tuning of manifold
                 Key: CONNECTORS-1574
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1574
             Project: ManifoldCF
          Issue Type: Bug
          Components: File system connector, JCIFS connector, Solr 6.x component
    Affects Versions: ManifoldCF 2.5
         Environment: Apache ManifoldCF installed on a Linux machine

Linux version 3.10.0-327.el7.ppc64le

Red Hat Enterprise Linux Server release 7.2 (Maipo)
            Reporter: balaji


My team is using *Apache ManifoldCF 2.5 with SOLR Cloud* to index data. We currently
have 450-500 jobs that need to run simultaneously. We need to index JSON data, and we are
using the *file system* connector type with *postgres* as the backend database.

We are facing several issues:
1. Scheduling works for some jobs and doesn't work for others.
2. Some jobs complete, while others hang and never complete.
3. Earlier, a single job indexed 60000 documents in 15 minutes, but now even a directory
path containing 5 documents takes 20 minutes or sometimes does not complete.
4. The "list all jobs" or "status and job management" page sometimes doesn't load. Looking at
pg_stat_activity, we observe that 2 queries are in a waiting state, which is why
the page doesn't load. If we kill those queries or restart ManifoldCF, the issue is resolved
and the page loads properly.
Queries getting stuck:
1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE (STATUS=$1 OR STATUS=$2)
FOR UPDATE
2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 WHERE ID=$2
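
The check against pg_stat_activity described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes PostgreSQL 9.6 or later (where pg_stat_activity has a wait_event_type column and pg_blocking_pids() exists), the SQL text has to be run with whatever PostgreSQL client is available, and the function name blocked_jobs_sessions and the sample rows are hypothetical, not part of ManifoldCF.

```python
import re

# Assumption: PostgreSQL 9.6 or later, where pg_stat_activity has a
# wait_event_type column and pg_blocking_pids() is available.
BLOCKED_SESSIONS_SQL = """
SELECT pid, state, wait_event_type,
       pg_blocking_pids(pid) AS blocked_by,
       query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
"""

# Matches the JOBS table name as a whole word, case-insensitively.
JOBS_TABLE = re.compile(r"\bJOBS\b", re.IGNORECASE)


def blocked_jobs_sessions(rows):
    """Keep only lock-waiting sessions whose query text mentions the JOBS table."""
    return [
        row for row in rows
        if row["wait_event_type"] == "Lock"
        and JOBS_TABLE.search(row["query"] or "")
    ]


# Hypothetical rows, shaped like the result of BLOCKED_SESSIONS_SQL.
sample_rows = [
    {"pid": 101, "state": "active", "wait_event_type": "Lock",
     "blocked_by": [202],
     "query": "UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, "
              "WINDOWEND=NULL, STATUS=$1 WHERE ID=$2"},
    {"pid": 202, "state": "idle in transaction", "wait_event_type": None,
     "blocked_by": [],
     "query": "SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS "
              "FROM JOBS WHERE (STATUS=$1 OR STATUS=$2) FOR UPDATE"},
    {"pid": 303, "state": "active", "wait_event_type": "Lock",
     "blocked_by": [101],
     "query": "UPDATE OTHER_TABLE SET STATUS=$1 WHERE ID=$2"},
]

for row in blocked_jobs_sessions(sample_rows):
    print(row["pid"], "blocked by", row["blocked_by"])
```

Listing the blocking pid next to each waiting session shows which session holds the lock, which may be more targeted than killing the waiting queries or restarting the whole service.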

Note: We have deployed ManifoldCF on *Linux*. Our major requirement is scheduling jobs
to run every 15 minutes.

Please help us fine-tune ManifoldCF so that it runs smoothly and acts as a robust system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
