manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1574) Performance tuning of manifold
Date Mon, 28 Jan 2019 11:29:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753913#comment-16753913 ]

Karl Wright commented on CONNECTORS-1574:
-----------------------------------------

If you look in the ManifoldCF log, every query that takes more than a minute to execute is
logged, along with its EXPLAIN plan.  Could you look through your logs, find those queries,
and post them here together with their plans?

The quality of a query plan usually depends on the quality of the statistics that the
database keeps.  When the statistics are out of date, the plan can become horribly bad.
ManifoldCF *attempts* to keep up with this by re-analyzing tables after a fixed number of
changes, but it can only estimate the number of changes and their effect on the table
statistics.  So if you are experiencing problems with certain queries, you can set
properties.xml values that increase the frequency of analyze operations for the affected
table.  But first we need to know what's going wrong.
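
As a rough sketch of checking the statistics by hand: this assumes direct psql access to the
ManifoldCF PostgreSQL database, and it uses the jobs table from the stuck queries quoted below
purely as an example.  The standard PostgreSQL statistics view shows when a table was last
analyzed, and ANALYZE refreshes its statistics immediately:

```sql
-- Assumption: run from psql against the ManifoldCF database; "jobs" is only
-- an example table name, taken from the queries in the issue description.

-- When were the planner statistics for this table last refreshed?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'jobs';

-- Refresh the planner statistics for this table right away.
ANALYZE jobs;
```

A manual ANALYZE only helps until the statistics drift again, so it is a way to confirm the
diagnosis rather than a permanent fix.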


> Performance tuning of manifold
> ------------------------------
>
>                 Key: CONNECTORS-1574
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1574
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector, JCIFS connector, Solr 6.x component
>    Affects Versions: ManifoldCF 2.5
>         Environment: Apache manifold installed in Linux machine
> Linux version 3.10.0-327.el7.ppc64le
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>            Reporter: balaji
>            Assignee: Karl Wright
>            Priority: Critical
>              Labels: performance
>
> My team is using *Apache ManifoldCF 2.5 with SOLR Cloud* for indexing data. We currently
> have 450-500 jobs that need to run simultaneously. We need to index JSON data, and we are
> using the *file system* connector type with *postgres* as the backend database.

> We are facing several issues:
> 1. Scheduling works for some jobs and doesn't work for others.
> 2. Some jobs complete, while others hang and never complete.
> 3. Earlier, a single job indexed 60000 documents in 15 minutes, but now even a directory
> path containing 5 documents takes 20 minutes or sometimes never completes.
> 4. The "list all jobs" or "status and job management" page sometimes doesn't load. Looking
> at pg_stat_activity, we observe 2 queries in a waiting state, because of which the page
> doesn't load. If we kill those queries or restart ManifoldCF, the issue is resolved and
> the page loads properly.
> Queries getting stuck:
> 1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE (STATUS=$1 OR
> STATUS=$2) FOR UPDATE
> 2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 WHERE ID=$2
> Note: We have deployed ManifoldCF on *Linux*. Our major requirement is scheduling jobs
> that run every 15 minutes.
> Please help us fine-tune ManifoldCF so that it runs smoothly and acts as a robust system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
