manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Update on a now-fixed old problem and questions about database usage
Date Wed, 21 May 2014 20:45:37 GMT
And, sorry, about the database size -- much of that is likely going into
your history table.  You can limit the amount of history stored, or disable
history entirely, by means of a configuration parameter.  Have a look at
the "how-to-build-and-deploy" page.

Karl
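
To check whether the history table really accounts for most of the space, one option is to list per-table sizes in PostgreSQL. The sketch below only builds the query text and does the per-document arithmetic from the figures quoted in this thread (about 8 GB for about 255,000 documents). The names `TABLE_SIZE_QUERY` and `bytes_per_document` are illustrative and not part of ManifoldCF, and running the query needs your own database connection.

```python
# Figures quoted in the thread; both are approximate ("a little over").
DB_SIZE_BYTES = 8 * 1024**3   # assumes "8 GB" means 8 GiB
DOCUMENT_COUNT = 255_000

# Standard PostgreSQL catalog query: total on-disk size (table + indexes
# + TOAST) for every user table, largest first. Run it with psql or any
# client connected to the ManifoldCF database.
TABLE_SIZE_QUERY = """
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
"""


def bytes_per_document(db_size_bytes, document_count):
    """Return the average database footprint per crawled document."""
    if document_count <= 0:
        raise ValueError("document_count must be positive")
    return db_size_bytes / document_count


if __name__ == "__main__":
    per_doc = bytes_per_document(DB_SIZE_BYTES, DOCUMENT_COUNT)
    print(f"approx. {per_doc / 1024:.1f} KiB of database space per document")
    print(TABLE_SIZE_QUERY.strip())
```

With the numbers from the thread this works out to a little over 30 KB of database space per document, so the per-table listing is the quickest way to see whether history or the link and hop-count tables dominate.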


On Wed, May 21, 2014 at 4:31 PM, Tom Rees <trees@chiliad.com> wrote:

> Dear ManifoldCF:
>
> First, I would like to report that switching to ManifoldCF 1.6 solved a
> problem I encountered with version 1.4.1: whenever I ran two web crawls
> simultaneously, the two crawls would stop progressing within half an hour.
> The 1.6 version works beautifully. Thank you for the excellent work.
>
> Now I have a couple of issues with the database that I would appreciate your
> feedback on. First, the two crawls that I mentioned finished and pulled
> down a little over 255,000 documents. The size of the postgres (version
> 9.3.2) database on the disk, however, expanded to use a little over 8 GB of
> space, and this is after running a full vacuum. This seems like a lot of
> space for two medium-sized crawls. Is there a way to get the web crawler to
> use less database space?
>
> Secondly, when I ran two simultaneous web crawls with the NULL output
> connector, the crawls worked without issue. When I ran the same two
> simultaneous web crawls with a custom output connector that wrote the files
> to a local file system everything worked fine. However, when I used an
> output connector that wrote the downloaded files to a file system and put
> the path to each file on an ActiveMQ JMS queue, then the crawl showed
> quirky behavior. A few times the crawls stopped in their tracks and then
> after 40 - 60 minutes a message was printed to the logfile saying that the
> SQL queries took too long. The full dump of one set of these messages is
> below, at the end of this email. The web crawls always recover, and they
> are still running. I am using postgres 9.3.2 with ManifoldCF, and so far it
> has not had many issues, except for the occasional "SQL taking too long"
> message, although these are infrequent. Do I need to use a different
> version of postgres? Or make some other change?
>
> Thank you for your help.
>
> Tom Rees
> Chiliad
>
> WARN 2014-05-21 11:05:08,230 (Worker thread '28') - Found a long-running
> query (2662579 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id
> IN(SELECT ownerid FROM hopdeletedeps
> t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM
> intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
> t1.parentidhash=t0.parentidhash AND
>  t1.childidhash=t0.childidhash AND t1.isnew=?))]
>  WARN 2014-05-21 11:05:08,230 (Worker thread '28') -   Parameter 0: 'D'
>  WARN 2014-05-21 11:05:08,230 (Worker thread '28') -   Parameter 1: '-1'
>  WARN 2014-05-21 11:05:08,230 (Worker thread '28') -   Parameter 2:
> '1400623413113'
>  WARN 2014-05-21 11:05:08,231 (Worker thread '28') -   Parameter 3:
> 'A2EB225081B47722CCAEB3293A28EEB2F264E02C'
>  WARN 2014-05-21 11:05:08,231 (Worker thread '28') -   Parameter 4: 'B'
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') - Found a long-running
> query (2625296 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id
> IN(SELECT ownerid FROM hopdeletedeps
> t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM
> intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
> t1.parentidhash=t0.parentidhash AND
> t1.childidhash=t0.childidhash AND t1.isnew=?))]
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') -   Parameter 0: 'D'
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') -   Parameter 1: '-1'
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') -   Parameter 2:
> '1400623413113'
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') -   Parameter 3:
> 'D942516DE5623A6417FCB994186B507E8CDA30D6'
>  WARN 2014-05-21 11:05:08,243 (Worker thread '4') -   Parameter 4: 'B'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') - Found a long-running
> query (2675765 ms): [SELECT parentidhash FROM intrinsiclink WHERE jobid=?
> AND parentidhash IN (?,?,?,?,?,?,
> ?,?,?,?,?,?,?,?,?,?,?,?,?,?) AND linktype=? AND childidhash=? FOR UPDATE]
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 0:
> '1400623413113'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 1:
> '054FC31ACF6FB96D2F8D19FF9CC230349E6A7A76'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 2:
> '0774E538282FCA04F0FF95AC65D48EFC57CC6225'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 3:
> '1027C9AF07AE2B419C31A1D3B20352E31867BBBB'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 4:
> '1382DE9902A7CCC0012F043077E1739867CE00A4'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 5:
> '2E8844A26FCD3096DF0D6BC3BB3D6648FCBCA7FA'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 6:
> '34741F8B2706BCB202FDA72DABB94D916D497CD4'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 7:
> '6A5E47B467A29A8614B473856F1D28EC8B30F5F3'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 8:
> '71B865B0979B351279EFD9F99CA8AF700704400A'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 9:
> '77C6E57EBDD811027F776BF895E0B43275AF3628'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 10:
> '8267055C5CE6D7A1917F88B1FA310FC5082FD599'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 11:
> '8F361A3EDA0CAC989812623441DA02BD42883C4F'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 12:
> '956CCECF3FD5F508624E19270FD5EC28532B0922'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 13:
> '9BAA3731F101B3908E4FFF4A5325601C57B4CD57'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 14:
> 'AD628D16A2708EECD1C33AA0E63D849BCB5DF417'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 15:
> 'B661E6DD08FD89A6643A706ECAB6E1729FC623C8'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 16:
> 'D1F182BF5B49CB4FBF274A1B63B54C2F684EC059'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 17:
> 'D7FB0CB3AFE34BC258686368296AF0D896C5786E'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 18:
> 'D807BE55355A53CA84B4163F42081A896B323A81'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 19:
> 'EDED88E796389DEB5E8DA14F1FD56088CDA8BF98'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 20:
> 'FE4A24472BD3648F839FFAB7B5476915504A9755'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 21: 'link'
>  WARN 2014-05-21 11:05:08,252 (Worker thread '40') -   Parameter 22:
> 'B661E6DD08FD89A6643A706ECAB6E1729FC623C8'
>  WARN 2014-05-21 11:05:08,289 (Worker thread '4') -  Plan: Update on
> hopcount  (cost=157.53..165.57 rows=1 width=81)
>  WARN 2014-05-21 11:05:08,289 (Worker thread '28') -  Plan: Update on
> hopcount  (cost=157.53..165.57 rows=1 width=81)
>  WARN 2014-05-21 11:05:08,289 (Worker thread '28') -  Plan:   ->  Nested
> Loop  (cost=157.53..165.57 rows=1 width=81)
>  WARN 2014-05-21 11:05:08,289 (Worker thread '4') -  Plan:   ->  Nested
> Loop  (cost=157.53..165.57 rows=1 width=81)
>  WARN 2014-05-21 11:05:08,289 (Worker thread '28') -  Plan:         ->
>  HashAggregate  (cost=157.11..157.12 rows=1 width=20)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
> ->  Hash Join  (cost=101.51..157.11 rows=1 width=20)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:         ->
>  HashAggregate  (cost=157.11..157.12 rows=1 width=20)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>       Hash Cond: (((t0.linktype)::text = (t1.linktype)::text) AND
> ((t0.parentidhash)::text = (t1.parentidhash)::text))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
> ->  Hash Join  (cost=101.51..157.11 rows=1 width=20)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>     Hash Cond: (((t0.linktype)::text = (t1.linktype)::text) AND
> ((t0.parentidhash)::text = (t1.parentidhash)::text))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>     ->  Index Scan using i1400371486543 on hopdeletedeps t0
>  (cost=0.56..55.95 rows=27 width=109)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>       ->  Index Scan using i1400371486543 on hopdeletedeps t0
>  (cost=0.56..55.95 rows=27 width=109)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>             Index Cond: ((jobid = 1400623413113::bigint) AND
> ((childidhash)::text = 'A2EB225081B47722CCAEB3293A28EEB2F264E02C'::text))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>           Index Cond: ((jobid = 1400623413113::bigint) AND
> ((childidhash)::text = 'D942516DE5623A6417FCB994186B507E8CDA30D6'::text))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>       ->  Hash  (cost=100.32..100.32 rows=42 width=101)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>     ->  Hash  (cost=100.32..100.32 rows=42 width=101)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>             ->  Index Scan using i1400371486547 on intrinsiclink t1
>  (cost=0.56..100.32 rows=42 width=101)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
>                   Index Cond: ((jobid = 1400623413113::bigint) AND
> ((childidhash)::text = 'A2EB225081B47722CCAEB3293A28EEB2F264E02C'::text)
> AND (isnew = 'B'::bpchar))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>           ->  Index Scan using i1400371486547 on intrinsiclink t1
>  (cost=0.56..100.32 rows=42 width=101)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:         ->
>  Index Scan using hopcount_pkey on hopcount  (cost=0.42..8.45 rows=1
> width=69)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
>                 Index Cond: ((jobid = 1400623413113::bigint) AND
> ((childidhash)::text = 'D942516DE5623A6417FCB994186B507E8CDA30D6'::text)
> AND (isnew = 'B'::bpchar))
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:         ->
>  Index Scan using hopcount_pkey on hopcount  (cost=0.42..8.45 rows=1
> width=69)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -  Plan:
> Index Cond: (id = t0.ownerid)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '28') -
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -  Plan:
> Index Cond: (id = t0.ownerid)
>  WARN 2014-05-21 11:05:08,290 (Worker thread '4') -
>  WARN 2014-05-21 11:05:08,294 (Worker thread '40') -  Plan: LockRows
>  (cost=0.56..101.40 rows=3 width=47) (actual time=0.041..0.041 rows=0
> loops=1)
>
>
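
The durations in the warnings above are reported in milliseconds. A small script can pull them out of a log and convert them to minutes, which makes them easier to compare with the 40 to 60 minute stalls described in the message. This is a minimal sketch that assumes the log format shown in this thread; the function name and the regular expression are illustrative and not part of ManifoldCF.

```python
import re

# Matches the "Found a long-running query (N ms)" warning. The optional
# "> " group tolerates the email-quoting line wrap seen in this thread;
# in an unwrapped log file, \s+ simply matches the single space.
LONG_QUERY = re.compile(
    r"\(Worker thread '(?P<thread>\d+)'\) - Found a long-running\s+"
    r"(?:>\s*)?query \((?P<ms>\d+) ms\)"
)


def long_running_queries(log_text):
    """Return (worker thread id, duration in minutes) for each warning."""
    return [
        (m.group("thread"), int(m.group("ms")) / 60000.0)
        for m in LONG_QUERY.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "WARN 2014-05-21 11:05:08,230 (Worker thread '28') - "
        "Found a long-running\n"
        "> query (2662579 ms): [UPDATE hopcount SET deathmark=? ...]"
    )
    for thread, minutes in long_running_queries(sample):
        print(f"worker {thread}: {minutes:.1f} min")
```

All three warnings in the dump report durations between 2,625,296 ms and 2,675,765 ms and were logged within the same 25 ms, which suggests the three worker threads were blocked together and released at the same moment rather than each running a slow query independently.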
