manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Timeout problems with web crawling
Date Thu, 25 Apr 2013 12:56:30 GMT

After I did an svn up and started a new crawl, I'm still getting a lot 
of these. I will analyze futher by running EXPLAIN ANALYZE. The job 
stops for some minutes and then continues.

WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Found a long-running 
query (62339 ms): [SELECT 
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset 
FROM jobqueue t0  WHERE t0.status IN (?,?) AND t0.checkaction=? AND 
t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN 
(?,?) AND t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x' 
FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN 
(?,?,?,?,?,?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM 
prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name) 
ORDER BY t0.docpriority ASC LIMIT 100]
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 0: 'P'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 1: 'G'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 2: 'R'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 3: 
'1366892554745'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 4: 'A'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 5: 'a'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 6: '5'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 7: 'A'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 8: 'F'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 9: 'a'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 10: 'f'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 11: 'D'
  WARN 2013-04-25 14:23:37,175 (Stuffer thread) -   Parameter 12: 'd'
  WARN 2013-04-25 14:32:33,766 (Stuffer thread) -  Plan: Limit 
(cost=9603.72..9603.72 rows=1 width=155) (actual 
time=536587.381..536587.586 rows=100 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan:   ->  Sort 
(cost=9603.72..9603.72 rows=1 width=155) (actual 
time=536587.379..536587.449 rows=100 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan:         Sort 
Key: t0.docpriority
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan:         Sort 
Method: top-N heapsort  Memory: 51kB
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan:         -> 
Nested Loop Anti Join  (cost=17.68..9603.71 rows=1 width=155) (actual 
time=1.565..536578.061 rows=1592 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
Join Filter: (t2.jobid <> t0.jobid)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
->  Nested Loop Semi Join  (cost=17.68..9595.09 rows=1 width=155) 
(actual time=1.551..536533.696 rows=1592 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
      Join Filter: (t0.jobid = t1.id)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
      ->  Nested Loop Anti Join  (cost=17.68..9593.04 rows=1 width=155) 
(actual time=1.541..536504.762 rows=1592 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
            Join Filter: (t0.id = t3.owner)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
            ->  Bitmap Heap Scan on jobqueue t0  (cost=4.53..6.54 rows=1 
width=155) (actual time=1.516..8.783 rows=1592 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  Recheck Cond: (status = ANY ('{P,G}'::bpchar[]))
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  Filter: ((checktime <= 1366892554745::bigint) AND 
(checkaction = 'R'::bpchar))
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  ->  Bitmap Index Scan on i1362584563122 
(cost=0.00..4.53 rows=1 width=0) (actual time=1.375..1.375 rows=8931 
loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                        Index Cond: (status = ANY ('{P,G}'::bpchar[]))
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
            ->  Hash Join  (cost=13.15..7637.41 rows=231632 width=8) 
(actual time=336.989..336.989 rows=0 loops=1592)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  Hash Cond: ((t3.eventname)::text = (t4.name)::text)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  ->  Seq Scan on prereqevents t3  (cost=0.00..4439.32 
rows=231632 width=37) (actual time=0.006..160.286 rows=231491 loops=1592)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                  ->  Hash  (cost=11.40..11.40 rows=140 width=516) 
(actual time=0.005..0.005 rows=0 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                        Buckets: 1024  Batches: 1  Memory Usage: 0kB
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
                        ->  Seq Scan on events t4  (cost=0.00..11.40 
rows=140 width=516) (actual time=0.003..0.003 rows=0 loops=1)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
      ->  Seq Scan on jobs t1  (cost=0.00..2.03 rows=2 width=8) (actual 
time=0.008..0.008 rows=1 loops=1592)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
            Filter: ((status = ANY ('{A,a}'::bpchar[])) AND (priority = 
5::bigint))
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
->  Index Scan using i1362584563120 on jobqueue t2  (cost=0.00..4.30 
rows=1 width=49) (actual time=0.020..0.020 rows=0 loops=1592)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
      Index Cond: ((dochash)::text = (t0.dochash)::text)
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: 
      Filter: (status = ANY ('{A,F,a,f,D,d}'::bpchar[]))
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -  Plan: Total runtime: 
536587.771 ms
  WARN 2013-04-25 14:32:33,767 (Stuffer thread) -


On 24.04.13 11.29, Karl Wright wrote:
> I updated the ticket as well; all the EXPLAIN queries are very fast at
> the moment.  The reason is that the checktime value you used needs to be
> the current time or larger, which is why I picked
>
> 1466628406182
>
> Karl
>
>
>
> On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen <e.f.garasen@usit.uio.no
> <mailto:e.f.garasen@usit.uio.no>> wrote:
>
>
>     Thanks Karl!
>
>     I have updated the ticket with the information you requested, but
>     there is an email delay at the moment from Jira (or this list).
>
>     Erlend
>
>
>     On 24.04.13 01.37 <tel:24.04.13%2001.37>, Karl Wright wrote:
>
>         Hi Erlend,
>
>         I had a conversation with the PostgreSQL people.  Have a look at
>         ticket
>         CONNECTORS-678; you'll want to get an EXPLAIN ANALYZE of the query I
>         provided before on the live system - but update the checktime
>         value so
>         it catches more than zero records:
>
>         EXPLAIN ANALYZE SELECT t0.id <http://t0.id>
>         <http://t0.id>,t0.jobid,t0.__dochash,t0.
>
>         docid,t0.status,t0.failtime,__t0.failcount,t0.priorityset FROM
>         jobqueue
>         t0  WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
>         t0.checktime<=1566628406182 AND
>               EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
>         ('A','a') AND
>         t1.id <http://t1.id> <http://t1.id>=t0.jobid AND t1.priority=5) AND
>
>         NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>         t2.dochash=t0.dochash AND
>         t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND
>               NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
>         t0.id <http://t0.id>
>         <http://t0.id>=t3.owner AND t3.eventname=t4.name
>         <http://t4.name> <http://t4.name>) ORDER
>
>         BY t0.docpriority ASC LIMIT 100;
>
>         I'd also like to see what the contents of your prereqevents
>         table looks
>         like on that system.  If you can do SELECT * from prereqevents;
>         , and
>         give me the first ten or so that  pop up, I can take it from there.
>
>         Karl
>
>
>
>         On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright
>         <daddywri@gmail.com <mailto:daddywri@gmail.com>
>         <mailto:daddywri@gmail.com <mailto:daddywri@gmail.com>>> wrote:
>
>              Hi Erlend,
>
>              If you have access to psql and can connect to the production
>              database, I'd like to explore whether the relatively recent
>         change I
>              made for MySQL is in fact breaking PostgreSQL stuffer
>         thread queries.
>
>              First, read this page:
>         http://www.postgresql.org/__docs/current/static/indexes-__ordering.html
>         <http://www.postgresql.org/docs/current/static/indexes-ordering.html>
>
>              What we're trying to achieve is that the planner use the
>         index whose
>              first column is docpriority.  The way we will need to
>         determine that
>              is to open up psql, and "explain" the query in question,
>         and then
>              modify the query to see if it begins to plan correctly.
>
>              In psql, you will need to assess the query in question.  To
>         do that,
>              execute the following:
>
>              EXPLAIN SELECT t0.id <http://t0.id>
>
>         <http://t0.id>,t0.jobid,t0.__dochash,t0.docid,t0.status,t0.__failtime,t0.failcount,t0.__priorityset
>
>              FROM jobqueue t0  WHERE t0.status IN ('P','G') AND
>              t0.checkaction='R' AND t0.checktime<=1366628406182 AND
>                   EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
>         ('A','a') AND
>         t1.id <http://t1.id> <http://t1.id>=t0.jobid AND t1.priority=5) AND
>
>                   NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>              t2.dochash=t0.dochash AND t2.status IN
>         ('A','F','a','f','D','d') AND
>              t2.jobid!=t0.jobid) AND
>                   NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
>         t0.id <http://t0.id> <http://t0.id>=t3.owner AND
>         t3.eventname=t4.name <http://t4.name>
>              <http://t4.name>) ORDER BY t0.docpriority ASC LIMIT 100;
>
>
>
>              The production job MUST be running or it is a meaningless
>         exercise.
>
>              What you want to see is that the outermost action is NOT a
>         sort;
>              instead, you want an
>
>              Index Scan using i1362584563121
>
>
>              If you get something else, try changing the ORDER BY clause
>         to includeNULLS LAST orNULLS FIRST to see if that
>
>
>              changes anything.  If not, I may ask you to execute a
>         sample query or two and tell me what it returns, so I can assess
>
>                whether the data is messed up in some way.
>
>
>
>              Karl
>
>
>
>
>
>
>              On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen
>              <e.f.garasen@usit.uio.no <mailto:e.f.garasen@usit.uio.no>
>         <mailto:e.f.garasen@usit.uio.__no
>         <mailto:e.f.garasen@usit.uio.no>>> wrote:
>
>                  On 23.04.13 15.30, Karl Wright wrote:
>
>                      What version of Postgresql is this?  Given the same
>                      Postgresql version,
>
>
>                  9.1.9
>
>                  "allow_system_table_mods";"____off"
>
>                  "application_name";"pgAdmin III - Query Tool"
>
>         "archive_command";"/local/opt/____pgsql-9.1/bin/pgclarchive___wal.__sh
>
>                  -P %p -F %f -S dbpg-search"
>                  "archive_mode";"on"
>                  "archive_timeout";"0"
>                  "array_nulls";"on"
>                  "authentication_timeout";"____1min"
>                  "autovacuum";"on"
>                  "autovacuum_analyze_scale_____factor";"0.1"
>                  "autovacuum_analyze_threshold"____;"50"
>                  "autovacuum_freeze_max_age";"____200000000"
>                  "autovacuum_max_workers";"3"
>                  "autovacuum_naptime";"1min"
>                  "autovacuum_vacuum_cost_delay"____;"20ms"
>                  "autovacuum_vacuum_cost_limit"____;"-1"
>                  "autovacuum_vacuum_scale_____factor";"0.2"
>                  "autovacuum_vacuum_threshold";____"50"
>                  "backslash_quote";"safe_____encoding"
>
>                  "bgwriter_delay";"200ms"
>                  "bgwriter_lru_maxpages";"100"
>                  "bgwriter_lru_multiplier";"2"
>                  "block_size";"8192"
>                  "bonjour";"off"
>                  "bonjour_name";""
>                  "bytea_output";"escape"
>                  "check_function_bodies";"on"
>                  "checkpoint_completion_target"____;"0.5"
>
>                  "checkpoint_segments";"128"
>                  "checkpoint_timeout";"5min"
>                  "checkpoint_warning";"30s"
>                  "client_encoding";"UNICODE"
>                  "client_min_messages";"notice"
>                  "commit_delay";"0"
>                  "commit_siblings";"5"
>                  "constraint_exclusion";"____partition"
>
>                  "cpu_index_tuple_cost";"0.005"
>                  "cpu_operator_cost";"0.0025"
>                  "cpu_tuple_cost";"0.01"
>                  "cursor_tuple_fraction";"0.1"
>                  "custom_variable_classes";""
>                  "DateStyle";"ISO, MDY"
>                  "db_user_namespace";"off"
>                  "deadlock_timeout";"1s"
>                  "debug_assertions";"off"
>                  "debug_pretty_print";"on"
>                  "debug_print_parse";"off"
>                  "debug_print_plan";"off"
>                  "debug_print_rewritten";"off"
>                  "default_statistics_target";"____100"
>                  "default_tablespace";""
>                  "default_text_search_config";"____pg_catalog.english"
>                  "default_transaction_____deferrable";"off"
>                  "default_transaction_____isolation";"read committed"
>                  "default_transaction_read_____only";"off"
>                  "default_with_oids";"off"
>                  "effective_cache_size";"____16093MB"
>
>                  "effective_io_concurrency";"1"
>                  "enable_bitmapscan";"on"
>                  "enable_hashagg";"on"
>                  "enable_hashjoin";"on"
>                  "enable_indexscan";"on"
>                  "enable_material";"on"
>                  "enable_mergejoin";"on"
>                  "enable_nestloop";"on"
>                  "enable_seqscan";"on"
>                  "enable_sort";"on"
>                  "enable_tidscan";"on"
>                  "escape_string_warning";"on"
>                  "exit_on_error";"off"
>                  "extra_float_digits";"0"
>                  "from_collapse_limit";"8"
>                  "fsync";"on"
>                  "full_page_writes";"on"
>                  "geqo";"on"
>                  "geqo_effort";"5"
>                  "geqo_generations";"0"
>                  "geqo_pool_size";"0"
>                  "geqo_seed";"0"
>                  "geqo_selection_bias";"2"
>                  "geqo_threshold";"12"
>                  "gin_fuzzy_search_limit";"0"
>                  "hot_standby";"off"
>                  "hot_standby_feedback";"off"
>                  "ignore_system_indexes";"off"
>                  "integer_datetimes";"on"
>                  "IntervalStyle";"postgres"
>                  "join_collapse_limit";"8"
>                  "krb_caseins_users";"off"
>                  "krb_srvname";"postgres"
>                  "lc_collate";"C"
>                  "lc_ctype";"C"
>                  "lc_messages";"C"
>                  "lc_monetary";"C"
>                  "lc_numeric";"C"
>                  "lc_time";"C"
>                  "listen_addresses";"dbpg-____search.uio.no
>         <http://dbpg-__search.uio.no>
>                  <http://dbpg-search.uio.no>"
>                  "lo_compat_privileges";"off"
>                  "local_preload_libraries";""
>                  "log_autovacuum_min_duration";____"-1"
>
>                  "log_checkpoints";"off"
>                  "log_connections";"off"
>                  "log_destination";"stderr"
>                  "log_disconnections";"off"
>                  "log_duration";"off"
>                  "log_error_verbosity";"____default"
>
>                  "log_executor_stats";"off"
>                  "log_file_mode";"0600"
>                  "log_hostname";"off"
>                  "log_line_prefix";"[%m] [dbpg-search] [%d:%u] [%r] [%p]
>         [%x] "
>                  "log_lock_waits";"off"
>                  "log_min_duration_statement";"____-1"
>                  "log_min_error_statement";"____info"
>
>                  "log_min_messages";"notice"
>                  "log_parser_stats";"off"
>                  "log_planner_stats";"off"
>                  "log_rotation_age";"0"
>                  "log_rotation_size";"0"
>                  "log_statement";"none"
>                  "log_statement_stats";"off"
>                  "log_temp_files";"-1"
>                  "log_timezone";"Europe/Oslo"
>                  "log_truncate_on_rotation";"____on"
>
>                  "logging_collector";"on"
>                  "maintenance_work_mem";"128MB"
>                  "max_connections";"600"
>                  "max_files_per_process";"1000"
>                  "max_function_args";"100"
>                  "max_identifier_length";"63"
>                  "max_index_keys";"32"
>                  "max_locks_per_transaction";"____64"
>                  "max_pred_locks_per_____transaction";"64"
>                  "max_prepared_transactions";"____0"
>                  "max_stack_depth";"4MB"
>                  "max_standby_archive_delay";"____30s"
>                  "max_standby_streaming_delay";____"30s"
>
>                  "max_wal_senders";"0"
>                  "password_encryption";"on"
>                  "port";"5432"
>                  "post_auth_delay";"0"
>                  "pre_auth_delay";"0"
>                  "quote_all_identifiers";"off"
>                  "random_page_cost";"2"
>                  "replication_timeout";"1min"
>                  "restart_after_crash";"on"
>                  "search_path";""$user",public"
>                  "segment_size";"1GB"
>                  "seq_page_cost";"1"
>                  "server_encoding";"UTF8"
>                  "server_version";"9.1.9"
>                  "server_version_num";"90109"
>                  "session_replication_role";"____origin"
>
>                  "shared_buffers";"384MB"
>                  "silent_mode";"off"
>                  "sql_inheritance";"on"
>                  "ssl";"on"
>                  "ssl_renegotiation_limit";"____512MB"
>                  "standard_conforming_strings";____"off"
>                  "statement_timeout";"0"
>                  "superuser_reserved_____connections";"3"
>
>                  "synchronize_seqscans";"on"
>                  "synchronous_commit";"on"
>                  "synchronous_standby_names";""
>                  "syslog_facility";"local0"
>                  "syslog_ident";"postgres"
>                  "tcp_keepalives_count";"9"
>                  "tcp_keepalives_idle";"7200"
>                  "tcp_keepalives_interval";"75"
>                  "temp_buffers";"8MB"
>                  "temp_tablespaces";""
>                  "TimeZone";"Europe/Oslo"
>                  "timezone_abbreviations";"____Default"
>                  "trace_notify";"off"
>                  "trace_recovery_messages";"____log"
>                  "trace_sort";"off"
>                  "track_activities";"on"
>                  "track_activity_query_size";"____1024"
>
>                  "track_counts";"on"
>                  "track_functions";"none"
>                  "transaction_deferrable";"off"
>                  "transaction_isolation";"read committed"
>                  "transaction_read_only";"off"
>                  "transform_null_equals";"off"
>                  "unix_socket_group";""
>                  "unix_socket_permissions";"____0700"
>
>                  "update_process_title";"on"
>                  "vacuum_cost_delay";"0"
>                  "vacuum_cost_limit";"200"
>                  "vacuum_cost_page_dirty";"20"
>                  "vacuum_cost_page_hit";"1"
>                  "vacuum_cost_page_miss";"10"
>                  "vacuum_defer_cleanup_age";"0"
>                  "vacuum_freeze_min_age";"____50000000"
>                  "vacuum_freeze_table_age";"____150000000"
>
>                  "wal_block_size";"8192"
>                  "wal_buffers";"12MB"
>                  "wal_keep_segments";"0"
>                  "wal_level";"archive"
>                  "wal_receiver_status_interval"____;"10s"
>
>                  "wal_segment_size";"16MB"
>                  "wal_sender_delay";"1s"
>                  "wal_sync_method";"fdatasync"
>                  "wal_writer_delay";"200ms"
>                  "work_mem";"16MB"
>                  "xmlbinary";"base64"
>                  "xmloption";"content"
>                  "zero_damaged_pages";"off"
>
>
>
>                  --
>                  Erlend Garåsen
>                  Center for Information Technology Services
>                  University of Oslo
>                  P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>                  Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>
>         <tel:%28%2B47%29%2022840193>, Fax: (+47)
>                  22852970 <tel:%28%2B47%29%2022852970>, Mobile: (+47)
>         91380968 <tel:%28%2B47%29%2091380968>
>                  <tel:%28%2B47%29%2091380968>, VIP: 31050
>
>
>
>
>
>     --
>     Erlend Garåsen
>     Center for Information Technology Services
>     University of Oslo
>     P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>     Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax: (+47) 22852970
>     <tel:%28%2B47%29%2022852970>, Mobile: (+47) 91380968
>     <tel:%28%2B47%29%2091380968>, VIP: 31050
>
>


-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

Mime
View raw message