manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: SQLException "value too long for type character varying(64)" while deleting documents
Date Tue, 14 Jun 2016 12:09:27 GMT
I have no reason to believe that this patch won't work for MCF 2.2 as
well.  Please let me know of any problems and I will issue a revised patch
against that branch.

Thanks,
Karl


On Tue, Jun 14, 2016 at 12:18 AM, Tomoko Uchida <
tomoko.uchida.1111@gmail.com> wrote:

> I've run and deleted the job in question on ManifoldCF 2.4 and on current
> trunk (2.5-dev with CONNECTORS-1323.patch).
> Our problem can be reproduced with 2.4 and seems to be resolved with the
> trunk version.
>
> Operation:
>
> 1. Create a job with eight outputs below:
> - ds_solr_forum_en-eu
> - ds_solr_forum_en-in
> - ds_solr_forum_en-sg
> - ds_solr_forum_en-us
> - ds_solr_forum_ko-kr_en
> - ds_solr_forum_zh-cn_en
> - ds_solr_forum_zh-tw_en
> - ds_solr_forum_pt-br_en
>
> 2. Run the job for a while.
>
> 3. Abort the job.
>
> 4. Delete the job.
>
> With ManifoldCF 2.4, an SQLException and stack trace (below) were logged
> and the job remained in "clean up" status.
>
> ERROR 2016-06-14 09:33:19,714 (Document delete thread '0') - Document
> delete thread aborting and restarting due to database connection
> reset: Database exception: SQLException doing query (22001): ERROR:
> value too long for type character varying(64)
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
> exception: SQLException doing query (22001): ERROR: value too long for
> type character varying(64)
> at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715)
> at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741)
> at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:803)
> at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
> at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
> at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:661)
> at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
> at org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
> at org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:203)
> at org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
> at org.apache.manifoldcf.crawler.system.DocumentDeleteThread$OutputRemoveActivity.recordActivity(DocumentDeleteThread.java:295)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
> at org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:720)
> at org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:605)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2306)
> at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1042)
>
> With 2.5-dev, there were no errors and the job was completely removed.
>
> Thank you for the fix.
>
> So I want to apply the same fix (CONNECTORS-1323) to ManifoldCF 2.2
> because our production system cannot be upgraded to the latest version
> immediately, though we should plan to do so.
> I'll try it.
>
> Best regards,
> Tomoko
>
> 2016-06-13 18:34 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
> > Here is some additional information.
> >
> > I use ManifoldCF 2.2.
> >
> >> (1) Which underlying database are you using?
> >
> > I use PostgreSQL 9.4.5
> >
> >> (2) Have you modified the MCF schema in any way?
> >
> > No. I did not modify any MCF db schema.
> >
> >> (3) What are the actual names of the output connections in question?
> >
> > For example, one job has the 8 outputs below. There are other jobs that
> > cannot be deleted for the same reason.
> > - ds_solr_forum_en-eu
> > - ds_solr_forum_en-in
> > - ds_solr_forum_en-sg
> > - ds_solr_forum_en-us
> > - ds_solr_forum_ko-kr_en
> > - ds_solr_forum_zh-cn_en
> > - ds_solr_forum_zh-tw_en
> > - ds_solr_forum_pt-br_en
> >
> > For business requirements, I crawl a web site and post to multiple
> > (eight) Solr cores.
> >
> > The whole job definition is below (I deleted the seeds/includes/excludes
> > URLs from the original JSON data):
> >
> > {
> >     "job": {
> >         "description": "ds_forum_en",
> >         "document_specification": {
> >             "excludes": "…",
> >             "excludescontentindex": "",
> >             "excludesindex": "",
> >             "includes": "…",
> >             "includesindex": ".*",
> >             "limittoseeds": {
> >                 "_attribute_value": "true",
> >                 "_value_": ""
> >             },
> >             "seeds": "…"
> >         },
> >         "expiration_interval": "infinite",
> >         "hopcount_mode": "accurate",
> >         "id": "1464673266530",
> >         "pipelinestage": [
> >             {
> >                 "stage_connectionname": "ds_solr_forum_en-eu",
> >                 "stage_id": "0",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_en-in",
> >                 "stage_id": "1",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_en-sg",
> >                 "stage_id": "2",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_en-us",
> >                 "stage_id": "3",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_ko-kr_en",
> >                 "stage_id": "4",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_zh-cn_en",
> >                 "stage_id": "5",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_zh-tw_en",
> >                 "stage_id": "6",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             },
> >             {
> >                 "stage_connectionname": "ds_solr_forum_pt-br_en",
> >                 "stage_id": "7",
> >                 "stage_isoutput": "true",
> >                 "stage_specification": {}
> >             }
> >         ],
> >         "priority": "5",
> >         "recrawl_interval": "86400000",
> >         "repository_connection": "ds_forum_en",
> >         "reseed_interval": "3600000",
> >         "run_mode": "continuous",
> >         "start_mode": "manual"
> >     }
> > }
> >
> > Thank you,
> > Tomoko
> >
> > 2016-06-13 18:09 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
> >> Hi Karl,
> >>
> >> Thank you for the rapid response! I'll try the patch soon.
> >>
> >> Regards,
> >> Tomoko
> >>
> >> 2016-06-13 16:20 GMT+09:00 Karl Wright <daddywri@gmail.com>:
> >>> Ok, some further exploration yields the following:
> >>> (1) A check was put into the code a while ago to prevent overly long
> >>> activity names from blowing things up.  That is why we no longer see this
> >>> problem.
> >>> (2) There was a problem with activity logging for deletions across multiple
> >>> output connections.  See CONNECTORS-1323.  I've provided a patch.
> >>>
> >>> Karl
> >>>
> >>>
> >>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <daddywri@gmail.com> wrote:
> >>>
> >>>> Hi Tomoko,
> >>>>
> >>>> Sorry, I missed this post when it was originally made.
> >>>>
> >>>> The activitytype column is provided by the framework for only a small
> >>>> number of specific events.  In no case does the activitytype contain
> >>>> anything other than a fixed-length string; it's meant to be queried on.
> >>>> That string may include the name of a single output connection or of a
> >>>> transformation connection, but only one.  The maximum length of an output
> >>>> or transformation connection name is 32, so the total length available for
> >>>> the rest of the activitytype column is 30.
> >>>>
> >>>> The string "document deletion" is 17 characters, so that's nowhere near
> >>>> the limit here. So this makes no sense.
> >>>>
> >>>> Can you be more specific about the following:
> >>>>
> >>>> (1) Which underlying database are you using?
> >>>> (2) Have you modified the MCF schema in any way?
> >>>> (3) What are the actual names of the output connections in question?
> >>>>
> >>>> Thanks,
> >>>> Karl
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <
> >>>> tomoko.uchida.1111@gmail.com> wrote:
> >>>>
> >>>>> Hi, any suggestions?
> >>>>>
> >>>>> Is this a known limitation, or
> >>>>> should I create a ticket about that?
> >>>>>
> >>>>> Thanks,
> >>>>> Tomoko
> >>>>>
> >>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
> >>>>> > Hello developers,
> >>>>> >
> >>>>> > I sent the same message to the user mailing list but there has been no
> >>>>> > reply. Could anyone help me?
> >>>>> > Some jobs in our customer's production environment can no longer be
> >>>>> > deleted because of this problem.
> >>>>> >
> >>>>> > We are looking for solutions to delete the jobs safely.
> >>>>> > If my question was not clear, I am ready to provide a more detailed
> >>>>> > explanation.
> >>>>> >
> >>>>> > ----
> >>>>> >
> >>>>> > Hello,
> >>>>> > I encountered an SQLException when I deleted a job with many output
> >>>>> > connections.
> >>>>> >
> >>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
> >>>>> > delete thread aborting and restarting due to database connection
> >>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
> >>>>> > value too long for type character varying(64)
> >>>>> >
> >>>>> >
> >>>>> > I've found that the error occurred because ManifoldCF tried to
> >>>>> > insert a long string (more than 64 characters) into the 'activitytype'
> >>>>> > column of the 'repohistory' table while deleting documents associated
> >>>>> > with the job.
> >>>>> >
> >>>>> > As a trial, I altered the 'activitytype' column type to 'text' with
> >>>>> > this statement.
> >>>>> >
> >>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
> >>>>> >
> >>>>> > After altering the table I restarted ManifoldCF; the deletion
> >>>>> > history records were then successfully added and the job seemed to
> >>>>> > be safely deleted.
> >>>>> >
> >>>>> > Inserted 'activitytype' values look like this:
> >>>>> > document deletion (outputA)  (outputB)  (outputC) (outputD) (outputE) ...
> >>>>> >
> >>>>> > Because of application requirements, I cannot limit the number of
> >>>>> > output connectors (to shorten history records).
> >>>>> >
> >>>>> > Is that OK? Or are there better solutions for that?
> >>>>> >
> >>>>> > Thank you in advance,
> >>>>> > Tomoko
> >>>>>
> >>>>
> >>>>
>
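
For reference, the overflow discussed in this thread can be checked with simple string arithmetic. The sketch below is not ManifoldCF code: the class and method names are hypothetical, and the " (name)" suffix format with a single space is an assumption modelled on the example 'activitytype' value quoted in the original report. It only illustrates why one output connection name fits in a character varying(64) column while eight accumulated names do not.

```java
import java.util.Arrays;
import java.util.List;

public class ActivityTypeLength {
    // Column width taken from the error message: character varying(64).
    static final int COLUMN_WIDTH = 64;

    // Assumed reconstruction of the reported behaviour: each output
    // connection name is appended in parentheses to the base activity name.
    static String accumulated(String baseActivity, List<String> outputNames) {
        StringBuilder sb = new StringBuilder(baseActivity);
        for (String name : outputNames) {
            sb.append(" (").append(name).append(")");
        }
        return sb.toString();
    }

    // True if the value could be inserted without the 22001 error.
    static boolean fits(String activityType) {
        return activityType.length() <= COLUMN_WIDTH;
    }

    public static void main(String[] args) {
        // Output connection names as listed in the job definition.
        List<String> outputs = Arrays.asList(
            "ds_solr_forum_en-eu", "ds_solr_forum_en-in",
            "ds_solr_forum_en-sg", "ds_solr_forum_en-us",
            "ds_solr_forum_ko-kr_en", "ds_solr_forum_zh-cn_en",
            "ds_solr_forum_zh-tw_en", "ds_solr_forum_pt-br_en");

        String single = accumulated("document deletion", outputs.subList(0, 1));
        String all = accumulated("document deletion", outputs);

        System.out.println(single.length() + " chars, fits: " + fits(single));
        System.out.println(all.length() + " chars, fits: " + fits(all));
    }
}
```

With a single connection name the value stays well inside the column, which matches the expectation stated in the thread; the failure only appears when the names of several output connections end up in one activity string.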
