manifoldcf-dev mailing list archives

From Tomoko Uchida <tomoko.uchida.1...@gmail.com>
Subject Re: SQLException "value too long for type character varying(64)" while deleting documents
Date Tue, 14 Jun 2016 04:18:32 GMT
I've run and deleted the job in question on ManifoldCF 2.4 and on the current
trunk (2.5-dev with CONNECTORS-1323.patch).
Our problem can be reproduced with 2.4 and seems to be resolved with
the trunk version.

Operation:

1. Create a job with the eight outputs below:
- ds_solr_forum_en-eu
- ds_solr_forum_en-in
- ds_solr_forum_en-sg
- ds_solr_forum_en-us
- ds_solr_forum_ko-kr_en
- ds_solr_forum_zh-cn_en
- ds_solr_forum_zh-tw_en
- ds_solr_forum_pt-br_en

2. Run the job for a while.

3. Abort the job.

4. Delete the job.

With ManifoldCF 2.4, an SQLException and stack trace (below) were logged
and the job remained in "clean up" status.

ERROR 2016-06-14 09:33:19,714 (Document delete thread '0') - Document
delete thread aborting and restarting due to database connection
reset: Database exception: SQLException doing query (22001): ERROR:
value too long for type character varying(64)
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
exception: SQLException doing query (22001): ERROR: value too long for
type character varying(64)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:715)
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:741)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:803)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:661)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performInsert(DBInterfacePostgreSQL.java:187)
at org.apache.manifoldcf.core.database.BaseTable.performInsert(BaseTable.java:68)
at org.apache.manifoldcf.crawler.repository.RepositoryHistoryManager.addRow(RepositoryHistoryManager.java:203)
at org.apache.manifoldcf.crawler.repository.RepositoryConnectionManager.recordHistory(RepositoryConnectionManager.java:706)
at org.apache.manifoldcf.crawler.system.DocumentDeleteThread$OutputRemoveActivity.recordActivity(DocumentDeleteThread.java:295)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$OutputRecordingActivity.recordActivity(IncrementalIngester.java:2383)
at org.apache.manifoldcf.agents.output.solr.HttpPoster.deletePost(HttpPoster.java:720)
at org.apache.manifoldcf.agents.output.solr.SolrConnector.removeDocument(SolrConnector.java:605)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.removeDocument(IncrementalIngester.java:2306)
at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentDeleteMultiple(IncrementalIngester.java:1042)
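
As a rough sanity check on the overflow, here is a minimal Python sketch.
It assumes the pre-patch activity type was "document deletion" followed by
one " (name)" suffix per output connection. That format is only inferred
from the sample value in the original report quoted further down; the exact
separator and the helper name activity_type are illustrative, not
ManifoldCF code.

```python
# Assumption: activity type = base string + " (name)" for each output
# connection. This mirrors the sample value in the original report; the
# real separator used by ManifoldCF may differ.
COLUMN_LIMIT = 64   # repohistory.activitytype is character varying(64)
BASE = "document deletion"

OUTPUTS = [
    "ds_solr_forum_en-eu",
    "ds_solr_forum_en-in",
    "ds_solr_forum_en-sg",
    "ds_solr_forum_en-us",
    "ds_solr_forum_ko-kr_en",
    "ds_solr_forum_zh-cn_en",
    "ds_solr_forum_zh-tw_en",
    "ds_solr_forum_pt-br_en",
]


def activity_type(base, outputs):
    """Hypothetical helper: append every output name in parentheses."""
    return base + "".join(f" ({name})" for name in outputs)


combined = activity_type(BASE, OUTPUTS)            # all eight outputs at once
single = activity_type(BASE, OUTPUTS[:1])          # one output per record
worst_single = activity_type(BASE, ["x" * 32])     # one 32-character name

print(len(combined), len(combined) > COLUMN_LIMIT)          # 205 True
print(len(single), len(single) > COLUMN_LIMIT)              # 39 False
print(len(worst_single), len(worst_single) > COLUMN_LIMIT)  # 52 False
```

Under that assumption, a record naming a single output connection fits
within 64 characters even for a 32-character name, while a record naming
all eight outputs does not.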

With 2.5-dev, there were no errors and the job was completely removed.

Thank you for the fix.

So I want to apply the same fix (CONNECTORS-1323) to ManifoldCF 2.2,
because our production system cannot be upgraded to the latest version
immediately, though we should plan to do so.
I'll try it.

Best regards,
Tomoko

2016-06-13 18:34 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
> Here is some additional information.
>
> I use ManifoldCF 2.2.
>
>> (1) Which underlying database are you using?
>
> I use PostgreSQL 9.4.5
>
>> (2) Have you modified the MCF schema in any way?
>
> No. I did not modify any MCF db schema.
>
>> (3) What are the actual names of the output connections in question?
>
> For example, one job has the 8 outputs below. There are other jobs that
> cannot be deleted for the same reason.
> - ds_solr_forum_en-eu
> - ds_solr_forum_en-in
> - ds_solr_forum_en-sg
> - ds_solr_forum_en-us
> - ds_solr_forum_ko-kr_en
> - ds_solr_forum_zh-cn_en
> - ds_solr_forum_zh-tw_en
> - ds_solr_forum_pt-br_en
>
> Because of business requirements, I crawl a web site and post to multiple
> (eight) Solr cores.
>
> The whole job definition is below (I deleted the seeds/includes/excludes URLs
> from the original JSON data):
>
> {
>     "job": {
>         "description": "ds_forum_en",
>         "document_specification": {
>             "excludes": "…",
>             "excludescontentindex": "",
>             "excludesindex": "",
>             "includes": "…",
>             "includesindex": ".*",
>             "limittoseeds": {
>                 "_attribute_value": "true",
>                 "_value_": ""
>             },
>             "seeds": "…"
>         },
>         "expiration_interval": "infinite",
>         "hopcount_mode": "accurate",
>         "id": "1464673266530",
>         "pipelinestage": [
>             {
>                 "stage_connectionname": "ds_solr_forum_en-eu",
>                 "stage_id": "0",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_en-in",
>                 "stage_id": "1",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_en-sg",
>                 "stage_id": "2",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_en-us",
>                 "stage_id": "3",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_ko-kr_en",
>                 "stage_id": "4",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_zh-cn_en",
>                 "stage_id": "5",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_zh-tw_en",
>                 "stage_id": "6",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             },
>             {
>                 "stage_connectionname": "ds_solr_forum_pt-br_en",
>                 "stage_id": "7",
>                 "stage_isoutput": "true",
>                 "stage_specification": {}
>             }
>         ],
>         "priority": "5",
>         "recrawl_interval": "86400000",
>         "repository_connection": "ds_forum_en",
>         "reseed_interval": "3600000",
>         "run_mode": "continuous",
>         "start_mode": "manual"
>     }
> }
>
> Thank you,
> Tomoko
>
> 2016-06-13 18:09 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
>> Hi Karl,
>>
>> Thank you for the rapid response! I'll try the patch soon.
>>
>> Regards,
>> Tomoko
>>
>> 2016-06-13 16:20 GMT+09:00 Karl Wright <daddywri@gmail.com>:
>>> Ok, some further exploration yields the following:
>>> (1) A check was put into the code a while ago to prevent overly long
>>> activity names from blowing things up.  That is why we no longer see this
>>> problem.
>>> (2) There was a problem with activity logging for deletions across multiple
>>> output connections.  See CONNECTORS-1323.  I've provided a patch.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Tomoko,
>>>>
>>>> Sorry, I missed this post when it was originally made.
>>>>
>>>> The activitytype column is provided by the framework for only a small
>>>> number of specific events.  In no case does the activitytype contain
>>>> anything other than a fixed-length string; it's meant to be queried on.
>>>> That string may include the name of a single output connection or of a
>>>> transformation connection, but only one.  The maximum length of an output
>>>> or transformation connection name is 32, so the total length available for
>>>> the rest of the activitytype column is 30.
>>>>
>>>> The string "document deletion" is 17 characters, so that's nowhere near
>>>> the limit here. So this makes no sense.
>>>>
>>>> Can you be more specific about the following:
>>>>
>>>> (1) Which underlying database are you using?
>>>> (2) Have you modified the MCF schema in any way?
>>>> (3) What are the actual names of the output connections in question?
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <
>>>> tomoko.uchida.1111@gmail.com> wrote:
>>>>
>>>>> Hi, any suggestions?
>>>>>
>>>>> Is this a known limitation, or
>>>>> should I create a ticket about that?
>>>>>
>>>>> Thanks,
>>>>> Tomoko
>>>>>
>>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
>>>>> > Hello developers,
>>>>> >
>>>>> > I have sent the same message to the user mailing list but there has
>>>>> > been no reply. Could anyone help me?
>>>>> > Some jobs in our customer's production environment can no longer be
>>>>> > deleted because of this problem.
>>>>> >
>>>>> > We are looking for solutions to delete the jobs safely.
>>>>> > If my question was not clear, I am ready to provide a more detailed
>>>>> > explanation.
>>>>> >
>>>>> > ----
>>>>> >
>>>>> > Hello,
>>>>> > I encountered an SQLException when I deleted a job with many output
>>>>> > connections.
>>>>> >
>>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
>>>>> > delete thread aborting and restarting due to database connection
>>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
>>>>> > value too long for type character varying(64)
>>>>> >
>>>>> >
>>>>> > I've found that the error occurred because ManifoldCF was trying to
>>>>> > insert a long string (more than 64 characters) into the 'activitytype'
>>>>> > column of the 'repohistory' table while deleting documents associated
>>>>> > with the job.
>>>>> >
>>>>> > As a trial, I altered the 'activitytype' column type to 'text' with this
>>>>> > statement:
>>>>> >
>>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
>>>>> >
>>>>> > After altering the table I restarted ManifoldCF; the deletion
>>>>> > history records were then successfully added and the job seemed to be
>>>>> > safely deleted.
>>>>> >
>>>>> > Inserted 'activitytype' values are like this:
>>>>> > document deletion (outputA)  (outputB)  (outputC) (outputD) (outputE) ...
>>>>> >
>>>>> > Because of application requirements, I cannot limit the number of output
>>>>> > connections (to shorten the history records).
>>>>> >
>>>>> > Is that OK? Or is there a better solution?
>>>>> >
>>>>> > Thank you in advance,
>>>>> > Tomoko
>>>>>
>>>>
>>>>
