manifoldcf-dev mailing list archives

From Tomoko Uchida <tomoko.uchida.1...@gmail.com>
Subject Re: SQLException "value too long for type character varying(64)" while deleting documents
Date Mon, 13 Jun 2016 09:34:09 GMT
Here is some additional information.

I use ManifoldCF 2.2.

> (1) Which underlying database are you using?

I use PostgreSQL 9.4.5

> (2) Have you modified the MCF schema in any way?

No. I did not modify any MCF db schema.

> (3) What are the actual names of the output connections in question?

For example, one job has the 8 outputs below. There are other jobs that
cannot be deleted for the same reason.
- ds_solr_forum_en-eu
- ds_solr_forum_en-in
- ds_solr_forum_en-sg
- ds_solr_forum_en-us
- ds_solr_forum_ko-kr_en
- ds_solr_forum_zh-cn_en
- ds_solr_forum_zh-tw_en
- ds_solr_forum_pt-br_en

Due to business requirements, I crawl one web site and post the documents
to multiple (eight) Solr cores.
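
As a rough illustration of why these names overflow the column, the sketch below rebuilds an activity type string and compares its length with the 64-character limit from the error message. It is not ManifoldCF code: the "document deletion (name) (name) ..." layout is only inferred from the values quoted in my original report below, and build_activity_type is a hypothetical helper.

```python
# Output connection names from the list above.
OUTPUT_NAMES = [
    "ds_solr_forum_en-eu",
    "ds_solr_forum_en-in",
    "ds_solr_forum_en-sg",
    "ds_solr_forum_en-us",
    "ds_solr_forum_ko-kr_en",
    "ds_solr_forum_zh-cn_en",
    "ds_solr_forum_zh-tw_en",
    "ds_solr_forum_pt-br_en",
]

ACTIVITY_PREFIX = "document deletion"
COLUMN_LIMIT = 64  # from the error: character varying(64)


def build_activity_type(prefix, names):
    """Hypothetical helper: append ' (name)' once per output connection.

    The exact separator used by ManifoldCF is an assumption here.
    """
    return prefix + "".join(f" ({name})" for name in names)


# One output connection, as the column was designed for.
single = build_activity_type(ACTIVITY_PREFIX, OUTPUT_NAMES[:1])
# All eight output connections concatenated, as seen in the failing job.
combined = build_activity_type(ACTIVITY_PREFIX, OUTPUT_NAMES)

for label, value in (("single", single), ("combined", combined)):
    fits = len(value) <= COLUMN_LIMIT
    print(f"{label}: {len(value)} characters, fits in varchar(64): {fits}")
```

With a single output name the string stays within the limit; with all eight names concatenated it does not, which matches the error I see.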

The whole job definition is below (I removed the seeds/includes/excludes
URLs from the original JSON data):

{
    "job": {
        "description": "ds_forum_en",
        "document_specification": {
            "excludes": "…",
            "excludescontentindex": "",
            "excludesindex": "",
            "includes": "…",
            "includesindex": ".*",
            "limittoseeds": {
                "_attribute_value": "true",
                "_value_": ""
            },
            "seeds": "…"
        },
        "expiration_interval": "infinite",
        "hopcount_mode": "accurate",
        "id": "1464673266530",
        "pipelinestage": [
            {
                "stage_connectionname": "ds_solr_forum_en-eu",
                "stage_id": "0",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_en-in",
                "stage_id": "1",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_en-sg",
                "stage_id": "2",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_en-us",
                "stage_id": "3",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_ko-kr_en",
                "stage_id": "4",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_zh-cn_en",
                "stage_id": "5",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_zh-tw_en",
                "stage_id": "6",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_pt-br_en",
                "stage_id": "7",
                "stage_isoutput": "true",
                "stage_specification": {}
            }
        ],
        "priority": "5",
        "recrawl_interval": "86400000",
        "repository_connection": "ds_forum_en",
        "reseed_interval": "3600000",
        "run_mode": "continuous",
        "start_mode": "manual"
    }
}
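
In case it helps when checking other jobs, here is a minimal sketch that lists the output connection names found in a job definition shaped like the one above. The embedded JSON is a shortened copy with only two stages, and output_connection_names is a hypothetical helper, not part of ManifoldCF.

```python
import json

# Shortened copy of the job definition above (two of the eight stages).
JOB_JSON = """
{
    "job": {
        "description": "ds_forum_en",
        "pipelinestage": [
            {
                "stage_connectionname": "ds_solr_forum_en-eu",
                "stage_id": "0",
                "stage_isoutput": "true",
                "stage_specification": {}
            },
            {
                "stage_connectionname": "ds_solr_forum_en-in",
                "stage_id": "1",
                "stage_isoutput": "true",
                "stage_specification": {}
            }
        ]
    }
}
"""


def output_connection_names(job_doc):
    """Hypothetical helper: names of stages flagged as outputs.

    Assumes "pipelinestage" is a list and that "stage_isoutput" is the
    string "true" for output stages, as in the definition above.
    """
    stages = job_doc["job"].get("pipelinestage", [])
    return [
        stage["stage_connectionname"]
        for stage in stages
        if stage.get("stage_isoutput") == "true"
    ]


names = output_connection_names(json.loads(JOB_JSON))
print(len(names), "output connections:", ", ".join(names))
```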

Thank you,
Tomoko

2016-06-13 18:09 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
> Hi Karl,
>
> Thank you for the rapid response! I'll try the patch soon.
>
> Regards,
> Tomoko
>
> 2016-06-13 16:20 GMT+09:00 Karl Wright <daddywri@gmail.com>:
>> Ok, some further exploration yields the following:
>> (1) A check was put into the code a while ago to prevent overly long
>> activity names from blowing things up.  That is why we no longer see this
>> problem.
>> (2) There was a problem with activity logging for deletions across multiple
>> output connections.  See CONNECTORS-1323.  I've provided a patch.
>>
>> Karl
>>
>>
>> On Mon, Jun 13, 2016 at 1:55 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Tomoko,
>>>
>>> Sorry, I missed this post when it was originally made.
>>>
>>> The activitytype column is provided by the framework for only a small
>>> number of specific events.  In no case does the activitytype contain
>>> anything other than a fixed-length string; it's meant to be queried on.
>>> That string may include the name of a single output connection or of a
>>> transformation connection, but only one.  The maximum length of an output
>>> or transformation connection name is 32, so the total length available for
>>> the rest of the activitytype column is 30.
>>>
>>> The string "document deletion" is 17 characters, so that's nowhere near
>>> the limit here. So this makes no sense.
>>>
>>> Can you be more specific about the following:
>>>
>>> (1) Which underlying database are you using?
>>> (2) Have you modified the MCF schema in any way?
>>> (3) What are the actual names of the output connections in question?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>> On Sun, Jun 12, 2016 at 10:42 PM, Tomoko Uchida <
>>> tomoko.uchida.1111@gmail.com> wrote:
>>>
>>>> Hi, any suggestions?
>>>>
>>>> Is this a known limitation, or
>>>> should I create a ticket about that?
>>>>
>>>> Thanks,
>>>> Tomoko
>>>>
>>>> 2016-06-09 10:44 GMT+09:00 Tomoko Uchida <tomoko.uchida.1111@gmail.com>:
>>>> > Hello developers,
>>>> >
>>>> > I sent the same message to the user mailing list but have received no
>>>> > reply. Could anyone help me?
>>>> > Some jobs in our customer's production environment can no longer be
>>>> > deleted because of this problem.
>>>> >
>>>> > We are looking for a way to delete the jobs safely.
>>>> > If my question was not clear, I am ready to provide a more detailed
>>>> > explanation.
>>>> >
>>>> > ----
>>>> >
>>>> > Hello,
>>>> > I encountered an SQLException when I deleted a job with many output
>>>> > connections.
>>>> >
>>>> > ERROR 2016-06-02 09:41:49,492 (Document delete thread '9') - Document
>>>> > delete thread aborting and restarting due to database connection
>>>> > reset: Database exception: SQLException doing query (22001): ERROR:
>>>> > value too long for type character varying(64)
>>>> >
>>>> >
>>>> > I found that the error occurs because ManifoldCF tries to insert a
>>>> > long string (more than 64 characters) into the 'activitytype' column
>>>> > of the 'repohistory' table while deleting the documents associated
>>>> > with the job.
>>>> >
>>>> > As a trial, I changed the type of the 'activitytype' column to 'text'
>>>> > with this statement.
>>>> >
>>>> > ALTER TABLE repohistory ALTER COLUMN activitytype TYPE text;
>>>> >
>>>> > After altering the table I restarted ManifoldCF; the deletion history
>>>> > records were then added successfully and the job appears to have been
>>>> > deleted safely.
>>>> >
>>>> > The inserted 'activitytype' values look like this:
>>>> > document deletion (outputA)  (outputB)  (outputC) (outputD) (outputE)
>>>> > ...
>>>> >
>>>> > Due to application requirements, I cannot reduce the number of output
>>>> > connections (to shorten the history records).
>>>> >
>>>> > Is that OK? Or are there better solutions?
>>>> >
>>>> > Thank you in advance,
>>>> > Tomoko
>>>>
>>>
>>>
