manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1579) Error when crawling a MSSQL table
Date Fri, 08 Feb 2019 08:11:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763395#comment-16763395 ]

Karl Wright commented on CONNECTORS-1579:
-----------------------------------------

You can either check out the entire current trunk source code and build that, or download
the release source and libs, apply the patch, and build that.  Which do you want to do?


> Error when crawling a MSSQL table
> ---------------------------------
>
>                 Key: CONNECTORS-1579
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: JDBC connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Donald Van den Driessche
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.13
>
>         Attachments: 636_bb2.csv, CONNECTORS-1579.patch
>
>
> When I crawl an MSSQL table through the JDBC connector, I get the following error for multiple rows:
>  
> {noformat}
> FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple document primary component dispositions not allowed: document '636'
> java.lang.IllegalStateException: Multiple document primary component dispositions not allowed: document '636'
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) ~[mcf-pull-agent.jar:?]
> at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> {noformat}
> I looked this error up online, and it may be caused by the same key being used for different rows.
> I checked, but I couldn't find any duplicates in any of the fields selected by the JDBC queries.
> Here are my queries:
> Seeding query:
> {code:java}
> SELECT pk1 as $(IDCOLUMN)
> FROM dbo.bb2
> WHERE search_url IS NOT NULL
> AND mimetype IS NOT NULL AND mimetype NOT IN ('unknown/unknown', 'application/xml', 'application/zip');
> {code}
> Version check query: none
> Access token query: none
> Data query:
> {code:java}
> SELECT 
> pk1 AS $(IDCOLUMN), 
> search_url AS $(URLCOLUMN), 
> ISNULL(content, '') AS $(DATACOLUMN),
> doc_id, 
> search_url AS url, 
> ISNULL(title, '') as title, 
> ISNULL(groups,'') as groups, 
> ISNULL(type,'') as document_type, 
> ISNULL(users, '') as users
> FROM dbo.bb2
> WHERE pk1 IN $(IDLIST);
> {code}
> The attached CSV contains the corresponding row from the table.
> [^636_bb2.csv]
>  
> This problem is holding up the whole crawling pipeline, because it keeps retrying this row.
> Could you help me understand this error?
>  
>  
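
The duplicate-key hypothesis mentioned in the quoted report can be checked directly against the table. This is a minimal sketch, assuming pk1 (the column mapped to $(IDCOLUMN) in the queries above) is the only identifier the connector sees; the alias row_count is illustrative and not part of the original queries.

```sql
-- List every pk1 value that appears on more than one row of dbo.bb2.
-- An empty result means pk1 is unique across the whole table.
SELECT pk1, COUNT(*) AS row_count
FROM dbo.bb2
GROUP BY pk1
HAVING COUNT(*) > 1;
```

An empty result would rule out duplicate identifiers in the table itself, though not other possible causes of the error.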



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
