manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Document components
Date Tue, 25 Nov 2014 10:18:03 GMT
Ok, code committed.

Since you are obviously the first person to really try out the component
feature, I wondered if you would be willing to submit a test connector that
uses it, as a patch.  I apologize for not fully testing this feature in the
1.7 release -- the person who requested the feature obviously didn't
actually use it, and although I'd intended to write a test, that did not
get done.

Thanks,
Karl


On Tue, Nov 25, 2014 at 5:03 AM, Karl Wright <daddywri@gmail.com> wrote:

> I believe I've found the problem with removeDocument(), and will commit a
> fix shortly.
>
> To clarify your question about primary document disposition:
>
> For the case where you have no document, and you never expect there to be
> a document again (because, for instance, it was deleted), then
> removeDocument() is the right thing to call.  If the case is different,
> namely that the document exists but is no longer indexable for whatever
> reason, it's better to call noDocument() instead, because you can supply a
> version string, and then MCF will know not to ask you to process it again
> unless that string changes.
>
> Karl
>
>
> On Tue, Nov 25, 2014 at 4:46 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Markus,
>>
>> I've created a ticket for the exception.  CONNECTORS-1114.
>>
>> As for removal of a primary document that is not mentioned, do you mean
>> that within processDocuments(), if you don't call any disposition method
>> for a primary document, then that document is left around?  If so, that
>> behavior is intended -- it was necessary for backwards compatibility.  The
>> document should, of course, be cleaned up at the end of the job, as long as
>> you are not doing a minimal crawl.
>>
>> If you are seeing some other kind of behavior, please try to describe it
>> more completely so that I have a better idea what you mean.
>>
>> Thanks,
>> Karl
>>
>>
>> On Tue, Nov 25, 2014 at 3:25 AM, Markus Schuch <markus_schuch@web.de>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> the patch for CONNECTORS-1111 fixes the cleanup issue.
>>>
>>> Another question about primary documents and their components:
>>>
>>> I have ingested a primary document with some components.
>>> On the next processing pass, the primary document should no longer be
>>> indexed, but its subcomponents should still be indexed.
>>>
>>> My understanding is that components which are not mentioned are removed
>>> automatically.
>>> Since the primary document is the "null" component, I expected the
>>> framework to remove the primary document as well if it is not mentioned.
>>>
>>> But this is not the case. Is this another bug, or do I have to remove the
>>> primary document manually somehow?
>>>
>>> There is an activity method removeDocument(identifier) which seems
>>> related, but I do not fully understand the usage scenario described in
>>> the method's javadoc.
>>>
>>> I tried the method. The result was the following database exception:
>>> (Patches for CONNECTORS-1110 and CONNECTORS-1111 are applied)
>>>
>>> 2014-11-25 08:30:07,868 ERROR [Worker thread '1'] org.apache.manifoldcf.crawlerthreads: Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (HY0000): You need to set exactly 3 parameters on the prepared statement
>>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
>>>     at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
>>>     at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:762)
>>>     at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1435)
>>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
>>>     at org.apache.manifoldcf.core.database.DBInterfaceMySQL.performQuery(DBInterfaceMySQL.java:875)
>>>     at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.findRowIdsForDocIds(IncrementalIngester.java:1518)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemoveMultiple(IncrementalIngester.java:1377)
>>>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentRemove(IncrementalIngester.java:803)
>>>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.removeDocument(WorkerThread.java:1674)
>>>     at com.example.mcf.TestConnector.processDocuments(TestConnector.java:278)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:670)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:649)
>>>     at org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector.processDocuments(BaseRepositoryConnector.java:402)
>>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:380)
>>> Caused by: java.sql.SQLException: You need to set exactly 3 parameters on the prepared statement
>>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:149)
>>>     at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
>>>     at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:264)
>>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:288)
>>>     at org.mariadb.jdbc.MySQLStatement.executeQuery(MySQLStatement.java:302)
>>>     at org.mariadb.jdbc.MySQLPreparedStatement.executeQuery(MySQLPreparedStatement.java:112)
>>>     at org.apache.manifoldcf.core.database.Database.execute(Database.java:880)
>>>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>>> Caused by: org.mariadb.jdbc.internal.common.QueryException: You need to set exactly 3 parameters on the prepared statement
>>>     at org.mariadb.jdbc.internal.common.query.MySQLParameterizedQuery.validate(MySQLParameterizedQuery.java:117)
>>>     at org.mariadb.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:976)
>>>     at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:281)
>>>
>>> Regards,
>>> Markus
>>>
>>
>>
>
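
The removeDocument() versus noDocument() distinction that Karl describes in
the thread above can be sketched in Java roughly as follows. This is a minimal
illustration, not ManifoldCF code: the Activities interface is a hypothetical
stand-in that borrows only the two method names mentioned in the thread, and
RepositoryState, disposePrimaryDocument, and RecordingActivities are invented
for the example. The real signatures should be checked against the ManifoldCF
javadoc.

```java
import java.util.ArrayList;
import java.util.List;

public class DispositionSketch {

    /**
     * Hypothetical stand-in for the activity object passed to a connector.
     * Only the method names come from the thread; this is not the real interface.
     */
    interface Activities {
        void removeDocument(String documentIdentifier);
        void noDocument(String documentIdentifier, String version);
    }

    /** Hypothetical summary of what the repository reports for a primary document. */
    enum RepositoryState { DELETED, PRESENT_BUT_NOT_INDEXABLE }

    /**
     * Chooses a disposition for a primary document that should not be indexed.
     * DELETED: the document is gone and not expected back, so remove it.
     * PRESENT_BUT_NOT_INDEXABLE: record a version string so the document need
     * not be processed again unless that string changes.
     */
    static void disposePrimaryDocument(Activities activities, String documentIdentifier,
                                       RepositoryState state, String version) {
        switch (state) {
            case DELETED:
                activities.removeDocument(documentIdentifier);
                break;
            case PRESENT_BUT_NOT_INDEXABLE:
                activities.noDocument(documentIdentifier, version);
                break;
        }
    }

    /** Records calls as strings so the sketch runs without a crawler or database. */
    static class RecordingActivities implements Activities {
        final List<String> calls = new ArrayList<>();

        @Override
        public void removeDocument(String documentIdentifier) {
            calls.add("removeDocument(" + documentIdentifier + ")");
        }

        @Override
        public void noDocument(String documentIdentifier, String version) {
            calls.add("noDocument(" + documentIdentifier + ", " + version + ")");
        }
    }

    public static void main(String[] args) {
        RecordingActivities activities = new RecordingActivities();
        // "doc-1" and "doc-2" are made-up identifiers; "v2" is a made-up version string.
        disposePrimaryDocument(activities, "doc-1", RepositoryState.DELETED, null);
        disposePrimaryDocument(activities, "doc-2", RepositoryState.PRESENT_BUT_NOT_INDEXABLE, "v2");
        System.out.println(activities.calls);
    }
}
```

Running main prints the two recorded calls: a removeDocument for the deleted
document, and a noDocument carrying the version string for the document that
still exists but is no longer indexable.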
