manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Job with Generic Connector stop to work
Date Fri, 06 May 2016 13:25:07 GMT
Ok, it's now clear what you are looking for, but it is still not clear how
we'd integrate that in the JDBC connector.  How did you do this when you
modified the connector for 1.8?

Karl
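
For reference, the one-to-many case quoted below (person 1 joined to a left and a right eye) can be illustrated outside of ManifoldCF. This is only a minimal sketch of the grouping idea, not the JDBC connector's code or API: the function name `group_multivalued` and the column names `person` and `eye` are hypothetical, and the rows are the two from the example in the thread.

```python
def group_multivalued(rows, key_column, value_column):
    """Collapse one-to-many join rows into one entry per key.

    Hypothetical helper for illustration only; it is not part of ManifoldCF.
    """
    grouped = {}
    for row in rows:
        # Append every joined value to its key instead of keeping only the first row.
        grouped.setdefault(row[key_column], []).append(row[value_column])
    return grouped


# Rows as a join between the persons table and the eyes table would return them.
rows = [
    {"person": "person 1", "eye": "left"},
    {"person": "person 1", "eye": "right"},
]

metadata = group_multivalued(rows, "person", "eye")
print(metadata)
```

Each key ends up with a list of values, which is the shape a multi-valued metadata field would need; how such a list would be passed on inside the connector is left open here.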


On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <alicataluca@gmail.com> wrote:

> Hi Karl,
> sorry for my English :).
> I mean that I have to extract values from a query with a join between
> two tables in a one-to-many relationship, but the dataset returned from
> the connector contains only one pair from the two tables.
>
> For example:
> Table A with persons
> Table B with eyes
>
> As the result of the join, I expect to have two rows like:
> person 1, eye left
> person 1, eye right
>
> but the connector returns only one row:
> person 1, eye left
>
> I hope it's clearer now.
>
> PS: I quote the sentence in the Manifold documentation that explains this (
> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
> ):
> ------
> There is currently no support in the JDBC connection type for natively
> handling multi-valued metadata.
> ------
>
> Thanks,
> L. Alicata
>
>
> 2016-05-06 15:10 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>
>> Hi Luca,
>>
>> It is not clear what you mean by "multi value extraction" using the JDBC
>> connector.  The JDBC connector allows collection of primary binary content
>> as well as metadata from a database row.  So maybe if you can explain what
>> you need beyond that it would help.
>>
>> Thanks,
>> Karl
>>
>>
>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <alicataluca@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>> thanks for the information. Fortunately, in another JBoss instance I have
>>> an old Manifold configuration with a single process, which I had
>>> decommissioned. For now, I am starting to test these jobs with it, and if
>>> it works fine, I can use it only for this job, in production as well.
>>> Later, if I can, I will try to investigate the possible problem that stops
>>> the agent.
>>>
>>> I take advantage of this discussion to ask whether multi-value
>>> extraction from a DB is considered possible future work or not, because I
>>> used the Generic Connector to work around this gap in the JDBC connector.
>>> In fact, with Manifold 1.8 I modified the connector to support this
>>> behavior (in addition to parsing blob files), but when upgrading the
>>> Manifold version, to avoid rewriting the connector I decided to use the
>>> Generic Connector with an application that does the work of extracting
>>> data from the DB.
>>>
>>> Thanks,
>>> L. Alicata
>>>
>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>
>>>> Hi Luca,
>>>>
>>>> If you do a lock clean and the process still stops, then the locks are
>>>> not the problem.
>>>>
>>>> One way we can drill down into the problem is to get a thread dump of
>>>> the agents process after it stops.  The thread dump must be of the agents
>>>> process, not any of the others.
>>>>
>>>> FWIW, the generic connector is not well supported; the person who wrote
>>>> it is still a committer but is not actively involved in MCF development at
>>>> this time.  I suspect that the problem may have to do with how that
>>>> connector deals with exceptions or errors, but I am not sure.
>>>>
>>>> Thanks,
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <alicataluca@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Karl,
>>>>> I've just tried lock-clean after the agents stopped working, obviously
>>>>> after stopping the processes. After this, the job starts correctly, but
>>>>> the second time I start a job with a lot of data (or sometimes the third
>>>>> time), the agents stop again.
>>>>>
>>>>> Unfortunately, it's difficult to start using Zookeeper in this
>>>>> environment for the moment. Would it resolve the fact that the agents
>>>>> stop working while running, or would it only help with cleaning up
>>>>> agent locks when I restart the process?
>>>>>
>>>>> Thanks,
>>>>> L. Alicata
>>>>>
>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>>
>>>>>> Hi Luca,
>>>>>>
>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>> involved, you will need to execute the lock-clean procedure to make
>>>>>> sure you have no dangling locks in the file system.
>>>>>>
>>>>>> - shut down all MCF processes (except the database)
>>>>>> - run the lock-clean script
>>>>>> - start your MCF processes back up
>>>>>>
>>>>>> I suspect what you are seeing is related to this.
>>>>>>
>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>> robust about cleaning out dangling locks.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <alicataluca@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>> thanks for the help.
>>>>>>> In my case I have only one instance of MCF running, with both types
>>>>>>> of job (SP and Generic), and so I have only one properties file
>>>>>>> (which I have attached).
>>>>>>> For information, I used the multiprocess-file configuration with
>>>>>>> Postgres.
>>>>>>>
>>>>>>> Do you have other suggestions? Do you need more information that I
>>>>>>> can give you?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> L.Alicata
>>>>>>>
>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>>>>
>>>>>>>> Hi Luca,
>>>>>>>>
>>>>>>>> Do you have multiple independent MCF clusters running at the same
>>>>>>>> time?  It sounds like you do: you have SP on one, and Generic on
>>>>>>>> another.  If so, you will need to be sure that the synchronization
>>>>>>>> you are using (either zookeeper or file-based) does not overlap.
>>>>>>>> Each cluster needs its own synchronization.  If there is overlap,
>>>>>>>> then doing things with one cluster may cause the other cluster to
>>>>>>>> hang.  This also means you have to have different properties files
>>>>>>>> for the two clusters, of course.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <alicataluca@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I'm using Manifold 2.2 with a multi-process configuration in a
>>>>>>>>> JBoss instance on Windows Server 2012, and I have a set of jobs
>>>>>>>>> that work with SharePoint (SP) or the Generic Connector (GC),
>>>>>>>>> which gets files from a DB.
>>>>>>>>> With SP I have no problems, while with GC and a lot of documents
>>>>>>>>> (one job with 47k and another with 60k), the seeding process
>>>>>>>>> sometimes does not finish, because the agents seem to stop
>>>>>>>>> (although the Java process is still alive).
>>>>>>>>> After this, if I try to start any other job, it does not start, as
>>>>>>>>> if the agents were stopped.
>>>>>>>>>
>>>>>>>>> Other times, these jobs work correctly, and once they worked
>>>>>>>>> correctly together, running at the same time.
>>>>>>>>>
>>>>>>>>> For information:
>>>>>>>>>
>>>>>>>>>    - On JBoss there are only Manifold and the Generic Repository
>>>>>>>>>    application.
>>>>>>>>>
>>>>>>>>>    - On the same virtual server, there is another JBoss instance,
>>>>>>>>>    with a Solr instance and a web application.
>>>>>>>>>
>>>>>>>>>    - I've checked whether it was some kind of memory problem, but
>>>>>>>>>    that is not the case.
>>>>>>>>>
>>>>>>>>>    - GC with almost 23k seeds always works, at least in the tests
>>>>>>>>>    that I've done.
>>>>>>>>>
>>>>>>>>>    - In a local JBoss instance with Manifold and the Generic
>>>>>>>>>    Repository application, I have not had this problem.
>>>>>>>>>
>>>>>>>>> This is the only recurring message that I've seen in
>>>>>>>>> manifold.log:
>>>>>>>>> ---------------
>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>> Releasing connection
>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>
>>>>>>>>> ---------------
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> L. Alicata
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
