manifoldcf-user mailing list archives

From Luca Alicata <alicatal...@gmail.com>
Subject Re: Job with Generic Connector stop to work
Date Fri, 06 May 2016 14:15:20 GMT
Hi Karl,
I can confirm that it is a little expensive, but at that time I didn't have
much time, and I stopped working on it once I found a solution.
Thanks for creating the ticket; for the moment, I'll try to use the generic
connector.

Another question: is there another connector that can receive data from an
application, like the Generic Connector?

Thanks,
L. Alicata

2016-05-06 16:02 GMT+02:00 Karl Wright <daddywri@gmail.com>:

> Hi Luca,
>
> This approach causes each document's binary data to be read more than
> once.  I think that is expensive, especially if there are a lot of values
> for a row.
>
> Instead I think something more like ACLs will be needed -- that is, a
> separate query for each multi-valued field.  This is more work but it would
> work much better.
>
> I will create a ticket to add this to the JDBC connector, but it won't
> happen for a while.
>
> Karl
>
>
> On Fri, May 6, 2016 at 9:40 AM, Luca Alicata <alicataluca@gmail.com>
> wrote:
>
>> I decompiled the Java connector and modified the code as follows:
>>
>> in the document processing code, I saw that all the rows of the query
>> result (including the multi-valued rows) already arrive, but in the loop
>> that parses the documents, after the first document with a given ID, all
>> the others with the same ID are skipped.
>> So I removed the check that prevents processing other documents with the
>> same ID, and I modified the method that stores metadata so that it can
>> store multi-valued data as an array in the metadata mapping.
>>
>> I attached the code to this e-mail. You will find comments starting
>> with "---" that I inserted just now for you.
>>
>> Thanks,
>> L. Alicata
>>
>> 2016-05-06 15:25 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>
>>> Ok, it's now clear what you are looking for, but it is still not clear
>>> how we'd integrate that in the JDBC connector.  How did you do this when
>>> you modified the connector for 1.8?
>>>
>>> Karl
>>>
>>>
>>> On Fri, May 6, 2016 at 9:21 AM, Luca Alicata <alicataluca@gmail.com>
>>> wrote:
>>>
>>>> Hi Karl,
>>>> sorry for my English :).
>>>> I mean that I have to extract values from a query with a join between
>>>> two tables with a one-to-many relationship, and the dataset returned
>>>> from the connector contains only one pair from the two tables.
>>>>
>>>> For example:
>>>> Table A with persons
>>>> Table B with eyes
>>>>
>>>> As the result of the join, I expect two rows like:
>>>> person 1, eye left
>>>> person 1, eye right
>>>>
>>>> but the connector returns only one row:
>>>> person 1, eye left
>>>>
>>>> I hope it's clearer now.
>>>>
>>>> P.S. Here is the sentence from the Manifold documentation that explains
>>>> this (
>>>> https://manifoldcf.apache.org/release/release-2.3/en_US/end-user-documentation.html#jdbcrepository
>>>> ):
>>>> ------
>>>> There is currently no support in the JDBC connection type for natively
>>>> handling multi-valued metadata.
>>>> ------
>>>>
>>>> Thanks,
>>>> L. Alicata
>>>>
>>>>
>>>> 2016-05-06 15:10 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>
>>>>> Hi Luca,
>>>>>
>>>>> It is not clear what you mean by "multi value extraction" using the
>>>>> JDBC connector.  The JDBC connector allows collection of primary binary
>>>>> content as well as metadata from a database row.  So maybe if you can
>>>>> explain what you need beyond that it would help.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, May 6, 2016 at 9:04 AM, Luca Alicata <alicataluca@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> thanks for the information. Fortunately, in another JBoss instance I
>>>>>> have an old Manifold configuration with a single process, which I had
>>>>>> dismissed. For now, I'm starting to test these jobs with it, and if it
>>>>>> works fine, I can use it only for this job, in production as well.
>>>>>> Later, if I can, I'll try to look into the possible problem that stops
>>>>>> the agent.
>>>>>>
>>>>>> I'd like to take advantage of this discussion to ask whether
>>>>>> multi-value extraction from a DB is considered possible future work.
>>>>>> I've been using the generic connector to work around this gap in the
>>>>>> JDBC connector. In fact, with Manifold 1.8 I modified the JDBC connector
>>>>>> to support this behavior (in addition to parsing blob files), but when
>>>>>> upgrading the Manifold version, rather than rewrite the new connector, I
>>>>>> decided to use the Generic Connector with an application that does the
>>>>>> work of extracting data from the DB.
>>>>>>
>>>>>> Thanks,
>>>>>> L. Alicata
>>>>>>
>>>>>> 2016-05-06 14:42 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>>>
>>>>>>> Hi Luca,
>>>>>>>
>>>>>>> If you do a lock clean and the process still stops, then the locks
>>>>>>> are not the problem.
>>>>>>>
>>>>>>> One way we can drill down into the problem is to get a thread dump
>>>>>>> of the agents process after it stops.  The thread dump must be of the
>>>>>>> agents process, not any of the others.
>>>>>>>
>>>>>>> FWIW, the generic connector is not well supported; the person who
>>>>>>> wrote it is still a committer but is not actively involved in MCF
>>>>>>> development at this time.  I suspect that the problem may have to do
>>>>>>> with how that connector deals with exceptions or errors, but I am not
>>>>>>> sure.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 8:38 AM, Luca Alicata <alicataluca@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> I just tried running lock-clean after the agents stopped working
>>>>>>>> (after stopping the processes, of course). After that, the job started
>>>>>>>> correctly, but the second time I started a job with a lot of data (or
>>>>>>>> sometimes the third time), the agents stopped again.
>>>>>>>>
>>>>>>>> Unfortunately, it's difficult to start using Zookeeper in this
>>>>>>>> environment for the moment. Would it fix the agents stopping while
>>>>>>>> they are working, or does it only help with cleaning up agent locks
>>>>>>>> when I restart the processes?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> L. Alicata
>>>>>>>>
>>>>>>>> 2016-05-06 14:15 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi Luca,
>>>>>>>>>
>>>>>>>>> With file-based synchronization, if you kill any of the processes
>>>>>>>>> involved, you will need to execute the lock-clean procedure to make
>>>>>>>>> sure you have no dangling locks in the file system.
>>>>>>>>>
>>>>>>>>> - shut down all MCF processes (except the database)
>>>>>>>>> - run the lock-clean script
>>>>>>>>> - start your MCF processes back up
>>>>>>>>>
>>>>>>>>> I suspect what you are seeing is related to this.
>>>>>>>>>
>>>>>>>>> Also, please consider using Zookeeper instead, since it is more
>>>>>>>>> robust about cleaning out dangling locks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 8:06 AM, Luca Alicata <
>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Karl,
>>>>>>>>>> thanks for the help.
>>>>>>>>>> In my case I have only one instance of MCF running, with both types
>>>>>>>>>> of job (SP and Generic), so I have only one properties file (which
>>>>>>>>>> I have attached).
>>>>>>>>>> For information, I'm using the multiprocess-file configuration with
>>>>>>>>>> PostgreSQL.
>>>>>>>>>>
>>>>>>>>>> Do you have other suggestions? Is there any more information I can
>>>>>>>>>> give you?
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> L.Alicata
>>>>>>>>>>
>>>>>>>>>> 2016-05-06 12:55 GMT+02:00 Karl Wright <daddywri@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Hi Luca,
>>>>>>>>>>>
>>>>>>>>>>> Do you have multiple independent MCF clusters running at the
>>>>>>>>>>> same time?  It sounds like you do: you have SP on one, and Generic
>>>>>>>>>>> on another.  If so, you will need to be sure that the
>>>>>>>>>>> synchronization you are using (either zookeeper or file-based) does
>>>>>>>>>>> not overlap.  Each cluster needs its own synchronization.  If there
>>>>>>>>>>> is overlap, then doing things with one cluster may cause the other
>>>>>>>>>>> cluster to hang.  This also means you have to have different
>>>>>>>>>>> properties files for the two clusters, of course.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 4:32 AM, Luca Alicata <
>>>>>>>>>>> alicataluca@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I'm using Manifold 2.2 with the multi-process configuration in a
>>>>>>>>>>>> JBoss instance on Windows Server 2012, and I have a set of jobs
>>>>>>>>>>>> that work with SharePoint (SP) or the Generic Connector (GC), which
>>>>>>>>>>>> gets files from a DB.
>>>>>>>>>>>> With SP I have no problems, while with GC and a lot of documents
>>>>>>>>>>>> (one job with 47k and another with 60k), the seeding process
>>>>>>>>>>>> sometimes does not finish, because the agents seem to stop
>>>>>>>>>>>> (although the Java process is still alive).
>>>>>>>>>>>> After this, if I try to start any other job, it does not start, as
>>>>>>>>>>>> if the agents were stopped.
>>>>>>>>>>>>
>>>>>>>>>>>> Other times, these jobs work correctly, and once they even worked
>>>>>>>>>>>> correctly together, running at the same time.
>>>>>>>>>>>>
>>>>>>>>>>>> For information:
>>>>>>>>>>>>
>>>>>>>>>>>>    - On JBoss there are only Manifold and the Generic Repository
>>>>>>>>>>>>    application.
>>>>>>>>>>>>    - On the same virtual server, there is another JBoss instance,
>>>>>>>>>>>>    with a Solr instance and a web application.
>>>>>>>>>>>>    - I checked whether it was a memory problem, but it's not.
>>>>>>>>>>>>    - GC with almost 23k seeds always works, at least in the tests
>>>>>>>>>>>>    I've done.
>>>>>>>>>>>>    - On a local JBoss instance with Manifold and the Generic
>>>>>>>>>>>>    Repository application, I have not encountered this problem.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the only recurring information I've seen in manifold.log:
>>>>>>>>>>>> ---------------
>>>>>>>>>>>> Connection 0.0.0.0:62755<-><ip-address>:<port> shut down
>>>>>>>>>>>> Releasing connection
>>>>>>>>>>>> org.apache.http.impl.conn.ManagedClientConnectionImpl@6c98c1bd
>>>>>>>>>>>>
>>>>>>>>>>>> ---------------
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> L. Alicata
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
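For reference, the one-to-many case discussed in this thread (a person/eye join returning two rows with the same document ID, of which only the first is kept) comes down to grouping result rows by ID instead of skipping repeats. The following is a minimal Java 16+ sketch under that assumption only: the class MultiValueGrouping, the Row record, and the groupById method are hypothetical names for illustration and are not part of the ManifoldCF JDBC connector API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValueGrouping {
    // Hypothetical row type: one (document ID, metadata value) pair
    // from a one-to-many join result.
    record Row(String id, String value) {}

    // Collapse rows sharing the same document ID into one multi-valued
    // entry, instead of keeping only the first row seen for each ID.
    // LinkedHashMap preserves the order in which IDs first appear.
    static Map<String, List<String>> groupById(List<Row> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Row row : rows) {
            grouped.computeIfAbsent(row.id(), k -> new ArrayList<>()).add(row.value());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("person 1", "eye left"),
                new Row("person 1", "eye right"));
        Map<String, List<String>> grouped = groupById(rows);
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // Each list could then be stored as a String[] metadata value.
            String[] values = e.getValue().toArray(new String[0]);
            System.out.println(e.getKey() + " -> " + String.join(", ", values));
        }
    }
}
```

Running main prints "person 1 -> eye left, eye right". As Karl points out above, handling multi-valued fields this way still reads each document's binary content once per joined row when the content column is part of the same query; his suggested alternative is a separate query for each multi-valued field, similar to how ACLs are handled.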
