manifoldcf-user mailing list archives

From Radek Sklenicka <radek.skleni...@gmail.com>
Subject Re: Documentum - unable to index metadata
Date Thu, 31 Mar 2016 06:44:12 GMT
Hi Karl,

Many thanks for your prompt actions.
Just checking with our Documentum guys. I'll let you know as soon as I have
some updates.

Thanks,
Radek


On 31 March 2016 at 07:44, Karl Wright <daddywri@gmail.com> wrote:

> Hi Radek,
>
> A fix for the UI, at least, can be downloaded from the ticket
> CONNECTORS-1293.  I can find no definitive mechanism for why this would
> lead to no attributes being collected, but it's worth applying the patch,
> updating your jobs, and giving it a try nonetheless.  Please let me know
> what happens.
>
> Thanks,
> Karl
>
>
> On Wed, Mar 30, 2016 at 5:21 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Radek,
>>
>> The code that reads attribute values from Documentum DFC persistent
>> objects does use the attribute name, as follows:
>>
>> >>>>>>
>>   /** Get all the values that an attribute has, including multiple ones if present */
>>   public String[] getAttributeValues(String attribute)
>>     throws DocumentumException, RemoteException
>>   {
>>     try
>>     {
>>       int valueCount = object.getValueCount(attribute);
>>       String[] values = new String[valueCount];
>>       int y = 0;
>>       while (y < valueCount)
>>       {
>>         // Fetch the attribute.
>>         // It's supposed to work for all attribute types...
>>         String value = object.getRepeatingString(attribute,y);
>>         values[y++] = value;
>>       }
>>       return values;
>>     }
>>     catch (DfAuthenticationException ex)
>>     {
>>       throw new DocumentumException("Bad credentials: "+ex.getMessage(),DocumentumException.TYPE_BADCREDENTIALS);
>>     }
>>     catch (DfIdentityException ex)
>>     {
>>       throw new DocumentumException("Bad docbase name: "+ex.getMessage(),DocumentumException.TYPE_BADCONNECTIONPARAMS);
>>     }
>>     catch (DfDocbaseUnreachableException e)
>>     {
>>       throw new DocumentumException("Docbase unreachable: "+e.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
>>     }
>>     catch (DfIOException e)
>>     {
>>       throw new DocumentumException("Docbase io exception: "+e.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
>>     }
>>     catch (DfException e)
>>     {
>>       throw new DocumentumException("Documentum error: "+e.getMessage());
>>     }
>>   }
>> <<<<<<
>>
>> This is how the DFC IDfPersistentObject API is structured.  So it doesn't
>> look like multiple language values are supported in DFC.  So I don't know
>> why you wouldn't get attribute values unless the UI issue is causing there
>> to be no specified attributes for whatever type matches the document.  I'll
>> have to dig into that code next.
>>
>> Karl
>>
>>
>> On Wed, Mar 30, 2016 at 9:58 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Radek,
>>>
>>> I will have to check how the connector uses attribute names and get back
>>> to you.  But I am pretty certain that the connector specifies attributes in
>>> its DQL queries by means of the attribute name, not the r_object_id.  If
>>> that's the problem, it also implies that there can be a different attribute
>>> value for each language, which might be why you aren't seeing the
>>> attributes you are expecting.
>>>
>>> This is not an easy problem to address, however.
>>>
>>> Can you confirm whether or not documents can have different attribute
>>> values for each language in Documentum?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>> On Wed, Mar 30, 2016 at 9:41 AM, Radek Sklenicka <
>>> radek.sklenicka@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>>
>>>>
>>>> We discovered that we get metadata names in triplicate because there
>>>> are 3 languages installed in Documentum.
>>>>
>>>> Multiple attribute records each have the same attr_name and type_name
>>>> but unique r_object_id and different nls_key (en, es, pt).
>>>>
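The situation described above, where each installed language contributes its own row with the same attr_name, can be illustrated with a minimal Java sketch. `AttrRecord` and `distinctAttrNames` are hypothetical names made up for this illustration; they are not part of DFC or of the ManifoldCF connector. The sketch only shows how collapsing rows by name turns per-language rows back into one entry, assuming the names really are identical across nls_key values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class AttrNameDedup {
    /** Hypothetical stand-in for one attribute row (not a DFC class). */
    static final class AttrRecord {
        final String attrName;
        final String nlsKey;

        AttrRecord(String attrName, String nlsKey) {
            this.attrName = attrName;
            this.nlsKey = nlsKey;
        }
    }

    /** Returns each attribute name once, in first-seen order, ignoring nls_key. */
    static List<String> distinctAttrNames(List<AttrRecord> rows) {
        Set<String> seen = new LinkedHashSet<>();
        for (AttrRecord row : rows) {
            seen.add(row.attrName);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<AttrRecord> rows = new ArrayList<>();
        // Three languages (en, es, pt) produce three rows per attribute.
        for (String name : new String[] {"atr_title", "atr_author"}) {
            for (String nls : new String[] {"en", "es", "pt"}) {
                rows.add(new AttrRecord(name, nls));
            }
        }
        System.out.println(rows.size() + " rows, distinct names: " + distinctAttrNames(rows));
    }
}
```

A `LinkedHashSet` is used so that the order in which names first appear is kept, which matters if the rows were already sorted by the query.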
>>>>
>>>>
>>>> Could this be the reason why metadata doesn’t make it through the
>>>> pipeline and we can’t get any metadata during crawling?
>>>>
>>>> Are unique attr_names required in Documentum connector?
>>>>
>>>>
>>>>
>>>> Any suggestions would be greatly appreciated.
>>>>
>>>>
>>>>
>>>> Thank you,
>>>>
>>>>
>>>> Radek
>>>>
>>>> On 23 March 2016 at 18:28, Radek Sklenicka <radek.sklenicka@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for verification, Karl.
>>>>>
>>>>> -Radek
>>>>>
>>>>> On 23 March 2016 at 14:01, Karl Wright <daddywri@gmail.com> wrote:
>>>>>
>>>>>> Hi Radek,
>>>>>>
>>>>>> This log output comes from RMI, apparently, and is not something I've
>>>>>> ever seen before.  But it does look like it's a complete list of what's
>>>>>> being returned for a request for the list of attributes (the first entry),
>>>>>> and for a specific object (the second entry).
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Wed, Mar 23, 2016 at 8:41 AM, Radek Sklenicka <
>>>>>> radek.sklenicka@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Karl,
>>>>>>>
>>>>>>> "select attr_name FROM dmi_dd_attr_info" really returns duplicates -
>>>>>>> we're looking into that.
>>>>>>>
>>>>>>> Is there also a DQL query (or function) used by ManifoldCF that we
>>>>>>> can try to check what/if attributes are being returned for a particular
>>>>>>> record?
>>>>>>>
>>>>>>> We have trace logs from DFC and it looks like the attributes are
>>>>>>> being returned from the content server.
>>>>>>> Could you please help us decode the logs - where to look/verify if
>>>>>>> attributes are handed over to ManifoldCF?
>>>>>>> Can we deduce from the logs attached below that the attributes are
>>>>>>> transferred from DFC to ManifoldCF?
>>>>>>>
>>>>>>> Many thanks,
>>>>>>> Radek
>>>>>>>
>>>>>>>
>>>>>>> 2016-03-22 13:29:26.008 <USER_DTESTER|s9(21.0)|SM@14660772> [RMI
>>>>>>> TCP Connection(1823)-127.0.0.1] [EXIT]
>>>>>>>  .com.documentum.fc.client.DfTypedObject@36b9ba.getLiteType ==>
>>>>>>> AspectedLiteType@110eb5e{name=do_domep_project_hse, typeVersion=0,
>>>>>>> cacheVStamp=178498, attributes={asp_herencia.atr_isnew,
>>>>>>> asp_herencia.atr_niveles, asp_herencia.atr_tipo, asp_herencia.i_partition},
>>>>>>> superType=LiteType@2045f2{name=do_domep_project_hse,
>>>>>>> typeVersion=32, cacheVStamp=178498, attributes={atr_audit_type,
>>>>>>> atr_speciality, atr_emergency_related}, superType=LiteType@a67471{name=do_domep_project,
>>>>>>> typeVersion=32, cacheVStamp=178486, attributes={atr_uwi, atr_well_name,
>>>>>>> atr_usi, atr_survey_name}, superType=LiteType@f74077{name=do_domep_base,
>>>>>>> typeVersion=27, cacheVStamp=178438, attributes={atr_confidential_level,
>>>>>>> atr_owner_area, atr_logical_code, atr_original_reference_id, atr_revision,
>>>>>>> atr_entity, atr_author, atr_doc_type, atr_category_doc, atr_subcat_doc,
>>>>>>> atr_discipline, atr_subdiscipline, atr_language, atr_physical_document,
>>>>>>> atr_physical_code, atr_warehouse, atr_retention, atr_digital_media,
>>>>>>> atr_internal, atr_country, atr_basin, atr_environment, atr_acreage,
>>>>>>> atr_abstract, atr_doc_creation_date, atr_title, atr_collection,
>>>>>>> atr_is_collection, atr_is_principal, atr_is_anexo, atr_id_collection,
>>>>>>> atr_is_relation, atr_field, atr_original_revision, atr_remarks,
>>>>>>> atr_keywords, atr_principal_folder_id, atr_original_version, atr_be_name,
>>>>>>> atr_be_ref, atr_be_short_name, atr_issued_for_code,
>>>>>>> atr_issued_for_description, atr_subbasin, atr_be_type_id, atr_comment,
>>>>>>> atr_status, atr_prepared_by, atr_preparation_date, atr_verified_by,
>>>>>>> atr_verification_date, atr_approved_by, atr_approval_date, atr_workflow},
>>>>>>> superType=LiteType@19390bd{name=do_general, typeVersion=6,
>>>>>>> cacheVStamp=167968, attributes={negocio, attr_is_gdcom},
>>>>>>> superType=LiteType@16658e8{name=dm_document, typeVersion=2,
>>>>>>> cacheVStamp=52034, attributes={}, superType=LiteType@6aac49{name=dm_sysobject,
>>>>>>> typeVersion=3, cacheVStamp=0, attributes={object_name, r_object_type,
>>>>>>> title, subject, authors, keywords, a_application_type, a_status,
>>>>>>> r_creation_date, r_modify_date, r_modifier, r_access_date, a_is_hidden,
>>>>>>> i_is_deleted, a_retention_date, a_archive, a_compound_architecture,
>>>>>>> a_link_resolved, i_reference_cnt, i_has_folder, i_folder_id,
>>>>>>> r_composite_id, r_composite_label, r_component_label, r_order_no,
>>>>>>> r_link_cnt, r_link_high_cnt, r_assembled_from_id, r_frzn_assembly_cnt,
>>>>>>> r_has_frzn_assembly, resolution_label, r_is_virtual_doc, i_contents_id,
>>>>>>> a_content_type, r_page_cnt, r_content_size, a_full_text, a_storage_type,
>>>>>>> i_cabinet_id, owner_name, owner_permit, group_name, group_permit,
>>>>>>> world_permit, i_antecedent_id, i_chronicle_id, i_latest_flag, r_lock_owner,
>>>>>>> r_lock_date, r_lock_machine, log_entry, r_version_label, i_branch_cnt,
>>>>>>> i_direct_dsc, r_immutable_flag, r_frozen_flag, r_has_events, acl_domain,
>>>>>>> acl_name, a_special_app, i_is_reference, r_creator_name, r_is_public,
>>>>>>> r_policy_id, r_resume_state, r_current_state, r_alias_set_id,
>>>>>>> a_effective_date, a_expiration_date, a_publish_formats, a_effective_label,
>>>>>>> a_effective_flag, a_category, language_code, a_is_template,
>>>>>>> a_controlling_app, r_full_content_size, a_extended_properties, a_is_signed,
>>>>>>> a_last_review_date, i_retain_until, r_aspect_name, i_retainer_id,
>>>>>>> i_partition, i_is_replica, i_vstamp}}}}}}}}
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2016-03-22 13:29:26.010 <USER_DTESTER|s5(17.0)|SM@16366401> [RMI
>>>>>>> TCP Connection(1820)-127.0.0.1] [RPC_EXIT]  ......RPC: applyForObject ==>
>>>>>>> TypedData@144302b[id=098c1b38809921b1,
>>>>>>> type=do_domep_project_well_wd, readOnly=false, autoFill=true,
>>>>>>> fetchTimestamp=0, values=[object_name=DSC00683.JPG,
>>>>>>> r_object_type=do_domep_project_well_wd, title=, subject=, authors=[],
>>>>>>> keywords=[], a_application_type=, a_status=, r_creation_date=2/11/2016
>>>>>>> 8:35:45 AM, r_modify_date=3/7/2016 11:17:37 AM, r_modifier=admdcmt,
>>>>>>> r_access_date=3/16/2016 12:23:59 PM, a_is_hidden=F, i_is_deleted=F,
>>>>>>> a_retention_date=nulldate, a_archive=F, a_compound_architecture=,
>>>>>>> a_link_resolved=F, i_reference_cnt=1, i_has_folder=T,
>>>>>>> i_folder_id=[0b8c1b3880991ad3], r_composite_id=[], r_composite_label=[],
>>>>>>> r_component_label=[], r_order_no=[], r_link_cnt=0, r_link_high_cnt=0,
>>>>>>> r_assembled_from_id=0000000000000000, r_frzn_assembly_cnt=0,
>>>>>>> r_has_frzn_assembly=F, resolution_label=, r_is_virtual_doc=0,
>>>>>>> i_contents_id=068c1b388064051a, a_content_type=jpeg, r_page_cnt=1,
>>>>>>> r_content_size=949228, a_full_text=T, a_storage_type=repo,
>>>>>>> i_cabinet_id=0c8c1b38806aee30, owner_name=Domep controlador 00010,
>>>>>>> owner_permit=7, group_name=, group_permit=1, world_permit=1,
>>>>>>> i_antecedent_id=0000000000000000, i_chronicle_id=098c1b38809921b1,
>>>>>>> i_latest_flag=T, r_lock_owner=, r_lock_date=nulldate, r_lock_machine=,
>>>>>>> log_entry=, r_version_label=[1.0, CURRENT], i_branch_cnt=0, i_direct_dsc=F,
>>>>>>> r_immutable_flag=F, r_frozen_flag=F, r_has_events=F, acl_domain=admdocum,
>>>>>>> acl_name=domep_ac_en_02080, a_special_app=, i_is_reference=F,
>>>>>>> r_creator_name=Domep controlador 00010, r_is_public=F,
>>>>>>> r_policy_id=468c1b38809c6e47, r_resume_state=-1, r_current_state=0,
>>>>>>> r_alias_set_id=0000000000000000, a_effective_date=[], a_expiration_date=[],
>>>>>>> a_publish_formats=[], a_effective_label=[], a_effective_flag=[],
>>>>>>> a_category=, language_code=, a_is_template=F, a_controlling_app=,
>>>>>>> r_full_content_size=949228, a_extended_properties=[], a_is_signed=F,
>>>>>>> a_last_review_date=nulldate, i_retain_until=nulldate,
>>>>>>> r_aspect_name=[asp_herencia], i_retainer_id=[], i_partition=0,
>>>>>>> i_is_replica=F, i_vstamp=4, negocio=E&P, attr_is_gdcom=F,
>>>>>>> atr_confidential_level=Internal Use, atr_owner_area=OFICINA DE E,
>>>>>>> atr_logical_code=IQEXPEOMKUR000WEL2016000063, atr_original_reference_id=[],
>>>>>>> atr_revision=, atr_entity=[], atr_author=[Domep controlador 00010],
>>>>>>> atr_doc_type=Reporting, atr_category_doc=Geology, atr_subcat_doc=Progress
>>>>>>> Report, atr_discipline=GEOLOGY, atr_subdiscipline=[],
>>>>>>> atr_language=[ENGLISH], atr_physical_document=F, atr_physical_code=[],
>>>>>>> atr_warehouse=, atr_retention=YES, atr_digital_media=F, atr_internal=YES,
>>>>>>> atr_country=[XYZ], atr_basin=[XYZZ], atr_environment=, atr_acreage=[],
>>>>>>> atr_abstract=, atr_doc_creation_date=5/22/2006 8:31:43 AM, atr_title=WELL
>>>>>>> BARAM 1 PHOTOS FIELD 06, atr_collection=[], atr_is_collection=F,
>>>>>>> atr_is_principal=F, atr_is_anexo=F, atr_id_collection=[],
>>>>>>> atr_is_relation=F, atr_field=[], atr_original_revision=, atr_remarks=,
>>>>>>> atr_keywords=[], atr_principal_folder_id=0b8c1b3880991ad3,
>>>>>>> atr_original_version=, atr_be_name=BARAM 1, atr_be_ref=IQWEL000008,
>>>>>>> atr_be_short_name=BA 1, atr_issued_for_code=, atr_issued_for_description=,
>>>>>>> atr_subbasin=[], atr_be_type_id=8, atr_comment=, atr_status=Draft,
>>>>>>> atr_prepared_by=[], atr_preparation_date=nulldate, atr_verified_by=[],
>>>>>>> atr_verification_date=nulldate, atr_approved_by=[],
>>>>>>> atr_approval_date=nulldate, atr_workflow=, atr_well_name=BARAM 1,
>>>>>>> atr_uwi=IQ010004432, atr_borehole_name=[BARAM 1], atr_ubhi=[IQ01000443200],
>>>>>>> atr_borehole_alias=[], atr_borehole_short_name=[], atr_sample_type=[],
>>>>>>> atr_analysis_type=[], asp_herencia.atr_isnew=F, asp_herencia.atr_niveles=0,
>>>>>>> asp_herencia.atr_tipo=[], asp_herencia.i_partition=0,
>>>>>>> r_object_id=098c1b38809921b1, _KEEP_LOCK_=F, _FREEZE_COMPONENTS_=F,
>>>>>>> _THAW_COMPONENTS_=F, _CONTENTS_CHANGED_=F, _DIST_SAVE_AS_NEW_=F]]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 11 March 2016 at 15:33, Radek Sklenicka <
>>>>>>> radek.sklenicka@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Karl, we'll verify that.
>>>>>>>>
>>>>>>>> -Radek
>>>>>>>>
>>>>>>>> On 11 March 2016 at 14:21, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Radek,
>>>>>>>>>
>>>>>>>>> This is the DQL query that is run:
>>>>>>>>>
>>>>>>>>>       String strDQL = "select attr_name FROM dmi_dd_attr_info where type_name = '" + docType + "' order by attr_name asc";
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
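As a hedged illustration of how the quoted query could avoid repeated names, the following Java sketch builds a variant that uses `select distinct`. This is not the connector's actual code and not necessarily what any later fix does; `AttrQueryBuilder`, `quoteLiteral` and `buildAttrNameQuery` are hypothetical names. It assumes that DQL accepts DISTINCT in a SELECT and that a single quote inside a string literal is escaped by doubling it, both of which should be checked against the DQL reference for the Documentum version in use.

```java
class AttrQueryBuilder {
    /**
     * Wraps a value in single quotes for use as a DQL string literal.
     * Assumption: embedded single quotes are escaped by doubling them.
     */
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds a query for the attribute names of one document type.
     * "distinct" is intended to collapse rows that differ only by language.
     */
    static String buildAttrNameQuery(String docType) {
        return "select distinct attr_name from dmi_dd_attr_info where type_name = "
            + quoteLiteral(docType) + " order by attr_name asc";
    }

    public static void main(String[] args) {
        // The type name is taken from the trace log quoted in this thread.
        System.out.println(buildAttrNameQuery("do_domep_project_hse"));
    }
}
```

Running the resulting string in a DQL tool against both repositories would show whether the per-language duplication disappears; the sketch itself makes no claim about the repository's behaviour.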
>>>>>>>>>
>>>>>>>>> On Fri, Mar 11, 2016 at 8:19 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Radek,
>>>>>>>>>>
>>>>>>>>>> The Document Types page runs a DQL query to populate the document
>>>>>>>>>> types.  The fact that you get duplicates means that something may be
>>>>>>>>>> corrupt with your Documentum instance.  It's possible that for some reason
>>>>>>>>>> the instance is set up with multiple records that each have the same name
>>>>>>>>>> but different key values.
>>>>>>>>>>
>>>>>>>>>> Documentum used to have a little web app that allowed you to
>>>>>>>>>> execute DQL queries.  I'd experiment to see what was leading to the
>>>>>>>>>> duplication.  The fact that you can't get any metadata during crawling is
>>>>>>>>>> almost certainly related.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 11, 2016 at 8:10 AM, Radek Sklenicka <
>>>>>>>>>> radek.sklenicka@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> We are not able to pull metadata from one of our Documentum
>>>>>>>>>>> instances (it is 6.7)
>>>>>>>>>>> Interestingly, on the Job > Document Types page each metadata
>>>>>>>>>>> field is displayed 3 times in the metadata boxes - could this be an issue?
>>>>>>>>>>> Screenshots:
>>>>>>>>>>> http://take.ms/mJhPh
>>>>>>>>>>> http://take.ms/AMZF0
>>>>>>>>>>> We have quite a long list of document types and it takes minutes
>>>>>>>>>>> to load the Document Types page.
>>>>>>>>>>>
>>>>>>>>>>> Also, we can successfully pull metadata from our testing
>>>>>>>>>>> Documentum (it is 7.1), and I noticed that there is a difference in
>>>>>>>>>>> connector logs between the two:
>>>>>>>>>>>
>>>>>>>>>>> 1.) here we are able to pull metadata:
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2016-03-10 03:50:08,051 (Worker thread '3') - DCTM:
>>>>>>>>>>> Document 090007c28000569d has version label:
>>>>>>>>>>> 11+authors+object_name+owner_name+owner_permit+r_creation_date+r_creator_name+r_modifier+r_modify_date+r_object_id+r_object_type+title++0+DEAD_AUTHORITY+1.0_0_
>>>>>>>>>>> http://localhost/webtop/
>>>>>>>>>>> DEBUG 2016-03-10 03:50:08,052 (Worker thread '3') - DCTM: Inside
>>>>>>>>>>> processDocuments
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2.) NOT able to pull metadata:
>>>>>>>>>>>
>>>>>>>>>>> DEBUG 2016-03-10 14:58:22,908 (Worker thread '22') - DCTM:
>>>>>>>>>>> Document 098c1b3880991f48 has version label: 0++0+DEAD_AUTHORITY+_4_
>>>>>>>>>>> http://localhost/webtop
>>>>>>>>>>> DEBUG 2016-03-10 14:58:22,908 (Worker thread '22') - DCTM:
>>>>>>>>>>> Inside processDocuments
>>>>>>>>>>>
>>>>>>>>>>>
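The two version labels above differ mainly in their leading part: the working one starts with 11 followed by eleven attribute names, the failing one starts with 0 and lists none. The Java sketch below reads that leading count and the names that follow it. The layout (a count, then that many names, separated by `+`) is only inferred from these two log lines and is an assumption, not a documented format; `VersionLabelSketch` and `leadingAttributeNames` are hypothetical names, and any escaping the connector may apply is ignored.

```java
import java.util.ArrayList;
import java.util.List;

class VersionLabelSketch {
    /**
     * Reads the leading "count+name1+...+nameN" part of a version label.
     * Assumption: the first '+'-separated token is the number of attribute
     * names that immediately follow it.
     */
    static List<String> leadingAttributeNames(String versionLabel) {
        // Limit -1 keeps empty tokens, e.g. the one produced by "++".
        String[] tokens = versionLabel.split("\\+", -1);
        int count = Integer.parseInt(tokens[0]);
        if (count < 0 || count > tokens.length - 1) {
            throw new IllegalArgumentException("Count does not match label: " + versionLabel);
        }
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            names.add(tokens[i]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Labels copied from the two log excerpts in this message.
        String working = "11+authors+object_name+owner_name+owner_permit+r_creation_date"
            + "+r_creator_name+r_modifier+r_modify_date+r_object_id+r_object_type+title"
            + "++0+DEAD_AUTHORITY+1.0_0_";
        String failing = "0++0+DEAD_AUTHORITY+_4_";
        System.out.println(leadingAttributeNames(working));
        System.out.println(leadingAttributeNames(failing));
    }
}
```

If the assumption holds, an empty list for the failing document would suggest that no attributes were selected for it in the job specification, as opposed to the repository returning none.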
>>>>>>>>>>> Any ideas will be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>>
>>>>>>>>>>> Radek
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
