manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Documentum - unable to index metadata
Date Wed, 30 Mar 2016 21:21:43 GMT
Hi Radek,

The code that reads attribute values from Documentum DFC persistent objects
does use the attribute name, as follows:

>>>>>>
  /** Get all the values that an attribute has, including multiple ones if present */
  public String[] getAttributeValues(String attribute)
    throws DocumentumException, RemoteException
  {
    try
    {
      int valueCount = object.getValueCount(attribute);
      String[] values = new String[valueCount];
      int y = 0;
      while (y < valueCount)
      {
        // Fetch the attribute.
        // It's supposed to work for all attribute types...
        String value = object.getRepeatingString(attribute,y);
        values[y++] = value;
      }
      return values;
    }
    catch (DfAuthenticationException ex)
    {
      throw new DocumentumException("Bad credentials: "+ex.getMessage(),DocumentumException.TYPE_BADCREDENTIALS);
    }
    catch (DfIdentityException ex)
    {
      throw new DocumentumException("Bad docbase name: "+ex.getMessage(),DocumentumException.TYPE_BADCONNECTIONPARAMS);
    }
    catch (DfDocbaseUnreachableException e)
    {
      throw new DocumentumException("Docbase unreachable: "+e.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
    }
    catch (DfIOException e)
    {
      throw new DocumentumException("Docbase io exception: "+e.getMessage(),DocumentumException.TYPE_SERVICEINTERRUPTION);
    }
    catch (DfException e)
    {
      throw new DocumentumException("Documentum error: "+e.getMessage());
    }
  }
<<<<<<

This is how the DFC IDfPersistentObject API is structured: values are looked
up by attribute name and index only, so it doesn't look like DFC supports a
separate value per language.  That means I don't know why you wouldn't get
attribute values, unless the UI issue is causing no attributes to be
specified for whatever type matches the document.  I'll have to dig into
that code next.
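
If the triplicated names do turn out to matter, one way to rule them out
would be to collapse the attr_name list before it is used.  The following is
only a sketch, not the connector's code: the class AttributeNameDedup and the
method distinctNames are hypothetical, and the sample names are borrowed from
the trace log quoted below purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class AttributeNameDedup {

    /**
     * Collapse repeated attribute names while keeping first-seen order.
     * Hypothetical helper; it assumes the only difference between the
     * repeated rows is the locale (nls_key), so the names are identical.
     */
    public static List<String> distinctNames(List<String> rawNames) {
        // LinkedHashSet drops duplicates and preserves insertion order.
        return new ArrayList<>(new LinkedHashSet<>(rawNames));
    }

    public static void main(String[] args) {
        // Illustrative input: each name repeated once per installed
        // language (en, es, pt), as reported in this thread.
        List<String> raw = Arrays.asList(
            "atr_title", "atr_title", "atr_title",
            "atr_uwi", "atr_uwi", "atr_uwi");
        System.out.println(distinctNames(raw));
    }
}
```

Whether the duplicates are actually what prevents metadata from being
indexed is still an open question; this only removes them as a variable.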

Karl


On Wed, Mar 30, 2016 at 9:58 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Radek,
>
> I will have to check how the connector uses attribute names and get back
> to you.  But I am pretty certain that the connector specifies attributes in
> its DQL queries by means of the attribute name, not the r_object_id.  If
> that's the problem, it also implies that there can be a different attribute
> value for each language, which might be why you aren't seeing the
> attributes you are expecting.
>
> This is not an easy problem to address, however.
>
> Can you confirm whether or not documents can have different attribute
> values for each language in Documentum?
>
> Thanks,
> Karl
>
>
>
>
> On Wed, Mar 30, 2016 at 9:41 AM, Radek Sklenicka <
> radek.sklenicka@gmail.com> wrote:
>
>> Hi Karl,
>>
>>
>>
>> We discovered that we get metadata names in triplicate because there are
>> 3 languages installed in Documentum.
>>
>> The multiple attribute records each have the same attr_name and type_name
>> but a unique r_object_id and a different nls_key (en, es, pt).
>>
>>
>>
>> Could this be the reason why metadata doesn’t make it through the
>> pipeline and we can’t get any metadata during crawling?
>>
>> Are unique attr_names required in the Documentum connector?
>>
>>
>>
>> Any suggestions would be greatly appreciated.
>>
>>
>>
>> Thank you,
>>
>>
>> Radek
>>
>> On 23 March 2016 at 18:28, Radek Sklenicka <radek.sklenicka@gmail.com>
>> wrote:
>>
>>> Thanks for verification, Karl.
>>>
>>> -Radek
>>>
>>> On 23 March 2016 at 14:01, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Radek,
>>>>
>>>> This log output comes from RMI, apparently, and is not something I've
>>>> ever seen before.  But it does look like it's a complete list of what's
>>>> being returned for a request for the list of attributes (the first entry),
>>>> and for a specific object (the second entry).
>>>>
>>>> Karl
>>>>
>>>> On Wed, Mar 23, 2016 at 8:41 AM, Radek Sklenicka <
>>>> radek.sklenicka@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>>
>>>>> "select attr_name FROM dmi_dd_attr_info" really returns duplicates -
>>>>> we're looking into that.
>>>>>
>>>>> Is there also a DQL query (or function) used by ManifoldCF that we can
>>>>> try to check what/if attributes are being returned for a particular record?
>>>>>
>>>>> We have trace logs from DFC and it looks like the attributes are being
>>>>> returned from the content server.
>>>>> Could you please help us decode the logs - where to look/verify if
>>>>> attributes are handed over to ManifoldCF?
>>>>> Can we deduce from the logs attached below that the attributes are
>>>>> transferred from DFC to ManifoldCF?
>>>>>
>>>>> Many thanks,
>>>>> Radek
>>>>>
>>>>>
>>>>> 2016-03-22 13:29:26.008 <USER_DTESTER|s9(21.0)|SM@14660772>  [RMI TCP
>>>>> Connection(1823)-127.0.0.1] [EXIT]
>>>>>  .com.documentum.fc.client.DfTypedObject@36b9ba.getLiteType ==>
>>>>> AspectedLiteType@110eb5e{name=do_domep_project_hse, typeVersion=0,
>>>>> cacheVStamp=178498, attributes={asp_herencia.atr_isnew,
>>>>> asp_herencia.atr_niveles, asp_herencia.atr_tipo, asp_herencia.i_partition},
>>>>> superType=LiteType@2045f2{name=do_domep_project_hse, typeVersion=32,
>>>>> cacheVStamp=178498, attributes={atr_audit_type, atr_speciality,
>>>>> atr_emergency_related}, superType=LiteType@a67471{name=do_domep_project,
>>>>> typeVersion=32, cacheVStamp=178486, attributes={atr_uwi, atr_well_name,
>>>>> atr_usi, atr_survey_name}, superType=LiteType@f74077{name=do_domep_base,
>>>>> typeVersion=27, cacheVStamp=178438, attributes={atr_confidential_level,
>>>>> atr_owner_area, atr_logical_code, atr_original_reference_id, atr_revision,
>>>>> atr_entity, atr_author, atr_doc_type, atr_category_doc, atr_subcat_doc,
>>>>> atr_discipline, atr_subdiscipline, atr_language, atr_physical_document,
>>>>> atr_physical_code, atr_warehouse, atr_retention, atr_digital_media,
>>>>> atr_internal, atr_country, atr_basin, atr_environment, atr_acreage,
>>>>> atr_abstract, atr_doc_creation_date, atr_title, atr_collection,
>>>>> atr_is_collection, atr_is_principal, atr_is_anexo, atr_id_collection,
>>>>> atr_is_relation, atr_field, atr_original_revision, atr_remarks,
>>>>> atr_keywords, atr_principal_folder_id, atr_original_version, atr_be_name,
>>>>> atr_be_ref, atr_be_short_name, atr_issued_for_code,
>>>>> atr_issued_for_description, atr_subbasin, atr_be_type_id, atr_comment,
>>>>> atr_status, atr_prepared_by, atr_preparation_date, atr_verified_by,
>>>>> atr_verification_date, atr_approved_by, atr_approval_date, atr_workflow},
>>>>> superType=LiteType@19390bd{name=do_general, typeVersion=6,
>>>>> cacheVStamp=167968, attributes={negocio, attr_is_gdcom},
>>>>> superType=LiteType@16658e8{name=dm_document, typeVersion=2,
>>>>> cacheVStamp=52034, attributes={}, superType=LiteType@6aac49{name=dm_sysobject,
>>>>> typeVersion=3, cacheVStamp=0, attributes={object_name, r_object_type,
>>>>> title, subject, authors, keywords, a_application_type, a_status,
>>>>> r_creation_date, r_modify_date, r_modifier, r_access_date, a_is_hidden,
>>>>> i_is_deleted, a_retention_date, a_archive, a_compound_architecture,
>>>>> a_link_resolved, i_reference_cnt, i_has_folder, i_folder_id,
>>>>> r_composite_id, r_composite_label, r_component_label, r_order_no,
>>>>> r_link_cnt, r_link_high_cnt, r_assembled_from_id, r_frzn_assembly_cnt,
>>>>> r_has_frzn_assembly, resolution_label, r_is_virtual_doc, i_contents_id,
>>>>> a_content_type, r_page_cnt, r_content_size, a_full_text, a_storage_type,
>>>>> i_cabinet_id, owner_name, owner_permit, group_name, group_permit,
>>>>> world_permit, i_antecedent_id, i_chronicle_id, i_latest_flag, r_lock_owner,
>>>>> r_lock_date, r_lock_machine, log_entry, r_version_label, i_branch_cnt,
>>>>> i_direct_dsc, r_immutable_flag, r_frozen_flag, r_has_events, acl_domain,
>>>>> acl_name, a_special_app, i_is_reference, r_creator_name, r_is_public,
>>>>> r_policy_id, r_resume_state, r_current_state, r_alias_set_id,
>>>>> a_effective_date, a_expiration_date, a_publish_formats, a_effective_label,
>>>>> a_effective_flag, a_category, language_code, a_is_template,
>>>>> a_controlling_app, r_full_content_size, a_extended_properties, a_is_signed,
>>>>> a_last_review_date, i_retain_until, r_aspect_name, i_retainer_id,
>>>>> i_partition, i_is_replica, i_vstamp}}}}}}}}
>>>>>
>>>>>
>>>>>
>>>>> 2016-03-22 13:29:26.010 <USER_DTESTER|s5(17.0)|SM@16366401>  [RMI TCP
>>>>> Connection(1820)-127.0.0.1] [RPC_EXIT]  ......RPC: applyForObject ==>
>>>>> TypedData@144302b[id=098c1b38809921b1, type=do_domep_project_well_wd,
>>>>> readOnly=false, autoFill=true, fetchTimestamp=0,
>>>>> values=[object_name=DSC00683.JPG, r_object_type=do_domep_project_well_wd,
>>>>> title=, subject=, authors=[], keywords=[], a_application_type=, a_status=,
>>>>> r_creation_date=2/11/2016 8:35:45 AM, r_modify_date=3/7/2016 11:17:37 AM,
>>>>> r_modifier=admdcmt, r_access_date=3/16/2016 12:23:59 PM, a_is_hidden=F,
>>>>> i_is_deleted=F, a_retention_date=nulldate, a_archive=F,
>>>>> a_compound_architecture=, a_link_resolved=F, i_reference_cnt=1,
>>>>> i_has_folder=T, i_folder_id=[0b8c1b3880991ad3], r_composite_id=[],
>>>>> r_composite_label=[], r_component_label=[], r_order_no=[], r_link_cnt=0,
>>>>> r_link_high_cnt=0, r_assembled_from_id=0000000000000000,
>>>>> r_frzn_assembly_cnt=0, r_has_frzn_assembly=F, resolution_label=,
>>>>> r_is_virtual_doc=0, i_contents_id=068c1b388064051a, a_content_type=jpeg,
>>>>> r_page_cnt=1, r_content_size=949228, a_full_text=T, a_storage_type=repo,
>>>>> i_cabinet_id=0c8c1b38806aee30, owner_name=Domep controlador 00010,
>>>>> owner_permit=7, group_name=, group_permit=1, world_permit=1,
>>>>> i_antecedent_id=0000000000000000, i_chronicle_id=098c1b38809921b1,
>>>>> i_latest_flag=T, r_lock_owner=, r_lock_date=nulldate, r_lock_machine=,
>>>>> log_entry=, r_version_label=[1.0, CURRENT], i_branch_cnt=0, i_direct_dsc=F,
>>>>> r_immutable_flag=F, r_frozen_flag=F, r_has_events=F, acl_domain=admdocum,
>>>>> acl_name=domep_ac_en_02080, a_special_app=, i_is_reference=F,
>>>>> r_creator_name=Domep controlador 00010, r_is_public=F,
>>>>> r_policy_id=468c1b38809c6e47, r_resume_state=-1, r_current_state=0,
>>>>> r_alias_set_id=0000000000000000, a_effective_date=[], a_expiration_date=[],
>>>>> a_publish_formats=[], a_effective_label=[], a_effective_flag=[],
>>>>> a_category=, language_code=, a_is_template=F, a_controlling_app=,
>>>>> r_full_content_size=949228, a_extended_properties=[], a_is_signed=F,
>>>>> a_last_review_date=nulldate, i_retain_until=nulldate,
>>>>> r_aspect_name=[asp_herencia], i_retainer_id=[], i_partition=0,
>>>>> i_is_replica=F, i_vstamp=4, negocio=E&P, attr_is_gdcom=F,
>>>>> atr_confidential_level=Internal Use, atr_owner_area=OFICINA DE E,
>>>>> atr_logical_code=IQEXPEOMKUR000WEL2016000063, atr_original_reference_id=[],
>>>>> atr_revision=, atr_entity=[], atr_author=[Domep controlador 00010],
>>>>> atr_doc_type=Reporting, atr_category_doc=Geology, atr_subcat_doc=Progress
>>>>> Report, atr_discipline=GEOLOGY, atr_subdiscipline=[],
>>>>> atr_language=[ENGLISH], atr_physical_document=F, atr_physical_code=[],
>>>>> atr_warehouse=, atr_retention=YES, atr_digital_media=F, atr_internal=YES,
>>>>> atr_country=[XYZ], atr_basin=[XYZZ], atr_environment=, atr_acreage=[],
>>>>> atr_abstract=, atr_doc_creation_date=5/22/2006 8:31:43 AM, atr_title=WELL
>>>>> BARAM 1 PHOTOS FIELD 06, atr_collection=[], atr_is_collection=F,
>>>>> atr_is_principal=F, atr_is_anexo=F, atr_id_collection=[],
>>>>> atr_is_relation=F, atr_field=[], atr_original_revision=, atr_remarks=,
>>>>> atr_keywords=[], atr_principal_folder_id=0b8c1b3880991ad3,
>>>>> atr_original_version=, atr_be_name=BARAM 1, atr_be_ref=IQWEL000008,
>>>>> atr_be_short_name=BA 1, atr_issued_for_code=, atr_issued_for_description=,
>>>>> atr_subbasin=[], atr_be_type_id=8, atr_comment=, atr_status=Draft,
>>>>> atr_prepared_by=[], atr_preparation_date=nulldate, atr_verified_by=[],
>>>>> atr_verification_date=nulldate, atr_approved_by=[],
>>>>> atr_approval_date=nulldate, atr_workflow=, atr_well_name=BARAM 1,
>>>>> atr_uwi=IQ010004432, atr_borehole_name=[BARAM 1], atr_ubhi=[IQ01000443200],
>>>>> atr_borehole_alias=[], atr_borehole_short_name=[], atr_sample_type=[],
>>>>> atr_analysis_type=[], asp_herencia.atr_isnew=F, asp_herencia.atr_niveles=0,
>>>>> asp_herencia.atr_tipo=[], asp_herencia.i_partition=0,
>>>>> r_object_id=098c1b38809921b1, _KEEP_LOCK_=F, _FREEZE_COMPONENTS_=F,
>>>>> _THAW_COMPONENTS_=F, _CONTENTS_CHANGED_=F, _DIST_SAVE_AS_NEW_=F]]
>>>>>
>>>>>
>>>>>
>>>>> On 11 March 2016 at 15:33, Radek Sklenicka <radek.sklenicka@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Karl, we'll verify that.
>>>>>>
>>>>>> -Radek
>>>>>>
>>>>>> On 11 March 2016 at 14:21, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Radek,
>>>>>>>
>>>>>>> This is the DQL query that is run:
>>>>>>>
>>>>>>>       String strDQL = "select attr_name FROM dmi_dd_attr_info where type_name = '"
>>>>>>>         + docType + "' order by attr_name asc";
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 11, 2016 at 8:19 AM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Radek,
>>>>>>>>
>>>>>>>> The Document Types page runs a DQL query to populate the document
>>>>>>>> types.  The fact that you get duplicates means that something may
>>>>>>>> be corrupt with your Documentum instance.  It's possible that for
>>>>>>>> some reason the instance is set up with multiple records that each
>>>>>>>> have the same name but different key values.
>>>>>>>>
>>>>>>>> Documentum used to have a little web app that allowed you to
>>>>>>>> execute DQL queries.  I'd experiment to see what was leading to the
>>>>>>>> duplication.  The fact that you can't get any metadata during
>>>>>>>> crawling is almost certainly related.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 11, 2016 at 8:10 AM, Radek Sklenicka <
>>>>>>>> radek.sklenicka@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> We are not able to pull metadata from one of our Documentum
>>>>>>>>> instances (it is 6.7).
>>>>>>>>> Interestingly, on the Job > Document Types page each metadata
>>>>>>>>> field is displayed 3 times in the metadata boxes - could this be
>>>>>>>>> an issue?
>>>>>>>>> Screenshots:
>>>>>>>>> http://take.ms/mJhPh
>>>>>>>>> http://take.ms/AMZF0
>>>>>>>>> We have quite a long list of document types and it takes minutes
>>>>>>>>> to load the Document Types page.
>>>>>>>>>
>>>>>>>>> Also, we can successfully pull metadata from our testing
>>>>>>>>> Documentum (it is 7.1), and I noticed that there is a difference
>>>>>>>>> in connector logs between the two:
>>>>>>>>>
>>>>>>>>> 1.) here we are able to pull metadata:
>>>>>>>>>
>>>>>>>>> DEBUG 2016-03-10 03:50:08,051 (Worker thread '3') - DCTM: Document
>>>>>>>>> 090007c28000569d has version label:
>>>>>>>>> 11+authors+object_name+owner_name+owner_permit+r_creation_date+r_creator_name+r_modifier+r_modify_date+r_object_id+r_object_type+title++0+DEAD_AUTHORITY+1.0_0_
>>>>>>>>> http://localhost/webtop/
>>>>>>>>> DEBUG 2016-03-10 03:50:08,052 (Worker thread '3') - DCTM: Inside
>>>>>>>>> processDocuments
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2.) NOT able to pull metadata:
>>>>>>>>>
>>>>>>>>> DEBUG 2016-03-10 14:58:22,908 (Worker thread '22') - DCTM:
>>>>>>>>> Document 098c1b3880991f48 has version label: 0++0+DEAD_AUTHORITY+_4_
>>>>>>>>> http://localhost/webtop
>>>>>>>>> DEBUG 2016-03-10 14:58:22,908 (Worker thread '22') - DCTM: Inside
>>>>>>>>> processDocuments
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any ideas will be appreciated.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>>
>>>>>>>>> Radek
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
