manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: metadata problem for subsite libraries
Date Wed, 12 Mar 2014 15:04:57 GMT
Hi Ahmet,

For sanity, please try printing out metadataDescription right after the
unpack on line 1700:

>>>>>>
              ArrayList metadataDescription = new ArrayList();
              int startPosition = unpackList(metadataDescription,version,0,'+');
<<<<<<
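A minimal sketch of such a sanity print, written as a standalone Java class so it can be tried outside the connector. `DumpList` and `dumpList` are hypothetical names, and the two field names in `main` are stand-ins for whatever `unpackList` actually produces. Wrapping each entry in angle brackets makes empty or space-padded field names easy to spot.

```java
import java.util.ArrayList;
import java.util.List;

public class DumpList {
    // Hypothetical helper (not part of SharePointRepository.java): renders a list
    // with its size and each entry wrapped in <...>, so empty strings and stray
    // spaces in field names become visible.
    static String dumpList(String label, List<?> values) {
        StringBuilder sb = new StringBuilder();
        sb.append(label).append(" (size=").append(values.size()).append("):");
        for (int i = 0; i < values.size(); i++) {
            sb.append(" [").append(i).append("]=<").append(values.get(i)).append(">");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the list that unpackList(...) fills in the connector.
        List<String> metadataDescription = new ArrayList<>();
        metadataDescription.add("ArticleByLine");
        metadataDescription.add("Author");
        System.err.println(dumpList("metadataDescription", metadataDescription));
    }
}
```

Inside the connector you would call the helper on the real `metadataDescription` right after the `unpackList` line, or hand the string to the connector's own logger instead of stderr.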

Thanks,
Karl



On Wed, Mar 12, 2014 at 11:01 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:

> Hi Karl,
>
> metadataValues just before the fetchAndIndexFile call is empty: {}
>
> Thanks,
> Ahmet
>
>
> On Wednesday, March 12, 2014 4:44 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> Hi Ahmet,
>
> The field names are unpacked at line 1699:
>
> >>>>>>
>               ArrayList metadataDescription = new ArrayList();
>               int startPosition = unpackList(metadataDescription,version,0,'+');
> <<<<<<
>
> Starting at line 1729, the metadata values are fetched:
>
> >>>>>>
>               Map<String,String> metadataValues = null;
>               if (metadataDescription.size() > 0)
>               {
>                 // Retrieve the library guid from carrydown data
>                 String[] libIDs = activities.retrieveParentData(documentIdentifier, "guids");
>
> ...
> <<<<<<
>
> This gets the metadata from SharePoint at line 1750:
>
> >>>>>>
>                 int cutoff = decodedLibPath.lastIndexOf("/");
>                 metadataValues = proxy.getFieldValues( metadataDescription, encodePath(site), documentLibID,
>                   decodedDocumentPath.substring(cutoff+1), dspStsWorks );
> <<<<<<
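The cutoff arithmetic above can be checked in isolation. This sketch copies only the `lastIndexOf`/`substring` logic from the quoted code; the library and document paths in `main` are hypothetical examples of a root-level library and a subsite library, and `CutoffDemo`/`relativePath` are made-up names.

```java
public class CutoffDemo {
    // Mirrors the quoted snippet: the cutoff is computed on the library path,
    // then applied to the document path to get the part passed to getFieldValues.
    static String relativePath(String decodedLibPath, String decodedDocumentPath) {
        int cutoff = decodedLibPath.lastIndexOf("/");
        return decodedDocumentPath.substring(cutoff + 1);
    }

    public static void main(String[] args) {
        // Hypothetical root-level library.
        System.out.println(relativePath("/Documents", "/Documents/report.docx"));
        // Hypothetical library inside a subsite.
        System.out.println(relativePath("/site1/site2/Documents",
                                        "/site1/site2/Documents/report.docx"));
    }
}
```

With these inputs both calls print `Documents/report.docx`, so the library-relative part comes out the same for the root and the subsite case. The other input to `getFieldValues` that presumably differs between the two is `site`, which may be worth printing alongside it.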
>
> The metadata values are indexed at line 1764:
>
> >>>>>>
>               if (!fetchAndIndexFile(activities, documentIdentifier,
>                 version, fileUrl, serverUrl + encodedServerLocation + encodedDocumentPath,
>                 acls, denyAcls, createdDate, modifiedDate, metadataValues,
>                 guid, sDesc))
> <<<<<<
>
> What I think you want to do is print out the contents of metadataValues
> just before the call to fetchAndIndexFile.  If they look good there, then
> we'll take the next step.
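A sketch of a print that tells the two failure modes apart, assuming only that `metadataValues` is a `Map<String,String>` as declared above. In the lines shown it starts out `null` and is assigned by `getFieldValues` inside the `if (metadataDescription.size() > 0)` block, so `null` and an empty `{}` point at different stages. `DumpMetadataValues` and `describe` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

public class DumpMetadataValues {
    // Hypothetical helper: distinguishes "never assigned" (null) from
    // "assigned but empty" ({}), and prints entries in key order.
    static String describe(Map<String, String> metadataValues) {
        if (metadataValues == null) {
            return "metadataValues = null (not fetched)";
        }
        // Copy into a TreeMap so the output order is stable across runs.
        return "metadataValues (size=" + metadataValues.size() + ") = "
            + new TreeMap<>(metadataValues);
    }

    public static void main(String[] args) {
        System.err.println(describe(null));
        System.err.println(describe(new TreeMap<>()));
        Map<String, String> sample = new TreeMap<>();
        sample.put("Author", "jdoe"); // made-up sample value
        System.err.println(describe(sample));
    }
}
```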
>
> Karl
>
>
>
>
> On Wed, Mar 12, 2014 at 10:29 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
> Hi Karl,
>
> sortedMetadataFields prints all the fields that I selected in the UI, e.g.
> [ArticleByLine, ArticleStartDate, Audience, Author, CampaignType... ]
> What should the next step be?
>
> Thanks,
> Ahmet
>
>
> On Wednesday, March 12, 2014 3:21 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> Hi Ahmet,
>
> I misspoke; the rules for metadata pay attention only to a path.
>
> The only way we can make progress here is to do some debugging.  In your
> trunk checkout, have a look at SharePointRepository.java starting at line
> 993:
>
> >>>>>>
>             // == Document path ==
>             // Convert the modified document path to an unmodified one, plus a library path.
>             String decodedLibPath = documentIdentifier.substring(0,dLibSeparatorIndex);
>             String decodedDocumentPath = decodedLibPath + documentIdentifier.substring(dLibSeparatorIndex+1);
>             if (checkIncludeFile(decodedDocumentPath,spec))
>             {
>               // This file is included, so calculate a version string.  This will include metadata info, so get that first.
>               MetadataInformation metadataInfo = getMetadataSpecification(decodedDocumentPath,spec);
>
> <<<<<<
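To see what the two `substring` calls produce for a given identifier, the lines above can be run standalone. The identifier format here is an assumption made purely for illustration: a doubled slash `//` marks the library boundary and `dLibSeparatorIndex` points at its first slash. Check a real identifier from your crawl before relying on it; `SplitIdentifier` and `split` are hypothetical names.

```java
public class SplitIdentifier {
    // ASSUMPTION (illustration only): the library/document boundary is marked
    // by "//", and dLibSeparatorIndex is the index of its first slash.
    static String[] split(String documentIdentifier) {
        int dLibSeparatorIndex = documentIdentifier.indexOf("//");
        if (dLibSeparatorIndex == -1) {
            throw new IllegalArgumentException("no library separator in " + documentIdentifier);
        }
        // Same two expressions as the quoted connector code.
        String decodedLibPath = documentIdentifier.substring(0, dLibSeparatorIndex);
        String decodedDocumentPath = decodedLibPath + documentIdentifier.substring(dLibSeparatorIndex + 1);
        return new String[] { decodedLibPath, decodedDocumentPath };
    }

    public static void main(String[] args) {
        String[] parts = split("/site1/site2/Documents//report.docx");
        System.out.println("decodedLibPath      = " + parts[0]);
        System.out.println("decodedDocumentPath = " + parts[1]);
    }
}
```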
>
> The class MetadataInformation describes the metadata that will be included
> given the document path.  Later, at line 1023, the specified fields that
> also belong to the document's library are determined:
>
> >>>>>>
>                 String[] sortedMetadataFields = getInterestingFieldSetSorted(metadataInfo,libFields);
> <<<<<<
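Judging only from that description, this step behaves like an intersection of the requested fields with the library's fields, followed by a sort. The following is a hypothetical stand-in, not the connector's `getInterestingFieldSetSorted`; its `MetadataInformation` argument is replaced here by a plain `Set<String>`, and the field names in `main` are sample values.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class InterestingFields {
    // Hypothetical stand-in: keep only the requested fields that the library
    // actually has, returned in sorted order (TreeSet uses natural String order).
    static String[] interestingFieldsSorted(Set<String> requested, Set<String> libFields) {
        Set<String> result = new TreeSet<>(requested);
        result.retainAll(libFields);
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Set<String> requested = new LinkedHashSet<>(Arrays.asList("Author", "Audience", "CampaignType"));
        Set<String> libFields = new LinkedHashSet<>(Arrays.asList("Author", "Title", "Audience"));
        // Prints [Audience, Author]: CampaignType is dropped because the library lacks it.
        System.out.println(Arrays.toString(interestingFieldsSorted(requested, libFields)));
    }
}
```

If the two sets ever spelled the same column differently (SharePoint fields have both an internal name and a display name), an intersection like this would drop that field silently.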
>
> I suggest modifying the connector to print the contents of
> sortedMetadataFields for each document that comes along.  You will need to
> do whatever is necessary to force a recrawl of just those documents whose
> metadata you are not getting.  If sortedMetadataFields does not contain the
> fields you expect, that means that there is something wrong with how the
> rules are being interpreted, or in how the fields for the library are being
> discovered.  If it contains the right fields, then the problem must be in
> how the field names are getting packed and unpacked from the version
> string.  Either way, please let me know.
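If the field list looks right but comes back wrong, a pack/unpack round trip is a quick way to test the version-string step. The encoding below (element count first, each value followed by the delimiter, backslash as the escape character) is a hypothetical one chosen for illustration and is not the connector's actual code; only the `unpackList(list, string, start, delimiter)` call shape mirrors the snippet quoted earlier in this thread. `Odd+Name` is a made-up field name that contains the `'+'` delimiter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PackRoundTrip {
    // Hypothetical encoding: the element count, then each element, each followed
    // by the delimiter. A backslash escapes the delimiter and itself inside values.
    static void packList(StringBuilder out, List<String> values, char delimiter) {
        packValue(out, Integer.toString(values.size()), delimiter);
        for (String v : values) {
            packValue(out, v, delimiter);
        }
    }

    static void packValue(StringBuilder out, String value, char delimiter) {
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == delimiter) {
                out.append('\\'); // escape special characters
            }
            out.append(c);
        }
        out.append(delimiter);
    }

    // Reads one escaped value starting at pos into sb; returns the position after its delimiter.
    static int unpackValue(StringBuilder sb, String s, int pos, char delimiter) {
        while (pos < s.length()) {
            char c = s.charAt(pos++);
            if (c == '\\' && pos < s.length()) {
                sb.append(s.charAt(pos++)); // take the escaped character literally
            } else if (c == delimiter) {
                return pos;
            } else {
                sb.append(c);
            }
        }
        return pos;
    }

    // Same shape as unpackList(metadataDescription, version, 0, '+'):
    // fills output and returns the position just past the list.
    static int unpackList(List<String> output, String s, int pos, char delimiter) {
        StringBuilder sb = new StringBuilder();
        pos = unpackValue(sb, s, pos, delimiter);
        int count = Integer.parseInt(sb.toString());
        for (int i = 0; i < count; i++) {
            sb.setLength(0);
            pos = unpackValue(sb, s, pos, delimiter);
            output.add(sb.toString());
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("ArticleByLine", "Odd+Name", "Author");
        StringBuilder version = new StringBuilder();
        packList(version, fields, '+');
        List<String> back = new ArrayList<>();
        int next = unpackList(back, version.toString(), 0, '+');
        System.out.println(version + " -> " + back + " (next=" + next + ")");
        System.out.println("round trip ok: " + fields.equals(back));
    }
}
```

The same pack, unpack, compare check can be pointed at the connector's own helpers with the real field names once `sortedMetadataFields` has been printed.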
>
> Karl
>
>
>
> On Wed, Mar 12, 2014 at 9:10 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
> Hi Karl,
>
> I'm sorry, but I don't follow. I assume the Paths/PathRule in my config is
> correct, since it fetches documents (albeit with no metadata).
>
> In the metadata section, there is no place to specify an 'entity type'.
>
> Can you please elaborate?
>
> Thanks,
> Ahmet
>
> On Wednesday, March 12, 2014 2:57 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> To clarify: rules you define must match both the entity type (e.g. site,
> list, lib, or document) and the path.  So the example you provided,
> since it does not specify the entity type, is incomplete.
>
> Karl
>
>
>
>
>
> On Wed, Mar 12, 2014 at 8:44 AM, Karl Wright <daddywri@gmail.com> wrote:
>
> > Hi Ahmet,
> >
> > All I can remember about this coming up before involved people not having
> > appropriate metadata rules.  So if you include a screenshot of your
> > metadata rules, that ought to help clarify what is happening.
> >
> > FWIW, metadata for a library will require you to have an explicit
> > matching library rule on your metadata tab.  Since this is a subsite, you
> > will also need a site rule.
> >
> > Thanks,
> > Karl
> >
> >
> >
> >
> >
> > On Wed, Mar 12, 2014 at 8:35 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> >
> > > Hi,
> > >
> > > I am connecting to a SharePoint 2010 instance with both trunk and
> > > ManifoldCF 1.5.1.
> > >
> > > When I define a job to crawl a document library by "add site", no
> > > metadata is sent to the output connector. I can see the list of metadata
> > > fields and select them, but only the GUID is sent (although I don't select
> > > GUID, nor is it listed). Documents are indexed, but without metadata.
> > >
> > > There is no metadata problem with Lists.
> > >
> > >
> > > 'Document Library' example:
> > > /site1/site2/Documents/* does not honour the selected metadata.
> > > /Documents/* honours the selected metadata.
> > >
> > > I think someone has reported similar problems (for a document library
> > > under a (sub)site) in the past, but I couldn't find the e-mail or JIRA issue.
> > >
> > > Thanks,
> > > Ahmet
> > >
