manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Metadata fields get lost in 1.7.2 with Sharepoint 2013 repository and Solr output connection
Date Thu, 08 Jan 2015 16:24:36 GMT
Hi Salih,

The code you point at is designed to make copies of fields that are
represented by Reader objects.  Most SharePoint fields are represented by
String objects, so this code does not apply to them.

The place you want to look is:

>>>>>>
    // Copy metadata fields (including minting new Readers where needed)
    Iterator<String> iter = original.getFields();
    if (iter.hasNext())
    {
      String fieldName = iter.next();
      Object[] objects = original.getField(fieldName);
      if (objects instanceof Reader[])
      {
        CharacterInput[] rts = metadataReaders.get(fieldName);
        Reader[] newReaders = new Reader[rts.length];
        for (int i = 0; i < rts.length; i++)
        {
          rts[i].doneWithStream();
          newReaders[i] = rts[i].getStream();
        }
        rd.addField(fieldName,newReaders);
      }
      else if (objects instanceof Date[])
      {
        rd.addField(fieldName,(Date[])objects);
      }
      else if (objects instanceof String[])
      {
        rd.addField(fieldName,(String[])objects);
      }
      else
        throw new RuntimeException("Unknown kind of metadata: "+objects.getClass().getName());
    }

<<<<<<

This code should copy all fields to the new RepositoryDocument object (rd),
and do the necessary special manipulation for Reader fields.
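
As a standalone illustration of the if (iter.hasNext()) question in the quoted
message, here is a minimal sketch that is not ManifoldCF code (the class and
method names are hypothetical). It only shows the general Java behaviour: an
`if` guard around iter.next() runs the body at most once, while a `while`
guard visits every element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; it does not exist in ManifoldCF.
public class IteratorGuardDemo {

  // Guard evaluated once: at most the first element is copied.
  static List<String> copyWithIf(List<String> fieldNames) {
    List<String> copied = new ArrayList<>();
    Iterator<String> iter = fieldNames.iterator();
    if (iter.hasNext()) {
      copied.add(iter.next());
    }
    return copied;
  }

  // Guard re-evaluated after every pass: all elements are copied.
  static List<String> copyWithWhile(List<String> fieldNames) {
    List<String> copied = new ArrayList<>();
    Iterator<String> iter = fieldNames.iterator();
    while (iter.hasNext()) {
      copied.add(iter.next());
    }
    return copied;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("Title", "Author", "Modified");
    System.out.println(copyWithIf(names));    // prints [Title]
    System.out.println(copyWithWhile(names)); // prints [Title, Author, Modified]
  }
}
```

If a field-copying loop is guarded by `if` rather than `while`, a document
with many fields would come out with only one, which matches the symptom
described in the quoted message.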

If you'd be willing to send me a screen shot of your job (from your view
job page), I can try to recreate your pipeline here and see what's going on.

Thanks,
Karl



On Thu, Jan 8, 2015 at 11:13 AM, Salih Sen <salih@dilisim.com> wrote:

> Hi,
>
> We've noticed that the metadata of some documents isn't indexed in Solr.
>
> I tried tracking down the issue in the source code and noticed that the
> RepositoryDocument has around 25 fields until it reaches the
> RepositoryDocumentFactory. The document returned from
> factory.createDocument() has only a single field at
> IncrementalIngester.java line 3089.
>
>
>
> I couldn't follow the logic behind if (iter.hasNext()) in the code below:
> although the document has twenty-something fields, it "iterates" over only
> the first one. Is this the expected behaviour?
>
> Similar code also exists in the createDocument() method, so I feel I might
> be looking in the wrong place, but as far as I can see this part creates the
> difference between the document that comes from the SharePoint repository
> and the one posted to Solr.
>
> Thanks.
>
>
> RepositoryDocumentFactory.java
> ------------------------------
>
> public RepositoryDocumentFactory(RepositoryDocument document)
>   throws ManifoldCFException, IOException
> {
>   this.original = document;
>
>   try
>   {
>     this.binaryTracker = new TempFileInput(document.getBinaryStream());
>     // Copy all reader streams
>     Iterator<String> iter = document.getFields();
>     if (iter.hasNext())
>     {
>       String fieldName = iter.next();
>       Object[] objects = document.getField(fieldName);
>       if (objects instanceof Reader[])
>       {
>         CharacterInput[] newValues = new CharacterInput[objects.length];
>         metadataReaders.put(fieldName,newValues);
>         // Populate newValues
>         for (int i = 0; i < newValues.length; i++)
>         {
>           newValues[i] = new TempFileCharacterInput((Reader)objects[i]);
>         }
>       }
>     }
>   }
>   catch (Throwable e)
>   {
>     // Clean up everything we've done so far.
>     if (this.binaryTracker != null)
>       this.binaryTracker.discard();
>     for (String key : metadataReaders.keySet())
>     {
>       CharacterInput[] rt = metadataReaders.get(key);
>       for (CharacterInput r : rt)
>       {
>         if (r != null)
>           r.discard();
>       }
>     }
>     if (e instanceof IOException)
>       throw (IOException)e;
>     else if (e instanceof RuntimeException)
>       throw (RuntimeException)e;
>     else if (e instanceof Error)
>       throw (Error)e;
>     else
>       throw new RuntimeException("Unknown exception type: "+e.getClass().getName()+": "+e.getMessage(),e);
>   }
> }
>
>
>
> --
>
> Salih Şen
>
> Dilişim Bilgi Bilgisayar ve İletişim Teknolojileri Sanayi ve Ticaret Ltd.
> Sti.
>
> email: salih@dilisim.com
>
> Tel: 0 222 330 20 21
>
> GSM: 0 507 296 15 51
>
