Date: Thu, 8 Jan 2015 11:36:25 -0500
Subject: Re: Metadata fields get lost in 1.7.2 with Sharepoint 2013 repository and Solr output connection
From: Karl Wright
To: dev
Cc: "fatih.cetin"

Actually, I take some of this back. Any SharePoint metadata that is
associated with a parent object rather than a child is represented in
RepositoryDocument as a Reader[] array. So you should see
RepositoryDocumentFactory iterating through all such fields and making a
TempFileCharacterInput for each member of each field. If you are seeing
only one iteration of the getFields() iterator, it means that the
RepositoryDocument object's fields member is not being managed properly.
But I'm looking at that RepositoryDocument code, and addField() looks like
it does the right thing for all variations of data types.

Karl

On Thu, Jan 8, 2015 at 11:24 AM, Karl Wright wrote:

> Hi Salih,
>
> The code you point at is designed to make copies of fields that are
> represented by Reader objects. Most SharePoint fields are represented by
> String objects, so this code does not apply to them.
>
> The place you want to look is:
>
> >>>>>>
>     // Copy metadata fields (including minting new Readers where needed)
>     Iterator<String> iter = original.getFields();
>     if (iter.hasNext())
>     {
>       String fieldName = iter.next();
>       Object[] objects = original.getField(fieldName);
>       if (objects instanceof Reader[])
>       {
>         CharacterInput[] rts = metadataReaders.get(fieldName);
>         Reader[] newReaders = new Reader[rts.length];
>         for (int i = 0; i < rts.length; i++)
>         {
>           rts[i].doneWithStream();
>           newReaders[i] = rts[i].getStream();
>         }
>         rd.addField(fieldName,newReaders);
>       }
>       else if (objects instanceof Date[])
>       {
>         rd.addField(fieldName,(Date[])objects);
>       }
>       else if (objects instanceof String[])
>       {
>         rd.addField(fieldName,(String[])objects);
>       }
>       else
>         throw new RuntimeException("Unknown kind of metadata: "+objects.getClass().getName());
>     }
> <<<<<<
>
> This code should copy all fields to the new RepositoryDocument object
> (rd), and do the necessary special manipulation for Reader fields.
>
> If you'd be willing to send me a screen shot of your job (from your view
> job page), I can try to recreate your pipeline here and see what's going
> on.
>
> Thanks,
> Karl
>
>
> On Thu, Jan 8, 2015 at 11:13 AM, Salih Sen wrote:
>
>> Hi,
>>
>> We've noticed that the metadata of some documents isn't indexed in Solr.
>>
>> I tried tracking down the issue in the source code and noticed that the
>> RepositoryDocument has around 25 fields until it reaches the
>> RepositoryDocumentFactory. The document returned from
>> factory.createDocument() has only a single field at
>> IncrementalIngester.java line 3089.
>>
>> I couldn't get the logic behind if (iter.hasNext()) in the code below:
>> while the document has twenty-something fields, it "iterates" on only
>> the first one. Is this the expected behaviour?
>>
>> Similar code also exists in the createDocument() method, so I feel I
>> might be looking in the wrong place, but as far as I can see this part
>> creates the difference between the document that comes from the
>> SharePoint repository and the one posted to Solr.
>>
>> Thanks.
>>
>>
>> RepositoryDocumentFactory.java
>> ---------------------------------------------
>>
>>   public RepositoryDocumentFactory(RepositoryDocument document)
>>     throws ManifoldCFException, IOException
>>   {
>>     this.original = document;
>>
>>     try
>>     {
>>       this.binaryTracker = new TempFileInput(document.getBinaryStream());
>>       // Copy all reader streams
>>       Iterator<String> iter = document.getFields();
>>       if (iter.hasNext())
>>       {
>>         String fieldName = iter.next();
>>         Object[] objects = document.getField(fieldName);
>>         if (objects instanceof Reader[])
>>         {
>>           CharacterInput[] newValues = new CharacterInput[objects.length];
>>           metadataReaders.put(fieldName,newValues);
>>           // Populate newValues
>>           for (int i = 0; i < newValues.length; i++)
>>           {
>>             newValues[i] = new TempFileCharacterInput((Reader)objects[i]);
>>           }
>>         }
>>       }
>>     }
>>     catch (Throwable e)
>>     {
>>       // Clean up everything we've done so far.
>>       if (this.binaryTracker != null)
>>         this.binaryTracker.discard();
>>       for (String key : metadataReaders.keySet())
>>       {
>>         CharacterInput[] rt = metadataReaders.get(key);
>>         for (CharacterInput r : rt)
>>         {
>>           if (r != null)
>>             r.discard();
>>         }
>>       }
>>       if (e instanceof IOException)
>>         throw (IOException)e;
>>       else if (e instanceof RuntimeException)
>>         throw (RuntimeException)e;
>>       else if (e instanceof Error)
>>         throw (Error)e;
>>       else
>>         throw new RuntimeException("Unknown exception type: "+e.getClass().getName()+": "+e.getMessage(),e);
>>     }
>>   }
>>
>>
>> --
>>
>> Salih Şen
>>
>> Dilişim Bilgi Bilgisayar ve İletişim Teknolojileri Sanayi ve Ticaret
>> Ltd. Sti.
>>
>> email: salih@dilisim.com
>>
>> Tel: 0 222 330 20 21
>>
>> GSM: 0 507 296 15 51
>>
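
For anyone following the thread, here is a minimal sketch of the behaviour Salih describes: testing an iterator with `if` visits at most one element, while `while` visits all of them. It uses only plain Java collections. The class name `FieldCopySketch`, the two method names, and the sample field names are hypothetical stand-ins and not part of ManifoldCF, and a `Map` plays the role of the document's field storage. Whether switching the quoted loops to `while` is the right fix in ManifoldCF itself is not confirmed anywhere in this thread.

```java
import java.util.Date;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: no ManifoldCF classes are used here.
final class FieldCopySketch
{
  // Same shape as the quoted loop: hasNext() is tested once,
  // so at most one field reaches the target.
  static Map<String,Object[]> copyWithIf(Map<String,Object[]> source)
  {
    Map<String,Object[]> target = new LinkedHashMap<>();
    Iterator<String> iter = source.keySet().iterator();
    if (iter.hasNext())
    {
      String fieldName = iter.next();
      target.put(fieldName, source.get(fieldName));
    }
    return target;
  }

  // Same body, but hasNext() is re-tested after every field,
  // so every field reaches the target.
  static Map<String,Object[]> copyWithWhile(Map<String,Object[]> source)
  {
    Map<String,Object[]> target = new LinkedHashMap<>();
    Iterator<String> iter = source.keySet().iterator();
    while (iter.hasNext())
    {
      String fieldName = iter.next();
      target.put(fieldName, source.get(fieldName));
    }
    return target;
  }

  public static void main(String[] args)
  {
    // Made-up sample fields; insertion order is preserved by LinkedHashMap.
    Map<String,Object[]> fields = new LinkedHashMap<>();
    fields.put("Title", new String[]{"Quarterly report"});
    fields.put("Author", new String[]{"Salih"});
    fields.put("Modified", new Date[]{new Date(0L)});

    System.out.println("if:    " + copyWithIf(fields).keySet());    // prints "if:    [Title]"
    System.out.println("while: " + copyWithWhile(fields).keySet()); // prints "while: [Title, Author, Modified]"
  }
}
```

With three input fields, the `if` variant yields a one-field result, which matches the "only a single field" symptom reported above; the `while` variant preserves all three.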