Date: Fri, 28 Mar 2014 15:44:40 -0700 (PDT)
From: Ahmet Arslan
To: "user@manifoldcf.apache.org", Alexander Stoffers
Subject: Re: Windows-Share to Solr is not working properly

Hi Alexander,

Which version of Solr are you using?

Please try these steps:

1) Set literalsOverride=true in solrconfig.xml (in the defaults section of
the extraction request handler)

2) Set fmap.date=ignored_date in solrconfig.xml (in the defaults section of
the extraction request handler)

If neither of the above works, don't worry, this will work for sure:
FirstFieldValueUpdateProcessorFactory will convert a multi-valued field
into a single-valued one.

[solrconfig.xml snippet: the XML markup was stripped by the list archive;
only the values "date" and "remove" survive]

Ahmet

On Friday, March 28, 2014 6:53 PM, Karl Wright wrote:

Hi Alexander,

I do understand your problem. But I assure you that ManifoldCF does not
(and never did) extract metadata fields from binary documents. Are you sure
this is happening in ManifoldCF? Perhaps you have a Tika pipeline
configured in Solr?

Karl

On Fri, Mar 28, 2014 at 11:47 AM, Alexander Stoffers wrote:

> Hi Karl,
>
> thank you for your quick response!
>
> I'm sorry for my bad English skills, but I will try to make it clearer:
>
> I actually don't understand where ManifoldCF processes/maps a metadata
> field "date" after crawling a PDF document.
> We tried to explore the issue and figured out that somewhere in the
> process the metadata field "ModDate" of the document itself is mapped to
> the metadata field "date". Furthermore, the magic "date" field becomes an
> array.
>
> If we delete the metadata field "ModDate" of the document, the metadata
> field "date" used in the ManifoldCF process disappears.
>
> If we don't delete the field "ModDate" of the document, and try to map the
> field "date" to something else or to blank, the date field is still passed
> to the Solr output connector, so Solr fails, because the date field is an
> array and the Solr schema expects a single value for its date field.
>
> I hope that I could explain our problem a little bit better :-)
>
> Best regards
> Alex
>
> ----- Original Message -----
> From: "Karl Wright"
> To: user@manifoldcf.apache.org
> Sent: Friday, March 28, 2014 15:29:11
> Subject: Re: Windows-Share to Solr is not working properly
>
> Hi Alexander,
>
> It's hard to figure out exactly what you have configured from your email,
> but here are a couple of points:
>
> (1) ManifoldCF does not extract dates from binary files; it will only
> supply dates from file metadata. So MCF is supplying the date from the
> modification date of the Windows file.
> (2) The JCIFS connector provides the same metadata date value in two ways:
>
>     rd.addField("lastModified", lastModifiedDate.toString());
>     rd.setModifiedDate(lastModifiedDate);
>
> This was done for backwards compatibility reasons.
> You can control which metadata value name is used for the ModifiedDate
> field on the Solr connection's Schema tab.
>
> As for the "lastModified" data, you can either map that to a field you
> don't have in your Solr schema, or you can suppress it entirely by creating
> an entry for Field Mapping that has "lastModified" on the left and a blank
> field on the right, and then clicking the "Add" button. Bear in mind that
> 1.5 had a bug in this functionality which was fixed in 1.5.1.
>
> Karl
>
> On Fri, Mar 28, 2014 at 10:13 AM, Alexander Stoffers
> <stoffers@modell-aachen.de> wrote:
>
>> Hi Karl,
>>
>> we have a problem with crawling documents from a Windows share to Solr.
>>
>> Our Solr schema has a date field that is not multivalued, but the output
>> of the crawled document (e.g. a PDF) has a date array instead of a single
>> date.
>>
>> I tried to remove the whole field with the tab "Solr Field Mapping",
>> using date=>'', but it is not working at all. Can't I remove the date
>> metadata at all?
>>
>> We figured out that the crawler gets the date metadata field out of the
>> binaries, where we found a field called ModDate. If we remove the ModDate
>> field from the binaries, the date metadata field disappears.
>>
>> Can you explain why the crawler puts the ModDate twice in the date field
>> array?
>>
>> Thank you in advance
>> Alex
>>
>> --
>>
>> Dipl.-Wirt.-Ing. Alexander Stoffers
>> Head of IT & Product Development
>> Modell Aachen GmbH - Interaktive Managementsysteme
>> Dennewartstr. 25-27, 52068 Aachen
>> phone ++49 176 1011 9752, fax ++49 241 9148 8653
>> http://www.modell-aachen.de
>>
>> Managing Director: Dr.-Ing. Carsten Behrens
>> Aachen District Court, HRB 15622
>>
>> --
>>
>> You can reach our IT support at
>> support@modell-aachen.de
>> +49 (0)241 53808720
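
For reference, the three suggestions in Ahmet's reply at the top of this thread could look roughly like the solrconfig.xml fragment below. This is only a sketch under stated assumptions, not the snippet from the original message. The handler path /update/extract and the chain name first-date-value are hypothetical, and mapping to ignored_date assumes the schema defines an ignored_* dynamic field (as the Solr example schema does). Steps 1 and 2 and the update chain are alternatives; they are shown together only for brevity.

```xml
<!-- Sketch only: chain name and handler path are assumptions. -->

<!-- Fallback: keep only the first value of the "date" field before indexing. -->
<updateRequestProcessorChain name="first-date-value">
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Step 1: literal field values sent by the client replace the
         values Tika extracts for the same field. -->
    <str name="literalsOverride">true</str>
    <!-- Step 2: route the Tika-extracted "date" to an ignored field. -->
    <str name="fmap.date">ignored_date</str>
    <!-- Fallback: run the chain defined above for this handler. -->
    <str name="update.chain">first-date-value</str>
  </lst>
</requestHandler>
```

If the Solr output connection in ManifoldCF posts to a different handler path, the defaults would need to go on that handler instead.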