manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Windows-Share to Solr is not working properly
Date Fri, 28 Mar 2014 14:29:11 GMT
Hi Alexander,

It's hard to figure out exactly what you have configured from your email,
but here are a couple of points:

(1) ManifoldCF does not extract dates from the contents of binary files; it
only supplies dates from the file's own metadata.  So the date MCF is
supplying is the modification date of the Windows file.
(2) The JCIFS connector provides the same metadata date value in two ways:

    rd.addField("lastModified", lastModifiedDate.toString());
    rd.setModifiedDate(lastModifiedDate);

This was done for backwards compatibility reasons.  You can control which
metadata value name is used for the ModifiedDate field on the Solr
connection's Schema tab.

As for the "lastModified" data, you can either map it to a field you
don't have in your Solr schema, or you can suppress it entirely by creating
a Field Mapping entry that has "lastModified" on the left and a blank
field on the right, and then clicking the "Add" button.  Bear in mind that
version 1.5 had a bug in this functionality, which was fixed in 1.5.1.
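
As a rough illustration of the field mapping idea above, here is a small
standalone Java sketch.  It is not ManifoldCF code: the class
FieldMappingSketch, the method applyMapping, and the source name
"modifiedDate" (standing in for the value set via setModifiedDate) are all
hypothetical.  It assumes that both copies of the date get routed to the same
Solr field "date", which is one plausible way a single-valued field could
receive two values, and shows how a blank target drops "lastModified".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an output stage; not part of ManifoldCF.
class FieldMappingSketch {

    /**
     * Renames or drops metadata fields according to the mapping.
     * A source field mapped to an empty target is suppressed; a source
     * field without a mapping entry passes through under its own name.
     */
    static Map<String, List<String>> applyMapping(
            Map<String, List<String>> metadata,
            Map<String, String> mapping) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String target = mapping.getOrDefault(e.getKey(), e.getKey());
            if (target.isEmpty()) {
                continue; // blank right-hand side: drop this field
            }
            // Values from different sources accumulate under one target.
            out.computeIfAbsent(target, k -> new ArrayList<>())
               .addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String modified = "2014-03-28T14:29:11Z"; // sample value only

        Map<String, List<String>> metadata = new LinkedHashMap<>();
        // Mirrors rd.addField("lastModified", ...)
        metadata.put("lastModified", List.of(modified));
        // Assumed name for the value supplied via rd.setModifiedDate(...)
        metadata.put("modifiedDate", List.of(modified));

        // Assumption: both sources end up in "date", giving two values.
        Map<String, String> both =
            Map.of("lastModified", "date", "modifiedDate", "date");
        System.out.println(applyMapping(metadata, both));

        // Blank target for "lastModified": only one value remains.
        Map<String, String> suppressed =
            Map.of("lastModified", "", "modifiedDate", "date");
        System.out.println(applyMapping(metadata, suppressed));
    }
}
```

With the second mapping only one value reaches "date", which is what a
non-multivalued Solr field needs.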

Karl




On Fri, Mar 28, 2014 at 10:13 AM, Alexander Stoffers <
stoffers@modell-aachen.de> wrote:

> Hi Karl,
>
> we have a problem with crawling documents from a Windows share to Solr.
>
> Our Solr schema has a date field that is not multivalued, but the output
> of the crawled (e.g. pdf) document has a date array instead of a single
> date.
>
> I tried to remove the whole field with the tab "Solr Field Mapping",
> using date=>'' but it is not working at all. Can't I remove the date
> metadata at all?
>
> We figured out that the crawler gets the date metadata field out of the
> binaries, where we found a field called ModDate. If we remove the ModDate
> field from the binaries, the date metadata field disappears.
>
> Can you explain, why the crawler puts the ModDate twice in the date field
> array?
>
>
> Thank you in advance
> Alex
>
>
>
> --
> --
>
> Dipl.-Wirt.-Ing. Alexander Stoffers
> Head of IT & Product Development
> Modell Aachen GmbH - Interaktive Managementsysteme
> Dennewartstr. 25-27, 52068 Aachen
> fon ++49 176 1011 9752, fax ++49 241 9148 8653
> http://www.modell-aachen.de
>
> Managing Director: Dr.-Ing. Carsten Behrens
> Aachen District Court (Amtsgericht Aachen), HRB 15622
>
> --
>
> You can reach our IT support at
> support@modell-aachen.de
> +49 (0)241 53808720
>
