lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: how to get modified field data if it doesn't exist in meta
Date Sun, 12 Feb 2017 18:45:09 GMT
It would have to be a custom one, one you write yourself. I believe Tika
passes the file name as one of the parameters, so you just need to
use the standard Java API to look up the file's last-modified date.
That, of course, assumes that the files you index are on the same
filesystem as Solr itself, so that Solr can look them up.
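
As a rough sketch of the lookup step only (not the URP itself), the
standard java.nio.file API can read a file's last-modified time and
format it as an ISO-8601 UTC string, which is the form Solr date fields
expect. The class name FileModifiedLookup and the method lastModifiedUtc
are hypothetical names made up for this illustration; how the file name
actually reaches your processor is something you would need to verify
against your own setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: not part of Solr or Tika.
public class FileModifiedLookup {

    // Returns the file's last-modified time as an ISO-8601 UTC string,
    // truncated to whole seconds. Throws IOException (for example
    // NoSuchFileException) if the file cannot be read from this machine.
    public static String lastModifiedUtc(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        Instant modified = Files.getLastModifiedTime(path).toInstant();
        return modified.truncatedTo(ChronoUnit.SECONDS).toString();
    }

    // Prints the last-modified time of each file named on the command line.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + lastModifiedUtc(name));
        }
    }
}
```

A custom processor could call something like lastModifiedUtc with the
file name it receives and set the result on the document only when the
modified-date field is missing, so that dates extracted from Office and
PDF metadata are left alone.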

You can find more about Update Request Processors (URPs) at:
https://cwiki.apache.org/confluence/display/solr/Update+Request+Processors
You can find the full list of the URPs at:
http://www.solr-start.com/info/update-request-processors/
If you are on the latest Solr 6.4, you would probably want to subclass
SimpleUpdateProcessorFactory and follow the implementation example of
TemplateUpdateProcessorFactory
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.0/solr/core/src/java/org/apache/solr/update/processor/TemplateUpdateProcessorFactory.java

Alternatively, you could implement your URP in JavaScript, but I am
not sure whether that gives you an API to check file dates.

Regards,
   Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 12 February 2017 at 13:28, Gytis Mikuciunas <gytmkc@gmail.com> wrote:
> Alexandre, could you provide some link or give more info about this
> processor?
> I'm novice in the solr world;)
>
>
> Regards,
> Gytis
>
> On Feb 10, 2017 14:59, "Alexandre Rafalovitch" <arafalov@gmail.com> wrote:
>
> Custom update request processor that looks up a file from the name and gets
> the date should work.
>
> Regards,
>     Alex
>
> On 10 Feb 2017 2:39 AM, "Gytis Mikuciunas" <gytmkc@gmail.com> wrote:
>
> Hi,
>
> We have started to use solr for our documents indexing (vsd, vsdx,
> xls,xlsx, doc, docx, pdf, txt).
>
> A modified date value is needed for each file. MS Office files and PDFs have
> this value.
> The problem is with txt files, as they don't have this value in their metadata.
>
> Is there any way to get it from the OS level and force it to be added
> to Solr when we do the indexing?
>
> p.s.
>
> Windows 2012 server, single instance
>
> typical command we use: java -Dauto -Dc=index_sandbox -Dport=80
> -Dfiletypes=vsd,vsdx,xls,xlsx,doc,docx,pdf,txt -Dbasicauth=admin:xxxx -jar
> example/exampledocs/post.jar "M:\DNS_dump"
>
>
> Regards,
>
> Gytis
