lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Parsing and indexing parts of the input file paths
Date Wed, 22 Jul 2015 16:42:48 GMT
I don't understand your question. If you're talking about two different
fields, use copyField.
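
The extraction discussed in this thread can be sketched outside Solr to
check the regular expression before putting it into configuration. This is a
minimal Python sketch, not Solr code: the names KNOWN_PREFIX,
SEGMENT_AFTER_PREFIX, and extract_path_token are hypothetical, and it assumes
the prefix is fixed and the wanted token is the first path segment after it.

```python
import re

# Assumption from the thread: the prefix is known and can live in config.
KNOWN_PREFIX = "/user/andrew/"

# Capture the first path segment that follows the known prefix.
SEGMENT_AFTER_PREFIX = re.compile(
    r"^" + re.escape(KNOWN_PREFIX) + r"([^/]+)/.*$"
)


def extract_path_token(file_path):
    """Return the first segment after KNOWN_PREFIX, or None if no match."""
    match = SEGMENT_AFTER_PREFIX.match(file_path)
    return match.group(1) if match else None


print(extract_path_token("/user/andrew/1234/1234/file.pdf"))  # prints 1234
```

A pattern of the same shape, with the captured group as the replacement,
could plausibly be used in a pattern-replace filter on the copyField
target. Treat that as an assumption to verify against the Solr
documentation for your version.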

On Wed, Jul 22, 2015 at 8:55 AM, Andrew Musselman
<andrew.musselman@gmail.com> wrote:
> Fwding to user..
>
> ---------- Forwarded message ----------
> From: Andrew Musselman <andrew.musselman@gmail.com>
> Date: Wed, Jul 22, 2015 at 8:54 AM
> Subject: Re: Parsing and indexing parts of the input file paths
> To: dev@lucene.apache.org
>
>
> Thanks, and tell it to index the "id" field, which eventually contains the
> file path?
>
> On Wed, Jul 22, 2015 at 8:48 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> PatternReplaceFilterFactory would be a configuration-only solution:
>> construct a fieldType in schema.xml and you're done. It would require
>> re-indexing, of course.
>>
>> Best,
>> Erick
>>
>> On Tue, Jul 21, 2015 at 5:59 PM, Andrew Musselman
>> <andrew.musselman@gmail.com> wrote:
>> > Erik, thanks; the prefix starting with "/user/andrew/" will be known, and
>> > can be put into config, let's assume.  Would this be config-only or would
>> > it require some code, and could you point to some classes I can start
>> > with if I need to write code, and some up-to-date docs?
>> >
>> > Same for the update processor, is there an example I could read?
>> >
>> > On Tue, Jul 21, 2015 at 11:19 AM, Erik Hatcher <erik.hatcher@gmail.com>
>> > wrote:
>> >>
>> >> If this is only for search, then an analysis chain could be crafted,
>> >> likely with the pattern regex filter in the mix, to pull out pieces of
>> >> the path.  How will you know the prefix of the file though?
>> >>
>> >> There’s also the ability to do this sort of thing in an update
>> >> processor, most easily using the script update processor, using a bit of
>> >> JavaScript to pull out the piece(s) you want to index (and even store at
>> >> this point).
>> >>
>> >> —
>> >> Erik Hatcher, Senior Solutions Architect
>> >> http://www.lucidworks.com
>> >>
>> >>
>> >>
>> >>
>> >> On Jul 21, 2015, at 1:31 PM, Andrew Musselman
>> >> <andrew.musselman@gmail.com> wrote:
>> >>
>> >> Dear user and dev lists,
>> >>
>> >> We are loading files from a directory and would like to index a portion
>> >> of each file path as a field as well as the text inside the file.
>> >>
>> >> E.g., on HDFS we have this file path:
>> >>
>> >> /user/andrew/1234/1234/file.pdf
>> >>
>> >> And we would like the "1234" token parsed from the file path and indexed
>> >> as an additional field that can be searched on.
>> >>
>> >> From my initial searches I can't see how to do this easily, so would I
>> >> need to write some custom code, or a plugin?
>> >>
>> >> Thanks!
>> >>
>> >>
>> >
>>
