lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Schema model to store additional field metadata
Date Fri, 07 Sep 2012 18:14:07 GMT
Why would you store the actual images in Solr? There is no way to
really search the bytes of an image, is there? What you probably want to
do is extract all the searchable metadata from each image: name, alt,
EXIF, etc.

And you are most likely looking at dynamic fields as the solution:

1) Define *_Path, *_Size, *_Alt as dynamic fields with appropriate types
2) During indexing, write those properties as Image_1_Path,
Image_1_Size, Image_1_Alt or some such
3) Make sure that whatever search algorithm you have looks at those
fields, or use a copyField to aggregate them into AllImage_Alt, etc.
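
To make step 2 concrete, here is a rough sketch in Python of how a client
might flatten an article's images into numbered fields before sending the
document to Solr, and read them back when rendering. The helper names
(flatten_images, images_from_doc, render_img_tags) and the input dict
layout are made up for illustration. The sketch also assumes the schema
already declares *_Path and *_Alt as stored dynamic fields, and it does
not talk to Solr at all.

```python
import html


def flatten_images(article, images):
    """Return a flat document with Image_<n>_Path / Image_<n>_Alt fields.

    `article` holds the ordinary fields (title, content, ...); `images`
    is a list of dicts with "path" and "alt" keys (hypothetical layout).
    """
    doc = dict(article)
    for n, image in enumerate(images, start=1):
        doc[f"Image_{n}_Path"] = image["path"]
        doc[f"Image_{n}_Alt"] = image["alt"]
    return doc


def images_from_doc(doc):
    """Recover (path, alt) pairs, in order, from a flattened document."""
    pairs = []
    n = 1
    while f"Image_{n}_Path" in doc:
        pairs.append((doc[f"Image_{n}_Path"], doc.get(f"Image_{n}_Alt", "")))
        n += 1
    return pairs


def render_img_tags(doc):
    """Build one <img> tag per image, escaping the attribute values."""
    return [
        f'<img src="{html.escape(path)}" alt="{html.escape(alt)}" />'
        for path, alt in images_from_doc(doc)
    ]


doc = flatten_images(
    {"title": "An article about Foo and Bar"},
    [
        {"path": "2012/09/foo.png", "alt": "Foo. Waiting for the bus."},
        {"path": "2012/04/images/bar.png", "alt": "Bar again"},
    ],
)
tags = render_img_tags(doc)
```

Numbering the fields keeps each path paired with its own alt text, which
two independent multiValued fields would not make explicit.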

I do something similar by extracting metadata from .DOC files with
Tika and indexing it all regardless of the actual names.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Sep 7, 2012 at 1:31 PM,  <sysrq@web.de> wrote:
> Hi,
>
> I want to create a Solr index of articles. Each article should have a title, content,
> published date and an arbitrary number of images attached to. An article could look like this:
>
> title:     An article about Foo and Bar
> content:   This is some text about Foo and Bar.
> published: 2012.09.07T19:23
> image:     2012/09/foo.png
> image:     2012/04/images/bar.png
> image:     2012/02/abc.png
>
> I want to display the images with html <img>-tags. But despite src I also want
> to include an alt attribute to describe each image with additional metadata. For example I
> want to display the article like this:
>
> <h3>An article about Foo and Bar</h3>
> <p>This is some text about Foo and Bar.</p>
> <img src="2012/09/foo.png"        alt="Foo. Waiting for the bus." />
> <img src="2012/04/images/bar.png" alt="Bar again" />
> <img src="2012/02/abc.png"        alt="Foo and Bar at the beach" />
>
> I know that I can use a multiValued field to store the images. But how should or I can
> store the additional src information? I have a problem finding the right schema for my index.
