lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: DIH Blob data
Date Fri, 14 Nov 2014 18:43:19 GMT
Just skimming, so maybe I misinterpreted.

ExternalFileField and ExternalFileFieldReloader
refer to storing values for each doc in an external file; they have
nothing to do with storing _files_.
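For illustration, the per-document data that ExternalFileField reads is a plain text file of `uniqueKey=value` lines (for example `doc33=1.414`), kept in the index data directory under a name like `external_<fieldname>`. The minimal Java sketch below only renders such lines from a map. The field name `popularity` and the document ids are hypothetical. The point is that what lives "in an external file" is one float per document, not the document's file content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: builds the contents of an ExternalFileField data file, which holds
 * one "uniqueKey=floatValue" line per document. The field name "popularity"
 * and the document ids are hypothetical examples.
 */
public class ExternalFileSketch {

    /** Renders one "id=value" line per entry, in the map's iteration order. */
    static String render(Map<String, Float> valuesByDocId) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : valuesByDocId.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> values = new LinkedHashMap<>();
        values.put("doc33", 1.414f);
        values.put("doc34", 3.14159f);
        // These lines would go into a file named external_popularity
        // in the index data directory (hypothetical field name).
        System.out.print(render(values));
    }
}
```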

The usual pattern is to have Solr store just enough data for the
system of record to return the actual file, rather than having Solr
store the file itself. Solr isn't really built for this, and while some
people do it, it's usually a poor design, if for no other reason than
that as segments merge, the stored data gets copied again and again
to no good purpose.
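A minimal Java sketch of that pattern, assuming a hypothetical in-memory map as a stand-in for the system of record: the raw bytes go to the external store under a content-hash key, and the document sent to Solr carries only searchable text plus that key. The field names `content_text` and `blob_ref` are made up for the example, and keying by SHA-256 is one possible choice, not something the thread prescribes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of "Solr stores a reference, the system of record stores the file".
 * The HashMap stands in for a real blob store; field names are hypothetical.
 */
public class BlobReferenceSketch {

    // Stand-in for the system of record (database, object store, file server...).
    private final Map<String, byte[]> blobStore = new HashMap<>();

    /** Saves a copy of the bytes externally and returns the key to index in Solr. */
    public String storeBlob(byte[] content) {
        String key = sha256Hex(content);
        blobStore.put(key, content.clone());
        return key;
    }

    /** Builds the fields sent to Solr: searchable text plus a reference, never the raw bytes. */
    public static Map<String, Object> solrDoc(String id, String extractedText, String blobKey) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("content_text", extractedText);
        doc.put("blob_ref", blobKey);
        return doc;
    }

    /** Resolves a search hit back to the original bytes via the system of record. */
    public byte[] fetchBlob(Map<String, Object> doc) {
        return blobStore.get((String) doc.get("blob_ref"));
    }

    /** Lowercase hex SHA-256 of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Because Solr only ever holds the 64-character reference string, segment merges copy that key per document instead of the whole file.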

Best,
Erick

On Fri, Nov 14, 2014 at 12:21 PM, Anurag Sharma <anurag6v@gmail.com> wrote:
> bq: We routinely store images and pdfs in Solr. There *is* a benefit, since
> you don't need to manage another storage system, you don't have to worry
> about Solr getting out of sync with the other system, you can use Solr
> replication for all your assets, etc.
>
> Does the same hold true for large blobs such as images, audio, and video?
> Tika supports multiple file formats (http://tika.apache.org/1.5/formats.html),
> but I'm not sure how good the Solr/Tika combination is. Storing PDFs and other
> docs in Solr could be useful: Tika can extract metadata from the docs and
> make them discoverable.
>
> Considering all the above cases, there should also be support for a File
> field type in Solr, like the other types (Date, Float, Int, Long, String,
> etc.), but it looks like there are only two file-related types (
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/)
> and both are for external file storage.
>
>    - ExternalFileField.java
>    <http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/ExternalFileField.java>
>    - ExternalFileFieldReloader.java
>    <http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/schema/ExternalFileFieldReloader.java>
>
> What type can be used in the schema when storing the files internally?
>
>
> On Thu, Nov 13, 2014 at 3:48 AM, Jeon Woosung <jeonwoosung@gmail.com> wrote:
>
>> How about this?
>>
>> First, define a multi-valued field to use in filter queries.
>>
>> Second, implement a transformer that extracts the JSON dynamic fields and
>> puts them into that Solr field.
>>
>> For example,
>>
>> <field name="terms" type="string" indexed="true" stored="true" multiValued="true"/>
>>
>> Data: {a:1,b:2,c:3}
>>
>> You can split the data into "a:1", "b:2", "c:3", and put them into terms.
>>
>> Then you can use a filter query like "fq=terms:a\:1" (the colon inside the
>> value has to be escaped for the query parser).
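A rough Java sketch of the transformer described above. It assumes DIH's convention of accepting a plain class with a public `transformRow(Map<String, Object>)` method (check this against your DIH version), and it uses hypothetical column/field names: `json_blob` for the source column and `terms` for the multi-valued target field. The parser only handles flat objects like `{a:1,b:2,c:3}`, with no nesting and no commas inside values; a real implementation would use a proper JSON library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical DIH-style transformer: turns a flat JSON-like blob such as
 * {a:1,b:2,c:3} into the multi-valued "terms" values "a:1", "b:2", "c:3".
 * Column names "json_blob" and "terms" are illustrative assumptions.
 */
public class JsonTermsTransformer {

    public Object transformRow(Map<String, Object> row) {
        Object raw = row.get("json_blob");
        if (raw == null) {
            return row; // nothing to split for this row
        }
        // A BLOB column may arrive as bytes; otherwise use its string form.
        String text = (raw instanceof byte[])
                ? new String((byte[]) raw, StandardCharsets.UTF_8)
                : raw.toString();
        row.put("terms", splitToTerms(text));
        return row;
    }

    /** Splits a flat object like {a:1,"b":"x"} into ["a:1", "b:x"]; no nesting, no commas in values. */
    static List<String> splitToTerms(String json) {
        List<String> terms = new ArrayList<>();
        String body = json.trim();
        if (body.startsWith("{")) body = body.substring(1);
        if (body.endsWith("}")) body = body.substring(0, body.length() - 1);
        for (String pair : body.split(",")) {
            int colon = pair.indexOf(':'); // first colon separates key from value
            if (colon < 0) continue;        // skip empty or malformed entries
            String key = unquote(pair.substring(0, colon));
            String value = unquote(pair.substring(colon + 1));
            if (!key.isEmpty()) terms.add(key + ":" + value);
        }
        return terms;
    }

    /** Trims whitespace and strips one pair of surrounding double quotes, if present. */
    private static String unquote(String s) {
        String t = s.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            t = t.substring(1, t.length() - 1);
        }
        return t;
    }
}
```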
>> On Thu, Nov 13, 2014 at 3:59 AM, "Michael Sokolov" <msokolov@safaribooksonline.com> wrote:
>>
>> > We routinely store images and pdfs in Solr. There *is* a benefit, since
>> > you don't need to manage another storage system, you don't have to worry
>> > about Solr getting out of sync with the other system, you can use Solr
>> > replication for all your assets, etc.
>> >
>> > I don't use DIH, so personally I don't care whether it handles blobs, but
>> > it does seem like a natural extension for a system that indexes data from
>> > SQL in Solr.
>> >
>> > -Mike
>> >
>> >
>> > On 11/12/2014 01:31 PM, Anurag Sharma wrote:
>> >
>> >> A BLOB is a non-searchable field, so there is no benefit to storing it in
>> >> Solr. Any external key-value store can be used to store the blob, and a
>> >> reference to the blob can be stored as a string field in Solr.
>> >>
>> >> On Wed, Nov 12, 2014 at 5:56 PM, stockii <stock.jonas@googlemail.com>
>> >> wrote:
>> >>
>> >>> I had a similar problem and didn't find any way to use the fields in a
>> >>> JSON blob for a filter... Not with DIH.
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> View this message in context:
>> >>>
>> http://lucene.472066.n3.nabble.com/DIH-Blob-data-tp4168896p4168925.html
>> >>> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>>
>> >>>
>> >
>>
