lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr: separating index and storage
Date Thu, 06 Jun 2013 11:33:10 GMT
By and large, stored fields are pretty irrelevant for resource
consumption _except_ for disk space consumed. Sharded systems work
fine: the stored data lives in the stored-field files (*.fdt and *.fdx)
in each segment on each shard.

But you haven't told us anything about your data. How much are
you talking about here? 100s of GB? Terabytes? Other than disk
space, you may well be anticipating problems that don't exist...

Now, when _returning_ documents the fields must be read, so
there is some resource consumption there which you can
mitigate with lazy field loading. But this is usually just a few docs
so often isn't a problem.
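
Lazy field loading is switched on with the enableLazyFieldLoading setting in the query section of solrconfig.xml. It pays off when requests ask for only some of the stored fields, which is done with the fl parameter. A minimal sketch of building such a request URL follows; the base URL, core name, and the field names id and ref are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, rows=10):
    """Build a Solr /select URL that returns only the listed stored fields.

    base_url is a hypothetical core URL such as
    "http://localhost:8983/solr/collection1".
    """
    params = {
        "q": query,
        "fl": ",".join(fields),  # restrict which stored fields come back
        "rows": rows,            # keep the number of returned docs small
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/collection1", "word:lucene", ["id", "ref"]
)
```

With a small rows value and a short fl list, only a few stored fields of a few documents need to be read per request.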

Best
Erick

On Thu, Jun 6, 2013 at 3:34 AM, Sourajit Basak <sourajit.basac@gmail.com> wrote:
> Absolutely. Solr will return the reference along with the docs/results;
> those references may be used to look up the actual stuff. Such use cases
> aren't hard to solve.
>
> If the use case demands returning the actual stuff alongside the results,
> it becomes non-trivial, especially during high loads.
>
> To avoid this and get a quick implementation, I can judiciously create
> stored fields and see how it performs. I will need to figure out what
> happens if the stored fields grow quickly, how much disk I/O they cause,
> and what happens to the stored fields if we shard the index.
>
> Best,
> Sourajit
>
> On Tue, Jun 4, 2013 at 5:31 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> You have to index something with your Solr documents that
>> has meaning in _your_ system so you can find the
>> original record. You don't search this field, you just
>> return it with the search results and then use it to get
>> the original document.
>>
>> If you're storing the original in a DB, this can be the PK.
>> If on a file system, the path, etc.
>>
>> Essentially, since the association is specific to your environment
>> you need to handle it explicitly...
>>
>> Best
>> Erick
>>
>> On Mon, Jun 3, 2013 at 11:56 AM, Sourajit Basak
>> <sourajit.basac@gmail.com> wrote:
>> > Consider the following use case.
>> >
>> > Certain words are extracted from a document and indexed. The exact
>> > sentence containing the word cannot be stored alongside the extracted
>> > word because of the volume at which the documents grow. How can the
>> > index and, let's call them, doc servers be separated?
>> >
>> > An option is to store the sentences in MongoDB or an RDBMS. But there
>> > seems to be a schema-level design issue. Assuming 'word' to be a
>> > multivalued field, how do we associate with it a reference to the
>> > corresponding entry in the doc server?
>> >
>> > We could create (word_1, ref_1) tuples. Is there any other built-in
>> > feature?
>> >
>> > Any related project which separates index & doc servers?
>> >
>> > Thanks,
>> > Sourajit
>>
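
The reference-based approach discussed in this thread (return a key with each Solr result, then use it to fetch the original) might be sketched as follows. Everything named here is hypothetical: the ref field, the fetch_original callback, and the in-memory sentence_store that stands in for the DB or file system. Solr itself is not called; solr_docs is assumed to be the list of result documents already parsed from a response.

```python
def attach_originals(solr_docs, fetch_original, ref_field="ref"):
    """Return copies of solr_docs with the original record attached.

    fetch_original is any callable that maps a reference (a DB primary
    key, a file path, ...) to the original record, or None if missing.
    """
    results = []
    for doc in solr_docs:
        ref = doc.get(ref_field)
        original = fetch_original(ref) if ref is not None else None
        results.append({**doc, "original": original})
    return results


# Stand-in for the external store (MongoDB, an RDBMS, a file system, ...).
sentence_store = {
    "s-1": "Lucene stores fields in segment files.",
    "s-2": "Sharding splits an index across servers.",
}

# Stand-in for documents returned by a Solr query.
solr_docs = [
    {"id": "1", "word": "lucene", "ref": "s-1"},
    {"id": "2", "word": "sharding", "ref": "s-2"},
    {"id": "3", "word": "orphan", "ref": "s-9"},
]

enriched = attach_originals(solr_docs, sentence_store.get)
```

Because the lookup runs only for the handful of documents on the current result page, the external store sees a small, bounded number of reads per search.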
