lucene-solr-user mailing list archives

From Erick Erickson
Subject Re: document storage
Date Sun, 15 May 2011 15:48:46 GMT
Where are the documents coming from? I ask because storing them ONLY
in Solr risks losing them if your index is somehow hosed.

Storing them only externally has the advantage that your index will be
much smaller, which helps when replicating as you scale. The downside
is that highlighting will be more resource-intensive, since you have to
re-analyze the text in order to highlight it.
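
To make that cost concrete, here is a minimal Python sketch of highlighting
against externally stored text. It is an illustration only, not Solr's
Highlighter API: the names tokenize and highlight_external, the regex
"analyzer", and the one-file-per-document layout are all assumptions made
for the example.

```python
import re
import tempfile
from pathlib import Path


def tokenize(text):
    """Toy 'analyzer' (assumption): yield (lowercased_token, start, end) per word."""
    for match in re.finditer(r"\w+", text):
        yield match.group().lower(), match.start(), match.end()


def highlight_external(doc_path, query_terms, context=30):
    """Read the text from outside the index, re-analyze it, and return a
    snippet around the first matching term, or None if nothing matches."""
    text = Path(doc_path).read_text(encoding="utf-8")  # extra I/O per hit
    terms = {term.lower() for term in query_terms}
    for token, start, end in tokenize(text):  # re-analysis per hit
        if token in terms:
            left = max(0, start - context)
            right = min(len(text), end + context)
            return (text[left:start] + "<em>" + text[start:end] + "</em>"
                    + text[end:right])
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "doc1.txt"  # hypothetical external document
        path.write_text("Solr can highlight stored fields directly.",
                        encoding="utf-8")
        print(highlight_external(path, ["highlight"], context=9))
```

Every hit that needs a snippet pays for one read from the external store
plus a full tokenization pass, which is where the extra load comes from.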

So, as usual, "it depends" (tm). What is the scale you need? What
is the QPS you're thinking of supporting?


On Fri, May 13, 2011 at 3:10 PM, Mike Sokolov wrote:
> Would anyone care to comment on the merits of storing indexed full-text
> documents in Solr versus storing them externally?
> It seems there are three options for us:
> 1) store documents both in Solr and externally - this is what we are doing
> now, and gives us all sorts of flexibility, but doesn't seem like the most
> scalable option, at least in terms of storage space and I/O required when
> updating/inserting documents.
> 2) store documents externally: For the moment, the only thing that requires
> us to store documents in Solr is the need to highlight them, both in search
> result snippets and in full document views. We are considering hunting for
> or writing a Highlighter extension that could pull in the document text from
> an external source (e.g. the filesystem).
> 3) store documents only in Solr.  We'd just retrieve document text as a Solr
> field value rather than reading from the filesystem.  It could work, but
> somehow this strikes me as the wrong thing to do, and I'm not sure why.
> A lot of unnecessary merging I/O activity, perhaps.  It also makes it hard
> to grep the documents or use other filesystem tools, I suppose.
> Which one of these sounds best to you?  Under which circumstances? Are there
> other possibilities?
> Thanks!
> --
> Michael Sokolov
> Engineering Director
