lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: index and data directories
Date Mon, 14 Nov 2016 21:14:01 GMT
Theoretically, perhaps. And it's quite true that stored data for
fields marked stored=true are just passed through verbatim and
compressed on disk while the data associated with indexed=true fields
go through an analysis chain and are stored in a much different
format. However these different data are simply stored in files with
different suffixes in a segment. So you might have _0.fdx, _0.fdt,
_0.tim, _0.tvx etc. that together form a single segment.

This is done on a per-segment basis. So certain segment files, namely
the *.fdt and *.fdx files, will contain the stored data while other
extensions have the indexed data, see: "File naming" here for a
somewhat out of date format, but close enough for this discussion:
And there's no option to store the *.fdt and *.fdx files independently
from the rest of the segment files.
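
To make the per-segment layout concrete, here is a rough sketch (not anything Solr ships) that groups the file names found in an index directory by segment and separates the stored-field files from everything else. The function name and the sample file names are made up for illustration, and the handling of names with extra underscore-separated parts is an assumption.

```python
from collections import defaultdict
from pathlib import PurePath

# Suffixes that hold stored-field data within a segment (per the text above).
STORED_SUFFIXES = {".fdt", ".fdx"}


def group_segment_files(filenames):
    """Group file names by segment, splitting stored-field files from the rest.

    Hypothetical helper for illustration only.
    """
    segments = defaultdict(lambda: {"stored": [], "other": []})
    for name in filenames:
        path = PurePath(name)
        if not path.stem.startswith("_"):
            continue  # not a per-segment file (e.g. segments_N, write.lock)
        # Assumption: the segment name is the part up to the next underscore,
        # so "_0.fdt" and a name like "_0_extra.tim" both map to segment "_0".
        segment = "_" + path.stem[1:].split("_", 1)[0]
        kind = "stored" if path.suffix in STORED_SUFFIXES else "other"
        segments[segment][kind].append(name)
    return dict(segments)


if __name__ == "__main__":
    listing = ["_0.fdx", "_0.fdt", "_0.tim", "_0.tvx", "_1.fdt", "segments_2"]
    for segment, parts in group_segment_files(listing).items():
        print(segment, parts)
```

Both kinds of files come out of the same directory listing, which is the point: they live side by side and only the suffix tells them apart.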

This statement: "I mean documents which are to be indexed" really
doesn't make sense. You send these things called Solr documents to be
indexed, but they are just a set of fields with values handled as
their definitions indicate (i.e. respecting stored=true|false,
indexed=true|false, docValues=true|false). The Solr document sent by SolrJ is simply
thrown away after processing into segment files.
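
As a toy model of that idea (emphatically not Lucene's actual code), the sketch below treats a document as a dict of field values and a schema as per-field flags. Stored values are kept verbatim, indexed values go through a stand-in "analysis chain" (lowercase plus whitespace split, chosen only for illustration), and the incoming document itself is not kept. All names here are hypothetical.

```python
def process_document(doc, schema):
    """Split a document into stored values and analyzed terms per field.

    Toy illustration: `schema` maps field name -> {"stored": bool, "indexed": bool}.
    """
    stored = {}
    indexed = {}
    for field, value in doc.items():
        flags = schema.get(field, {})
        if flags.get("stored"):
            stored[field] = value  # passed through verbatim
        if flags.get("indexed"):
            # Stand-in for a real analysis chain: lowercase, split on whitespace.
            indexed[field] = value.lower().split()
    # Only the two derived structures are returned; `doc` is not retained.
    return stored, indexed


if __name__ == "__main__":
    schema = {
        "title": {"stored": True, "indexed": False},
        "body": {"stored": True, "indexed": True},
    }
    stored, indexed = process_document(
        {"title": "Hello Solr", "body": "Index Me"}, schema
    )
    print(stored)
    print(indexed)
```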

If you're sending semi-structured docs (say Word, PDF etc) to be
indexed through Tika they are simply transformed into a Solr doc (set
of field/value pairs) and the original document is thrown away as
well. There's no option to store the original semi-structured doc


On Mon, Nov 14, 2016 at 12:35 PM, Prateek Jain J
<> wrote:
> By data, I mean documents which are to be indexed. Some fields can be stored="true" but
> that doesn’t matter.
> For example: App1 creates an object (AppObj) to be indexed and sends it to SOLR via solrj.
> Some of the attributes of this object can be declared to be used for storage.
> Now, my understanding is data and indexes generated on data are two separate things.
> In my particular example, all fields have stored="true" but only selected fields have indexed="true".
> My expectation is, indexes are stored separately from data because indexes can be generated
> by different techniques/algorithms but data/documents remain unchanged. Please correct me
> if my understanding is not correct.
> Regards,
> Prateek Jain
> -----Original Message-----
> From: Erick Erickson []
> Sent: 14 November 2016 07:05 PM
> To: solr-user <>
> Subject: Re: index and data directories
> The question is pretty opaque. What do you mean by "data" as opposed to "indexes"? Are
> you talking about where Lucene puts stored="true" fields? If not, what do you mean
> by "data"?
> If you are talking about where Lucene puts the stored="true" bits then no, there's no
> way to segregate that out from the other files that make up a segment.
> Best,
> Erick
> On Mon, Nov 14, 2016 at 7:58 AM, Prateek Jain J <> wrote:
>> Hi Alex,
>>  I am unable to get it correctly. Is it possible to store indexes and data separately?
>> Regards,
>> Prateek Jain
>> -----Original Message-----
>> From: Alexandre Rafalovitch []
>> Sent: 14 November 2016 03:53 PM
>> To: solr-user <>
>> Subject: Re: index and data directories
>> solr.xml also has a bunch of properties under the core tag:
>>   <cores adminPath="/admin/cores">
>>     <core name="core0" instanceDir="core0">
>>       <property name="dataDir" value="/data/core0"/></core>
>>     <core name="core1" instanceDir="core1"/>
>>   </cores>
>> You can get the Reference Guide for your specific version here:
>> Regards,
>>    Alex.
>> ----
>> Solr Example reading group is starting November 2016, join us at
>> Newsletter and resources for Solr beginners and intermediates:
>> On 15 November 2016 at 02:37, Prateek Jain J <> wrote:
>>> Hi All,
>>> We are using solr 4.8.1 and would like to know if it is possible to
>>> store data and indexes in separate directories? I know following tag
>>> exist in solrconfig.xml file
>>> <!-- Data Directory
>>>      Used to specify an alternate directory to hold all index data
>>>      other than the default ./data under the Solr home. If replication is
>>>      in use, this should match the replication configuration.
>>> -->
>>> <dataDir>C:/del-it/solr/cm_events_nbi/data</dataDir>
>>> Regards,
>>> Prateek Jain
