lucene-solr-user mailing list archives

From bo_b
Subject Optimizing a schema
Date Tue, 08 Aug 2006 08:46:59 GMT


I have tried indexing a vBulletin message board containing roughly 7
million posts.

My schema is as follows:

   <field name="postid" type="int" indexed="true" stored="true" />
   <field name="threadid" type="int" indexed="false" stored="true" />
   <field name="username" type="string" indexed="false" stored="true" />
   <field name="title" type="string" indexed="false" stored="true" />
   <field name="teaser" type="string" indexed="false" stored="true" />
   <field name="date" type="date" indexed="true" stored="true" />
   <field name="blob" type="text" indexed="true" stored="false"
          multiValued="true" omitNorms="true"/>


   <copyField source="username" dest="blob"/>
   <copyField source="title" dest="blob"/>

I am trying to figure out whether there is anything I can do to lower the disk
usage and/or increase sorting speed before we go live with the search. A few
questions came to mind:

1) I was planning to sort on the date field (i.e. add "; date desc" to the
query). But I was wondering whether it would be more efficient to sort on
postid instead, since in vBulletin a higher postid means a newer post. postid
already has indexed="true" because it is our unique field, so I could then set
indexed="false" for date and perhaps save some storage space?

2) If we sort on postid instead, should we use the integer type or the sint
type? I assume sint would be faster(?) but would perhaps use more storage?

3) About omitNorms="true": I must admit I don't exactly understand what it
does :) But I read that it saves 1 byte per document. Are there any other
fields in my schema that I should add it to? As far as I understand,
omitNorms="true" only makes a difference for fields with indexed="true", and
doesn't do anything for int fields?

Thanks in advance for any suggestions :)
