lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Index splitting
Date Tue, 29 Apr 2008 17:10:31 GMT
Hi Nico,

I don't think there is a tool to split an existing Lucene index, though I imagine one could
write such a tool using http://lucene.apache.org/java/2_3_1/fileformats.html as a guide.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Nico Heid <nico.heid@gmx.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, April 29, 2008 4:10:09 AM
> Subject: Index splitting
> 
> Hi,
> Let me first roughly describe the scenario :-)
> 
> We're trying to index data stored online for a few thousand users.
> The schema.xml has a custom identifier for the user, so a filter query (fq)
> can be applied and all further filtering is done only within that user's data
> (more importantly, the user never sees results from data not belonging to him).
> 
> Unfortunately, the index might become quite big (we're indexing more than
> 50 TB of data, all kinds of files: full text (indexed only, not stored) where
> possible, otherwise file info (size, date) and metadata if available).
> 
> So the question is:
> 
> We're thinking of starting out with multiple Solr instances (either in their
> own containers or MultiCore, I guess that's not the important point), on 1 to
> n machines. Let's just pretend: we do modulo 5 on the user number and assign
> it to one of the two machines. The index gets distributed to query slaves
> (1 to m, depending on the need).
> 
> So now the question:
> Is there a way to split an index that has grown too big into smaller ones? Or
> do I have to create more instances at the beginning, so that I will not run
> out of power and space? (That would add quite a bit of data redundancy.)
> Let's say I miscalculated and used only 2 indices, but now I see I need at
> least 4.
> 
> Any idea will be very welcome,
> 
> Thanks,
> Nico
> 
> 
> 
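As a rough illustration of the scenario in the quoted message (growing from 2 indices to 4 when users are assigned by modulo on the user number), the sketch below counts how many users would land on a different index after the change. It is not taken from the thread and does not touch Lucene or Solr at all; the function names `shard_for`, `moved_users` and `split_targets` and the sample range of user numbers are assumptions made up for this example.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Assign a user to an index by modulo on the user number (hypothetical helper)."""
    return user_id % num_shards


def moved_users(user_ids, old_shards: int, new_shards: int):
    """Return the users whose index number changes when the shard count changes."""
    return [u for u in user_ids
            if shard_for(u, old_shards) != shard_for(u, new_shards)]


def split_targets(user_ids, old_shard: int, old_shards: int, new_shards: int):
    """Return the set of new indices that the users of one old index end up on."""
    return {shard_for(u, new_shards) for u in user_ids
            if shard_for(u, old_shards) == old_shard}


# Assumed sample data: 1000 consecutive user numbers.
users = range(1000)

moved = moved_users(users, old_shards=2, new_shards=4)
print(len(moved), "of", len(users), "users change index")

# With a doubling, each old index feeds exactly two new ones.
for old in range(2):
    print("old index", old, "->", sorted(split_targets(users, old, 2, 4)))
```

Under these assumptions, doubling the number of indices moves about half of the users, and each old index maps onto exactly two new ones, so the documents of the moved users could be re-indexed into the new instances and deleted from the old ones without touching the rest. Whether that is cheaper than starting with more instances depends on how expensive re-indexing the source data is.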


