lucene-solr-user mailing list archives

From Artem OXSEED <>
Subject Re: Upgrading indexes from Solr 1.4.1 to 4.1.0
Date Wed, 06 Feb 2013 15:12:41 GMT
It turns out that all our fields are stored, and restoring from the
source data is a bit of a problem. I've tried DIH/SolrEntityProcessor and
it seems to be working well, so I'll probably end up using it.
Thank you!

Warm regards,
Artem Karpenko

On 04.02.2013 19:58, Lance Norskog wrote:
> A side problem here is text analyzers: the analyzers have changed how
> they split apart text for searching, and they work as matched pairs. That
> is, the query-time analyzer has to produce terms matching what the
> index-time analyzer produced. If you do this binary upgrade sequence, the
> indexed data will not match what the new analyzers do. It is not a major
> problem, but queries will not bring back what you expect.
> Also, in 4.x, the unique field has to be called 'id' and every document
> needs a '_version_' field.
> On 02/04/2013 09:32 AM, Upayavira wrote:
>> Just to add a little to the good stuff Shawn has shared here - Solr 4.1
>> does not support 1.4.1 indexes. If you cannot re-index (by far
>> recommended), then first upgrade to 3.6, then optimize your index, which
>> will convert it to 3.6 format. Then you will be able to use that index
>> in 4.1. The simple logic here is that Solr/Lucene can read the indexes
>> of the previous major version. Given you are two major versions behind,
>> you'd have to do it in two steps.
>> Upayavira
>> On Mon, Feb 4, 2013, at 03:18 PM, Shawn Heisey wrote:
>>> On 2/4/2013 7:20 AM, Artem OXSEED wrote:
>>>> I need to upgrade our Solr installation from 1.4.1 to the latest 4.1.0
>>>> version. The question is how to deal with indexes. AFAIU there are two
>>>> things to be aware of: file format and index format (excuse me for
>>>> possible term mismatch, I'm new to Solr) - and while file format can
>>>> (and will automatically?) be updated if old index files are used by new
>>>> Solr installation, one cannot say the same about index format. Is it true?
>>>> And if the above is true then the question is - should this "index
>>>> format" be updated at all - i.e. if we can happily live with it then
>>>> it's fine, but I guess that this decision will not bring
>>>> performance/feature improvements that were introduced since 1.4.1
>>>> version, will it?
>>>> Assuming we do need to update this "index format", how do we do it? I
>>>> found a solution on SO
>>>> (
>>>> that includes usage of some "export to XML" feature, maybe with Luke,
>>>> some custom-made XSLT transformation and import back. Seems like a lot
>>>> to do - although it's quite understandable. However, this answer was
>>>> given in 2010 with Solr 4.0 being in pre-alpha - so maybe there are
>>>> tools for this now?
>>> Artem,
>>> When upgrading Solr, the absolute best option is always to delete (or
>>> move) your index directory, let the new version recreate it, and rebuild
>>> from scratch by reindexing from your original data source.  This should
>>> always remain an option - the indexes may get corrupted by an unexpected
>>> situation.  If you have the ability to rebuild your 1.4.1 index from
>>> your original data source, then it should be straightforward to do the
>>> same thing on the new version.
>>> Solr 4.1 can read version 3.x indexes, but I would not be surprised to
>>> find that it can't read the Lucene 2.9.x format that Solr 1.4.1 uses.  I
>>> don't know how much difference there is between the 2.9.x format and the
>>> 3.x format.  I'm not aware of a distinction between "file" and "index"
>>> formats.
>>> If a Solr version supports an older format, then it will read the
>>> segments created in that format, but new segments will be in the new
>>> format.  Solr/Lucene index segments on disk are never changed once they
>>> are finalized.  They can be merged into new segments and then deleted,
>>> but nothing will ever change them.
>>> Have you stored every single field individually in Solr?  If you have,
>>> then you will be able to retrieve the data to reindex into the new
>>> version.  If you have fields that are indexed but not stored, then even
>>> with the XML method you will be unable to obtain all the data.  It is
>>> fairly normal in a Solr schema to have fields that you can search on but
>>> that are not stored, because stored fields make the index larger.
>>> If you have stored every single field in your index, you can also use
>>> the SolrEntityProcessor in the dataimport handler to import from an old
>>> Solr server to a new one.
>>> The critical piece of the puzzle for upgrading between incompatible
>>> versions is that you must be storing every field in the old version
>>> before you start.  If you aren't storing a particular field, then the
>>> data from that field is not retrievable and you must go back to the
>>> original data source.
>>> Thanks,
>>> Shawn
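
Shawn's point that every field must be stored before an export-and-reimport
can work is easy to check against a schema before starting. What follows is
only a minimal sketch, not part of the thread: the function name
classify_fields and the sample schema are made up for illustration, and it
assumes the stored attribute is written on each <field> element. Fields
that omit it inherit the setting from their field type, so the sketch lists
them separately instead of guessing.

```python
import xml.etree.ElementTree as ET


def classify_fields(schema_xml):
    """Group <field> names by their explicit 'stored' attribute.

    Hypothetical helper for illustration. Fields without an explicit
    'stored' attribute go into 'inherited', because the effective value
    then comes from the field type, which this sketch does not resolve.
    """
    root = ET.fromstring(schema_xml)
    result = {"stored": [], "not_stored": [], "inherited": []}
    for field in root.iter("field"):
        name = field.get("name")
        stored = field.get("stored")
        if stored is None:
            result["inherited"].append(name)
        elif stored.lower() == "true":
            result["stored"].append(name)
        else:
            result["not_stored"].append(name)
    return result


# Made-up schema fragment; a real schema.xml has many more elements.
SAMPLE_SCHEMA = """
<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="false"/>
    <field name="tags" type="string" indexed="true"/>
  </fields>
</schema>
"""

if __name__ == "__main__":
    report = classify_fields(SAMPLE_SCHEMA)
    print("stored:    ", report["stored"])
    print("not stored:", report["not_stored"])
    print("inherited: ", report["inherited"])
```

Any name reported as not stored cannot be recovered from the old index and
has to come from the original data source. Names under inherited need a
look at the corresponding field type definition.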
