lucene-solr-user mailing list archives

From Ian Connor <ian.con...@gmail.com>
Subject Re: fastest way to index/reindex
Date Mon, 26 Jan 2009 16:52:44 GMT
I have about 2.5 million documents per shard and seem to be getting through
about 28 documents/sec, fetching 1000 at a time. It ran all of yesterday and
part of the night and is now past the 1.6 million mark, so I hope it can keep
up a similar rate as it gets deeper into the index.
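As a rough check on those figures, assuming "28/sec" means 28 documents per second and that the rate stays constant (it may well drop deeper into the index), the whole shard and the remaining 0.9 million documents work out as follows. The helper name `hours_for` is only for illustration.

```python
def hours_for(docs: int, docs_per_sec: float) -> float:
    """Hours needed to process `docs` documents at a constant rate."""
    return docs / docs_per_sec / 3600.0

# Figures from the message: ~2.5M docs per shard, ~1.6M done, ~28 docs/sec.
TOTAL, DONE, RATE = 2_500_000, 1_600_000, 28.0

print(f"whole shard: {hours_for(TOTAL, RATE):.1f} h")         # whole shard: 24.8 h
print(f"remaining:   {hours_for(TOTAL - DONE, RATE):.1f} h")  # remaining:   8.9 h
```

At the same rate, 1.6 million documents take roughly 16 hours, which is roughly consistent with "all yesterday and part of the night".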

I need to reindex everything because I changed how some of the fields are
indexed (not how they are stored). The stored data is fine; I just need to
rebuild the index so the changed index filter is applied.
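Because the stored fields are intact, one way to "resave" a document is to take its stored fields from a query result and send them back as a Solr XML `<add>` message. A minimal sketch, assuming every field the schema needs is stored (indexed-only values cannot be recovered from search results) and that any stored copyField destinations are listed in the `skip` set, since Solr would otherwise fill them again on top of the re-sent values. `to_add_xml` is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet, Dict, Iterable


def to_add_xml(docs: Iterable[Dict[str, object]],
               skip: AbstractSet[str] = frozenset()) -> str:
    """Build a Solr XML <add> message from documents' stored fields (sketch)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            if name in skip:
                continue  # e.g. copyField targets that Solr will repopulate
            # Multi-valued fields come back as lists: emit one <field> per value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field_el = ET.SubElement(doc_el, "field", name=name)
                # Write booleans as true/false rather than Python's True/False.
                field_el.text = str(v).lower() if isinstance(v, bool) else str(v)
    return ET.tostring(add, encoding="unicode")
```

The returned string would then be POSTed to the shard's `/update` handler with a `text/xml` content type, with a commit once the batch is done.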

This is probably going to be an iterative process, so a full reindex will not
be an unusual event as I find optimizations and new ways to do things. I
want it to be as painless as possible.

The id:[* TO *] query seems okay, but I wanted to know if there is a smarter way.
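A minimal sketch of paging through every document with the match-all query `*:*` (Ryan's suggestion in the quoted reply) instead of `id:[* TO *]`. `fetch` is a hypothetical caller-supplied function that sends the query string to the shard's `/select` handler and returns that page's list of documents; the paging itself relies only on the standard `q`, `start` and `rows` parameters, and `wt=json` assumes the JSON response writer. It also assumes the result set does not change while paging (for example, re-added documents are not committed until the loop finishes); otherwise offsets can shift and documents can be skipped.

```python
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

Doc = Dict[str, object]


def page_params(start: int, rows: int) -> str:
    """Query string for one page of a match-all query (hypothetical helper)."""
    return urlencode({"q": "*:*", "start": start, "rows": rows, "wt": "json"})


def iter_all_docs(fetch: Callable[[str], List[Doc]], rows: int = 1000) -> Iterator[Doc]:
    """Yield every document by paging through q=*:* with start/rows."""
    start = 0
    while True:
        docs = fetch(page_params(start, rows))
        yield from docs
        if len(docs) < rows:  # a short (or empty) page means we reached the end
            break
        start += rows
```

Each request still asks Solr for a deeper offset, so later pages get slower, as described in the original question; the id-range batching Julian suggests keeps every query at `start=0` instead.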

On Mon, Jan 26, 2009 at 3:11 AM, Julian Davchev <jmut@drun.net> wrote:

> I don't quite get why you would reindex all the data at once.
> Each document has a unique id, so you can reindex only what's needed.
> Also, if there is a lot of data, I'd suggest using a batch processor that
> adds N tasks with range queries (1:10, 10:20, etc.) and a cron job that
> executes them. Thousands seems OK, but when you hit millions you're in
> trouble.
> Cheers.
>
> Ryan McKinley wrote:
> > I don't know of any standard export/import tool. I think Luke has
> > something, but it will be faster if you write your own.
> >
> > Rather than id:[* TO *], just try *:*; this should match all
> > documents without using a range query.
> >
> >
> > On Jan 25, 2009, at 3:16 PM, Ian Connor wrote:
> >
> >> Hi,
> >>
> >> Given that the only real way to reindex is to save the documents again,
> >> what is the fastest way to extract all the documents from a Solr index
> >> to resave them?
> >>
> >> I have tried the id:[* TO *] trick; however, it takes a while once you
> >> get a few thousand into the index. Are there any tools that will quickly
> >> export the index to a text file, or is making queries 1000 at a time the
> >> best option, dealing with the time it takes to query once you are deep
> >> into the index?
> >>
> >> --
> >> Regards,
> >>
> >> Ian Connor
> >
>
>
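Julian's range-batching idea can be sketched as a generator of non-overlapping id ranges; each range becomes one task (for example, one cron job run) whose query starts at offset 0, so no request has to skip deep into the results. This assumes numeric ids in a known range and an id field whose range queries compare numerically (with a plain string field, range bounds are compared lexicographically). Inclusive bounds that do not share an endpoint avoid fetching a boundary id twice, as `1:10, 10:20` would. `id_range_queries` is a hypothetical helper.

```python
from typing import List


def id_range_queries(min_id: int, max_id: int, batch: int) -> List[str]:
    """Split [min_id, max_id] into non-overlapping inclusive id range queries."""
    queries = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch - 1, max_id)  # last range may be shorter
        queries.append(f"id:[{lo} TO {hi}]")
        lo = hi + 1  # next range starts just past this one: no shared endpoint
    return queries


print(id_range_queries(1, 25, 10))
# ['id:[1 TO 10]', 'id:[11 TO 20]', 'id:[21 TO 25]']
```

Because ids are unique, a range of `batch` ids holds at most `batch` documents, so `rows` can simply be set to the batch size.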


-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor
