lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: Indexing Tips and Hints
Date Tue, 25 Feb 2003 02:23:06 GMT
Hi Andrzej,

Thanks for the code.  I'll try it as soon as I have time.  If you had a copy
of the modified FSDirectory implementation you could also share, that would
make testing it a bit quicker and easier.  BTW, when you said it "supposedly
increases I/O", I gather that you are not the author?

Regards,

Terry

----- Original Message -----
From: "Andrzej Bialecki" <ab@getopt.org>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, February 24, 2003 3:59 PM
Subject: Re: Indexing Tips and Hints


> Hello,
>
> Since you are trying this anyway, and looking for ways to improve
> indexing times... Could you perhaps try to replace use of
> java.io.RandomAccessFile in FSDirectory implementation, with the
> attached implementation? It supposedly increases I/O throughput by
> orders of magnitude, by using partial buffering.
>
> Terry Steichen wrote:
> > Mike,
> >
> > By way of comparison, I've got a collection of about 50,000 XML files,
each
> > of which averages about 8K.  It takes about 1.25 hours to index (on a
1.8Ghz
> > machine).  I use basically the standard configuration (mergeFactor,
etc.)
> > and I've got about 30 fields per document.  I add about 200 new ones per
> > day.  I don't recall how long that it takes to index the 200 (I do it
> > through a background task), but it takes a couple of minutes to merge
the
> > new 200 document index with the master index.
> >
> > HTH,
> >
> > Terry
> >
> > ----- Original Message -----
> > From: "Michael Barry" <mbarry@cos.com>
> > To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> > Sent: Monday, February 24, 2003 2:00 PM
> > Subject: Indexing Tips and Hints
> >
> >
> >
> >>All,
> >>   I'm in need of some pointers, hints or tips on indexing large
> >
> > collections
> >
> >>of data. I know I saw some tips on this list before but when I tried
> >>searching
> >>the list, I came up blank.
> >>   I have a large collection of XML files (336000 files around 5K
> >>apiece) that I'm
> >>indexing and its taking quite a bit of time (27 hours). I've played
> >>around with the
> >>mergeFactor, RAMDirectories and multiple threads (X number of threads
> >>indexing
> >>a subset of the data and then merging the indexes at the end) but I
> >>cannot seem
> >>to bring the time down. I'm probably not doing these things properly but
> >>from
> >>what I read I believe I am.  Maybe this is the best I can do with this
> >>data but I
> >>would be really grateful to hear how others have tackled this same
issue.
> >>   As always pointers to places in the mailing list archive or other
> >>places would be
> >>appreciated.
> >>
> >>Thanks, Mike.
> >>
> >>---------------------------------------------------------------------
> >>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> >>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >>
> >>
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
> >
>
>
> --
>
> --
> Best regards,
> Andrzej Bialecki
>
> -------------------------------------------------
> Software Architect, System Integration Specialist
> -------------------------------------------------
> FreeBSD developer (http://www.freebsd.org)
>
>


----------------------------------------------------------------------------
----


> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message