lucene-java-user mailing list archives

From "peter velthuis" <peter...@gmail.com>
Subject Re: Indexing very slow.
Date Mon, 03 Jul 2006 10:37:04 GMT
I select it in parts: chunks of 5000 records using the LIMIT keyword.
The thing is, it starts very fast but then slows down, so I doubt it
has to do with tokenizing.
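
A minimal sketch of that kind of chunked selection, for illustration only.
PageSource, processInChunks and inMemory are hypothetical names, not code
from this thread; a real PageSource would run something like
"SELECT ... ORDER BY id LIMIT offset, step" over JDBC and hand each row to
the indexer. The in-memory source here just stands in for the database.

```java
import java.util.List;
import java.util.function.Consumer;

class ChunkedIndexing {

    /** Hypothetical row source; a real one would issue a LIMIT offset, step query. */
    interface PageSource<T> {
        List<T> fetch(int offset, int step);
    }

    /**
     * Pulls rows in pages of `step`, passes each row to `handler`
     * (for example, code that builds and adds a Document), and
     * returns the total number of rows processed.
     */
    static <T> int processInChunks(PageSource<T> source, int step, Consumer<T> handler) {
        int lastIndexed = 0;
        while (true) {
            List<T> page = source.fetch(lastIndexed, step);
            for (T row : page) {
                handler.accept(row);
            }
            lastIndexed += page.size();
            // A short (or empty) page means there are no more rows to read.
            if (page.size() < step) {
                break;
            }
        }
        return lastIndexed;
    }

    /** Stand-in for the database: serves pages from an in-memory list. */
    static PageSource<String> inMemory(List<String> rows) {
        return (offset, step) -> rows.subList(
                Math.min(offset, rows.size()),
                Math.min(offset + step, rows.size()));
    }
}
```

Only one page of rows is held at a time, so memory use stays bounded by
the step size rather than by the size of the table.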


2006/7/3, Aleksander M. Stensby <aleksander.stensby@integrasco.no>:
> My guess is that if you actually do a complete select * from your db and
> manage all the objects at once, this will be a problem for your JVM. Maybe
> running out of memory is the problem you encounter; strings tend to be a
> bit of a memory issue in Java :(
>
> My suggestion is that you paginate with an offset while reading from the
> db (and index the results as you go).
>
> So you could keep track of your "last indexed" doc, predefine a "step"
> variable, then select with a limit of lastindexed, step, updating your
> last indexed variable on each iteration. Of course the step variable
> should not be too low either, because that will put too much load on the
> database connection.
>
> Also, you might reconsider tokenizing all fields... is it necessary? Do you
> have to store all of them? (I don't know how your index is used, so it's a
> bit hard to know for sure)
>
> - Aleksander
>
> On Mon, 03 Jul 2006 10:52:15 +0200, peter velthuis <peterv85@gmail.com>
> wrote:
>
> > When I start the program it's fast: about 10 docs per second. But
> > after about 15000 docs it slows down very much. Now it does 1 doc per
> > second, and it is at doc number 40 000 after a whole night of indexing.
> > These are VERY small docs with very little information. This is what
> > and how I index:
> >
> >     Document doc = new Document();
> >     doc.add(new Field("field1", field1, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("field2", field2, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("field3", field3, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("field4", field4, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("field5", field5, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("field6", field6, Field.Store.YES,
> >             Field.Index.TOKENIZED));
> >     doc.add(new Field("contents", contents, Field.Store.NO,
> >             Field.Index.TOKENIZED));
> >
> >
> >
> > and this:
> >
> >
> >     String indexDirectory = "lucdex2";
> >
> >     private void indexDocument(Document document) throws Exception {
> >         Analyzer analyzer  = new StandardAnalyzer();
> >         IndexWriter writer = new IndexWriter(indexDirectory, analyzer,
> >                 false);
> >       //  writer.setUseCompoundFile(true);
> >         writer.addDocument(document);
> >         writer.optimize();
> >         writer.close();
> >     }
> >
> >
> >
> > I read the data out of a MySQL database, but that can't be the problem,
> > since the data is in memory.
> >
> > Also, I use Cygwin. When I try indexing on Windows in a program like
> > NetBeans or BlueJ, it crashes Windows after about 5000 docs: it says
> > "beep" and then does a complete shutdown...
> >
> > Peter
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
>
> --
> Aleksander M. Stensby
> Software Developer
> Integrasco A/S
> aleksander.stensby@integrasco.no
> Tlf.: +47 41 22 82 72
>
