lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: merge factor and real time indexing
Date Tue, 30 May 2006 15:03:01 GMT
John,
I can't spot anything wrong in the code.  I assume you are certain there are no IOException,
no issues with locks, and that your writer is really getting close()d.  I haven't used IndexModifier,
and since that's a layer on top of the real IndexWriter with the same API exposed, I would
just use IndexWriter directly and see if that makes a difference.  If it does, then maybe
there is a bug in IndexModifier.

Otis

----- Original Message ----
From: John Wang <john.wang@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, May 30, 2006 5:45:39 AM
Subject: merge factor and real time indexing

Hi folks:

    I am working on an application that requires real time indexing, e.g.
for every insert, I open the writer, add a document and then closes the
writer.

    I want to control the number of files created, and according to the
documentation, a small mergeFactor is desired. However, I am experiencing
the opposite, see the following code segment:

public static void main(String[] args) throws IOException{
        int mfactor=10;
        int mbuffer=1000;


        IndexModifier writer=null;
        File dir=new File("/tmp/john/");

        long start=System.currentTimeMillis();
        for (int i=0;i<5000;++i){
            try{
                boolean create=!IndexReader.indexExists(dir);
                writer=new IndexModifier(dir,new StandardAnalyzer(),create);
                writer.setMergeFactor(mfactor);
                writer.setMaxBufferedDocs(mbuffer);
                    Document doc=new Document();
                    doc.add(new Field("test","this is a test doc",
Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.YES));
                    writer.addDocument(doc);
            }
            finally{
                if (writer!=null){
                    writer.close();
                }
            }

        }
        long end=System.currentTimeMillis();

        System.out.println("took: "+(end-start));

    }

If I set the mfactor value to a high number, e.g. 1000, indexing takes much
longer but the number of files decreases dramatically.

Is this expected or are there any better ways of tuning the indexing
parameters so that I limit the number of open files while gettting a decent
indexing speed?

Thanks

-John




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message