lucene-java-user mailing list archives

From "John Wang" <john.w...@gmail.com>
Subject merge factor and real time indexing
Date Tue, 30 May 2006 12:45:39 GMT
Hi folks:

    I am working on an application that requires real-time indexing, i.e.
for every insert, I open the writer, add a document, and then close the
writer.

    I want to control the number of files created, and according to the
documentation, a small mergeFactor should keep that number low. However, I
am seeing the opposite; see the following code segment:

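import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.IndexReader;

// Class name chosen only so the snippet compiles on its own.
public class RealTimeIndexTest {

    public static void main(String[] args) throws IOException {
        int mfactor = 10;     // mergeFactor
        int mbuffer = 1000;   // maxBufferedDocs

        File dir = new File("/tmp/john/");

        long start = System.currentTimeMillis();
        for (int i = 0; i < 5000; ++i) {
            // Open a new writer for every insert, add one document, close it.
            // Declared per iteration so a failed open never re-closes an old writer.
            IndexModifier writer = null;
            try {
                // Create the index only if it does not exist yet.
                boolean create = !IndexReader.indexExists(dir);
                writer = new IndexModifier(dir, new StandardAnalyzer(), create);
                writer.setMergeFactor(mfactor);
                writer.setMaxBufferedDocs(mbuffer);

                Document doc = new Document();
                doc.add(new Field("test", "this is a test doc",
                        Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
                writer.addDocument(doc);
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        long end = System.currentTimeMillis();

        System.out.println("took: " + (end - start));
    }
}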

If I set the mfactor value to a high number, e.g. 1000, indexing takes much
longer but the number of files decreases dramatically.
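
For anyone who wants to reproduce the file counts, a small helper along these lines could be run after each test. It is only a minimal sketch using java.io: the names IndexFileCounter and countIndexFiles are hypothetical and not part of Lucene, and it assumes the index is a flat directory such as /tmp/john/.

```java
import java.io.File;

// Hypothetical helper (not part of Lucene): counts the regular files in an
// index directory so runs with different mergeFactor values can be compared.
public class IndexFileCounter {

    // Returns the number of regular files directly inside dir,
    // or 0 if dir does not exist or is not a directory.
    public static int countIndexFiles(File dir) {
        File[] entries = dir.listFiles();   // null when dir is missing or not a directory
        if (entries == null) {
            return 0;
        }
        int count = 0;
        for (File entry : entries) {
            if (entry.isFile()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumption: defaults to the index path used in the test program.
        File dir = new File(args.length > 0 ? args[0] : "/tmp/john/");
        System.out.println("files in " + dir + ": " + countIndexFiles(dir));
    }
}
```

Running it once with mergeFactor=10 and once with mergeFactor=1000 (clearing the directory in between) would give directly comparable numbers alongside the "took:" timing.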

Is this expected, or are there better ways of tuning the indexing
parameters so that I limit the number of open files while getting a decent
indexing speed?

Thanks

-John
