lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From karl wettin <karl.wet...@gmail.com>
Subject Re: Indexing correctly?
Date Sat, 11 Aug 2007 18:36:58 GMT
How much slower than anticipated is it?

I would start by using a StringBuffer/Builder rather than appending
(immutable) strings to each other.


11 aug 2007 kl. 19.05 skrev John Paul Sondag:

> Hi,
>
> I was hoping that maybe you guys could see if I'm somehow indexing
> inefficiently.  I'm putting relevant parts of my code below.  I've  
> looked at
> the "benchmarks" page on Lucene and my indexing time is taking a  
> substantial
> amount of time more than what I see posted.  I'm not sure when I  
> should call
> flush() ( I saw that I should be doing that on the  
> ImproveIndexingSpeed
> page).  I'd really appreciate any advice.
>
> Here's my code:
>
>       File directory = new File( "/mounts/falcon5/disks/0/tcheng3/ 
> Dataset");
>       File[] theFiles = directory.listFiles();
>
>           //go through each file inside the directory and index it
>           for(int curFile = 0; curFile < theFiles.length; curFile++)
>           {
>               File fin=theFiles[curFile];
>
>               //open up the file
>               FileInputStream inf = new FileInputStream(fin);
>               InputStreamReader isr = new InputStreamReader(inf,
> "US-ASCII");
>               BufferedReader in = new BufferedReader(isr);
>               String text="";
>               String docid="";
>
>             while (true) {
>
>             //read in the file one line at a time, and act accordingly
>                 String line = in.readLine();
>                 if (line == null) { break;}
>
>                  if (line.startsWith("<DOC>") ) {
>                     //get docID
>                     line = in.readLine();
>                     String tempStr = line.substring(8,line.length());
>                     int pos = tempStr.indexOf(' ');
>                     docid = tempStr.substring(0,pos);
>                     }else if (line.startsWith("</DOC>")) {
>
>                     Document doc = new Document();
>
>                       doc.add(new Field("contents",text,  
> Field.Store.NO,
> Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS ));
>                     doc.add(new Field("DocID",docid, Field.Store.YES,
> Field.Index.NO));
>                       writer.addDocument(doc);
>                     text="";
>                 } else {
>                     text = text + "\n" + line;
>                 }
>             }
>
>         }
>
>
>           int numIndexed = writer.docCount();
>
>           writer.optimize();
>           writer.close();
>
>
> Thanks,
>
> --JP


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message