lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "John Paul Sondag" <jsond...@uiuc.edu>
Subject Re: 答复: Indexing correctly?
Date Tue, 14 Aug 2007 19:34:08 GMT
Hello again,

The file are local, sorry for using the confusing /mounts...., I can see
where that is confusing.

What exactly is a RAMDirectory, I didn't see it mentioned on that page.  Is
there example code of using it?   Do I just create a Ram Directory and then
use it like it's a normal directory?

--JP

On 8/13/07, Kai Hu <kai.hu@dusee.cn> wrote:
>
> Hi, John
>         I think you cost too much time in I/O,and if you use RAMDirectory
> first will better.see
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> kai
>
> -----邮件原件-----
> 发件人: Erick Erickson [mailto:erickerickson@gmail.com]
> 发送时间: 2007年8月13日 星期一 1:57
> 收件人: java-user@lucene.apache.org
> 主题: Re: Indexing correctly?
>
> Where are your source files and index? If they're somewhere
> out there on the network, you may be having some slowdown
> because of network latency (the part about "/mount/....." leads
> me to ask this one).
>
> If this is the case, you might get an improvement if all the files are
> local...
>
> Best
> Erick
>
> On 8/11/07, John Paul Sondag <jsondag2@uiuc.edu> wrote:
> >
> > It takes roughly 6 hours for me to index a Gig of data.  The benchmarks
> > take
> > quite a bit less if I'm reading it correctly.  I'll try out the
> > StringBuffer/Builder and let you know.  Thanks for the quick response
> and
> > if
> > you have any more suggestions please let me know.
> >
> > --JP
> >
> > On 8/11/07, karl wettin <karl.wettin@gmail.com> wrote:
> > >
> > > How much slower than anticipated is it?
> > >
> > > I would start by using a StringBuffer/Builder rather than appending
> > > (immutable) strings to each other.
> > >
> > >
> > > 11 aug 2007 kl. 19.05 skrev John Paul Sondag:
> > >
> > > > Hi,
> > > >
> > > > I was hoping that maybe you guys could see if I'm somehow indexing
> > > > inefficiently.  I'm putting relevant parts of my code below.  I've
> > > > looked at
> > > > the "benchmarks" page on Lucene and my indexing time is taking a
> > > > substantial
> > > > amount of time more than what I see posted.  I'm not sure when I
> > > > should call
> > > > flush() ( I saw that I should be doing that on the
> > > > ImproveIndexingSpeed
> > > > page).  I'd really appreciate any advice.
> > > >
> > > > Here's my code:
> > > >
> > > >       File directory = new File( "/mounts/falcon5/disks/0/tcheng3/
> > > > Dataset");
> > > >       File[] theFiles = directory.listFiles();
> > > >
> > > >           //go through each file inside the directory and index it
> > > >           for(int curFile = 0; curFile < theFiles.length; curFile++)
> > > >           {
> > > >               File fin=theFiles[curFile];
> > > >
> > > >               //open up the file
> > > >               FileInputStream inf = new FileInputStream(fin);
> > > >               InputStreamReader isr = new InputStreamReader(inf,
> > > > "US-ASCII");
> > > >               BufferedReader in = new BufferedReader(isr);
> > > >               String text="";
> > > >               String docid="";
> > > >
> > > >             while (true) {
> > > >
> > > >             //read in the file one line at a time, and act
> accordingly
> > > >                 String line = in.readLine();
> > > >                 if (line == null) { break;}
> > > >
> > > >                  if (line.startsWith("<DOC>") ) {
> > > >                     //get docID
> > > >                     line = in.readLine();
> > > >                     String tempStr = line.substring(8,line.length
> ());
> > > >                     int pos = tempStr.indexOf(' ');
> > > >                     docid = tempStr.substring(0,pos);
> > > >                     }else if (line.startsWith("</DOC>")) {
> > > >
> > > >                     Document doc = new Document();
> > > >
> > > >                       doc.add(new Field("contents",text,
> > > > Field.Store.NO,
> > > > Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS ));
> > > >                     doc.add(new Field("DocID",docid, Field.Store.YES
> ,
> > > > Field.Index.NO));
> > > >                       writer.addDocument(doc);
> > > >                     text="";
> > > >                 } else {
> > > >                     text = text + "\n" + line;
> > > >                 }
> > > >             }
> > > >
> > > >         }
> > > >
> > > >
> > > >           int numIndexed = writer.docCount();
> > > >
> > > >           writer.optimize();
> > > >           writer.close();
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > --JP
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message