Hello again,
The file are local, sorry for using the confusing /mounts...., I can see
where that is confusing.
What exactly is a RAMDirectory, I didn't see it mentioned on that page. Is
there example code of using it? Do I just create a Ram Directory and then
use it like it's a normal directory?
--JP
On 8/13/07, Kai Hu <kai.hu@dusee.cn> wrote:
>
> Hi, John
> I think you cost too much time in I/O,and if you use RAMDirectory
> first will better.see
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> kai
>
> -----邮件原件-----
> 发件人: Erick Erickson [mailto:erickerickson@gmail.com]
> 发送时间: 2007年8月13日 星期一 1:57
> 收件人: java-user@lucene.apache.org
> 主题: Re: Indexing correctly?
>
> Where are your source files and index? If they're somewhere
> out there on the network, you may be having some slowdown
> because of network latency (the part about "/mount/....." leads
> me to ask this one).
>
> If this is the case, you might get an improvement if all the files are
> local...
>
> Best
> Erick
>
> On 8/11/07, John Paul Sondag <jsondag2@uiuc.edu> wrote:
> >
> > It takes roughly 6 hours for me to index a Gig of data. The benchmarks
> > take
> > quite a bit less if I'm reading it correctly. I'll try out the
> > StringBuffer/Builder and let you know. Thanks for the quick response
> and
> > if
> > you have any more suggestions please let me know.
> >
> > --JP
> >
> > On 8/11/07, karl wettin <karl.wettin@gmail.com> wrote:
> > >
> > > How much slower than anticipated is it?
> > >
> > > I would start by using a StringBuffer/Builder rather than appending
> > > (immutable) strings to each other.
> > >
> > >
> > > 11 aug 2007 kl. 19.05 skrev John Paul Sondag:
> > >
> > > > Hi,
> > > >
> > > > I was hoping that maybe you guys could see if I'm somehow indexing
> > > > inefficiently. I'm putting relevant parts of my code below. I've
> > > > looked at
> > > > the "benchmarks" page on Lucene and my indexing time is taking a
> > > > substantial
> > > > amount of time more than what I see posted. I'm not sure when I
> > > > should call
> > > > flush() ( I saw that I should be doing that on the
> > > > ImproveIndexingSpeed
> > > > page). I'd really appreciate any advice.
> > > >
> > > > Here's my code:
> > > >
> > > > File directory = new File( "/mounts/falcon5/disks/0/tcheng3/
> > > > Dataset");
> > > > File[] theFiles = directory.listFiles();
> > > >
> > > > //go through each file inside the directory and index it
> > > > for(int curFile = 0; curFile < theFiles.length; curFile++)
> > > > {
> > > > File fin=theFiles[curFile];
> > > >
> > > > //open up the file
> > > > FileInputStream inf = new FileInputStream(fin);
> > > > InputStreamReader isr = new InputStreamReader(inf,
> > > > "US-ASCII");
> > > > BufferedReader in = new BufferedReader(isr);
> > > > String text="";
> > > > String docid="";
> > > >
> > > > while (true) {
> > > >
> > > > //read in the file one line at a time, and act
> accordingly
> > > > String line = in.readLine();
> > > > if (line == null) { break;}
> > > >
> > > > if (line.startsWith("<DOC>") ) {
> > > > //get docID
> > > > line = in.readLine();
> > > > String tempStr = line.substring(8,line.length
> ());
> > > > int pos = tempStr.indexOf(' ');
> > > > docid = tempStr.substring(0,pos);
> > > > }else if (line.startsWith("</DOC>")) {
> > > >
> > > > Document doc = new Document();
> > > >
> > > > doc.add(new Field("contents",text,
> > > > Field.Store.NO,
> > > > Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS ));
> > > > doc.add(new Field("DocID",docid, Field.Store.YES
> ,
> > > > Field.Index.NO));
> > > > writer.addDocument(doc);
> > > > text="";
> > > > } else {
> > > > text = text + "\n" + line;
> > > > }
> > > > }
> > > >
> > > > }
> > > >
> > > >
> > > > int numIndexed = writer.docCount();
> > > >
> > > > writer.optimize();
> > > > writer.close();
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > --JP
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
|