lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: problem in indexing documents
Date Tue, 25 Dec 2007 22:37:15 GMT
It's more helpful to indicate what error you're receiving or
what your perceived problem is. Without that, we can only
guess...

But one thing wrong is that you keep appending to the
same StringBuffer forever, so your first writer.AddDocument
adds  document 1. Your second adds the text of BOTH doc
1 and doc 2. Your third AddDocument adds the contents of
docs 1, 2, 3. Etc.....

Just move the declaration for sb inside your for loop....

Note two things:
1> you can get the same effect by doing a
    document.add() for *each* line. That is,
    instead of the sb.append line, just do a
    document.add() for line. Then do your
    writer.addDocument outside the while loop as
    you do now.
2> there is a built-in default maximum of 10,000 terms that get
    indexed. You can change this, see SetMaxFieldLength on
    IndexWriter.

Best
Erick

On Dec 25, 2007 4:02 PM, Liaqat Ali <liaqatalimian@gmail.com> wrote:

> hello,
>
> I am try to make an index of 191 documents stored in 191 text files. I
> developed a program, which works well with files containing single line,
> but files with multiple lines posing a problem.So i added while loop to
> completely extract data from each document. But it has some logical
> error. Well the given code is an right approach to my problem? Kindly
> give some guidelines.
>
>
>
>
> StringBuffer sb = new StringBuffer();
>
> Analyzer analyzer = new StandardAnalyzer();
>            boolean createFlag = true;
>        IndexWriter writer =
>                    new IndexWriter(indexDir, analyzer, createFlag);
>
>        for (int i=1;i<=191;i++)  {
>
>            Reader file = new InputStreamReader(new
> FileInputStream("corpus\\doc" + i + ".txt"), "UTF-8");
>
>
>            BufferedReader buff = new BufferedReader(file);
>
>            while( (line = buff.readLine()) != null) {
>                        sb.append(line);
>                }
>
>                Document document  = new Document();
>            document.add(new Field("contents",sb.toString(),
> Field.Store.NO, Field.Index.TOKENIZED));
>            writer.addDocument(document);
>
>            buff.close();
>
>        }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message