lucene-dev mailing list archives

From "jitender ahuja" <>
Subject Reader Text input as field for HTML data text leading to "null" retrieval
Date Mon, 15 Mar 2004 10:31:09 GMT
I am building a Lucene index over HTML files. I want to pass the text field as a Reader so
that the HTML files are not stored verbatim in the index. However, when I retrieve that field
from a search result, the text comes back as null.

If I do not use a Reader for the Text field, I get the whole file back. Also, the index
directory is now nearly four times larger.
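
The behavior described above matches how the Lucene 1.x Field factory methods are documented: Field.Text(String, Reader) produces a field that is tokenized and indexed but not stored, whereas Field.Text(String, String) is also stored for return with hits. The following is only a toy model of that distinction, not Lucene code; ToyField, ToyDocument and StoredFieldSketch are hypothetical names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT part of Lucene.
public class StoredFieldSketch {

    /** A field whose text is searchable, but kept verbatim only if 'stored' is true. */
    static final class ToyField {
        final String name;
        final String text;
        final boolean stored;

        ToyField(String name, String text, boolean stored) {
            this.name = name;
            this.text = text;
            this.stored = stored;
        }
    }

    /** A document as it would come back from a search: only stored values survive. */
    static final class ToyDocument {
        private final Map<String, String> storedValues = new HashMap<>();

        void add(ToyField field) {
            if (field.stored) {
                storedValues.put(field.name, field.text);
            }
            // An unstored field would still be tokenized for searching (omitted here).
        }

        String get(String name) {
            return storedValues.get(name); // null when the field was never stored
        }
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.add(new ToyField("path", "page.html", true));          // stored
        doc.add(new ToyField("body", "<html>...</html>", false));  // indexed only
        System.out.println(doc.get("path")); // prints "page.html"
        System.out.println(doc.get("body")); // prints "null"
    }
}
```

In this model, keeping every body verbatim is also what makes the stored variant take more space, which would be consistent with the larger index directory.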


The indexer code that deals with the Reader data type is:


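import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexData {

    protected static final String INDEX_FOLDER = "C:\\Temp\\DB_GT11";

    public static void main(String[] args) {
        try {
            IndexData objDBdex = new IndexData();
            boolean createDex = !objDBdex.indexExists();

            IndexWriter writ = new IndexWriter(INDEX_FOLDER, new StandardAnalyzer(), createDex);

            for (int i = 0; i < args.length; i++) {
                System.out.println("Indexing File " + args[i]);
                InputStream is = new FileInputStream(args[i]);
                Document doc = new Document();
                // Path is stored but not indexed.
                doc.add(Field.UnIndexed("path", args[i]));

                BufferedReader rdr = new BufferedReader(new InputStreamReader(is));
                StringBuffer fileBuffer = new StringBuffer();
                String line;
                while ((line = rdr.readLine()) != null) {
                    // Loop body is missing from the archived message;
                    // presumably each line is appended to the buffer.
                    fileBuffer.append(line);
                }
                System.out.println("File contents from buffer: ");

                // Body is passed as a Reader rather than a String.
                StringReader ab = new StringReader(fileBuffer.toString());
                doc.add(Field.Text("body", (Reader) ab));

                // Not shown in the archived message; assumed.
                writ.addDocument(doc);
            }
            // Not shown in the archived message; assumed.
            writ.close();
        } catch (IOException ex) {
            // Handler body is missing from the archived message.
        }
    }

    public boolean indexExists() {
        return false;
    }
}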
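
The read loop in the indexer can be sketched on its own with only the standard library. This is a minimal illustration, not the original code: ReadAllSketch and readAll are hypothetical names, and the choice to re-insert a newline after each line is an assumption. It is there because BufferedReader.readLine() strips line terminators, so appending lines without a separator would glue the last word of one line to the first word of the next.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the original indexer.
public class ReadAllSketch {

    /** Reads everything from the reader, putting a newline after each line. */
    static String readAll(Reader in) {
        StringBuilder buffer = new StringBuilder();
        try (BufferedReader rdr = new BufferedReader(in)) {
            String line;
            while ((line = rdr.readLine()) != null) {
                // readLine() drops the terminator, so add one back.
                buffer.append(line).append('\n');
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = readAll(new StringReader("<html>\n<body>Hello</body>\n</html>"));
        System.out.print(text);

        // The collected text can be wrapped in a fresh Reader for later consumers.
        Reader body = new StringReader(text);
        System.out.println(readAll(body).equals(text)); // prints "true"
    }
}
```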
