lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: HTML pages highlighter
Date Fri, 01 Apr 2005 01:03:38 GMT

On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
>     try {
>       fis = new FileInputStream(f);
>       HTMLParser parser = new HTMLParser(fis);
>
>       // Add the tag-stripped contents as a Reader-valued Text field so it will
>       // get tokenized and indexed.
> //      doc.add(new Field("contents", parser.getReader()));
>       LineNumberReader reader = new LineNumberReader(parser.getReader());
>       for (String l = reader.readLine(); l != null; l = reader.readLine())
> //        System.out.println(l);
>       doc.add(Field.Text("contents", l));

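Notice that your loop here is adding a separate "contents" field for *every* 
line read. With the println commented out and no braces, the loop body is 
just the next statement up to the first semi-colon, which is the doc.add() call.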
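To see the effect in isolation, here is a minimal plain-Java sketch of the same loop shape. It uses nothing from Lucene: the `List<String>` called `fields` is a hypothetical stand-in for the Document, and `LoopBodyDemo`, `addPerLine` and `addOnce` are illustrative names. The first method mirrors the quoted loop; the second collects the lines and adds a single value after the loop ends.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LoopBodyDemo {
    // Mirrors the quoted loop: the println is commented out and there are no braces,
    // so the next statement (ending at its semicolon) becomes the whole loop body.
    // "fields" is a hypothetical stand-in for the Lucene Document.
    static List<String> addPerLine(Reader in) {
        List<String> fields = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine())
//              System.out.println(l);
                fields.add(l); // runs once per line read
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    // Collects every line first, then adds a single value after the loop.
    static List<String> addOnce(Reader in) {
        List<String> fields = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(in);
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine()) {
                text.append(l).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(text.toString()); // runs exactly once
        return fields;
    }

    public static void main(String[] args) {
        String html = "first line\nsecond line\nthird line";
        System.out.println(addPerLine(new StringReader(html)).size()); // 3
        System.out.println(addOnce(new StringReader(html)).size());    // 1
    }
}
```

Putting braces around the intended body (or accumulating first, as in `addOnce`) makes the scope of the loop explicit instead of depending on where the first semi-colon falls.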

Look at using Luke to explore your index.  Try indexing just a dummy 
String:

	doc.add(Field.Text("contents", "some dummy text"));

to show that it works.  Always always always simplify a complicated 
situation by doing the most obvious thing that _should_ work.

Also, the demo Lucene code is not really designed to be used in a 
production application (sadly), so you're better off borrowing code 
from the many articles or our book to begin with.
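The earlier reply quoted below points out that a Reader-valued field cannot be stored. A minimal sketch of one workaround, assuming the Lucene 1.4 API used in this thread: drain the Reader (such as the one returned by the demo `HTMLParser.getReader()`) into a String with plain java.io, then hand that String to a String-valued factory like `Field.Text(String, String)`. `ReaderToString` and `readFully` are hypothetical names, and the Lucene call itself is only mentioned in a comment so the sketch compiles on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderToString {
    // Drains a Reader into a String, keeping the original line breaks.
    static String readFully(Reader in) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            // read() returns the number of chars read, or -1 at end of stream.
            for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String contents = readFully(new StringReader("tag-stripped\nHTML text"));
        // With Lucene 1.4 on the classpath, this String could then be passed to
        // Field.Text("contents", contents); that call is not made here.
        System.out.println(contents);
    }
}
```

Reading into a char buffer rather than line by line avoids the per-line field problem entirely and preserves the text exactly as the parser produced it.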

	Erik


>
>       // Add the summary as a field that is stored and returned with
>       // hit documents for display.
>       doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));
>
>       // Add the title as a field that it can be searched and that is stored.
>       doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
>     }
>
>
>
> -----Original Message-----
> From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
> Sent: Wednesday, March 30, 2005 7:38 PM
> To: java-user@lucene.apache.org
> Subject: Re: HTML pages highlighter
>
>
>
> On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
>
>> Hi! Eric,
>
> Erik - with a 'k' - Sorry, I let it slide once though :)
>
>> 	I tried to modify that with this, but I get a compile error. Do you have
>> a snippet of highlighting code that pulls the contents from the
>> original source?
>
> I have a whole book full of code examples :)
> http://www.lucenebook.com - Grab the source code and look in
> src/lia/tools at Highlight*.java
>
>> Or do you know how I can store the field?
>>
>>       doc.add(new Field("contents", parser.getReader(),
>> Field.Store.YES, Field.Index.NO));
>
> You cannot store a field whose value is a Reader.  You need to read the
> contents into a String and use Field.Text(String, String), or one of the
> other variations.
>
> 	Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



