lucene-java-user mailing list archives

From "John Bugger" <john.bug...@gmail.com>
Subject Re: Search in HTML code
Date Tue, 03 Oct 2006 14:49:10 GMT
My crawler indexes the crawled pages with this code:
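import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;

Document doc = new Document();
// Raw HTML: stored, and indexed without an analyzer as one single term
doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
// Title goes through the analyzer, so its individual words are searchable
doc.add(new Field("title", page.getTitle(), Store.YES, Index.TOKENIZED));
// Stored only, not searchable
doc.add(new Field("id", Integer.toString(page.getId()), Store.YES, Index.NO));
try {
    indexWriter.addDocument(doc);
}
catch (IOException e) {
    // Log the exception itself so the stack trace is not lost
    log.error("Could not index " + page.getUrl(), e);
}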
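With Index.UN_TOKENIZED the whole `body` string becomes a single term, so the index knows nothing about the individual tags or attributes inside it. One possible workaround (my assumption, not something Lucene does for you) is to derive attribute-order-independent terms such as `table@width=100%` from each page and index them in an extra field. The sketch below shows only that extraction step, in plain Java with regular expressions; the class `HtmlTagTerms`, the method `tagTerms` and the `tag@attr=value` term format are all hypothetical, and the regexes are an approximation rather than a real HTML parser (for example, a `>` inside an attribute value will confuse them).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Lucene): turns HTML start tags into index terms. */
public class HtmlTagTerms {
    // Opening tag such as <table width="100%">; group 1 = name, group 2 = attribute text.
    // End tags (</table>) and comments (<!-- -->) do not match because of the first char.
    private static final Pattern START_TAG =
        Pattern.compile("<([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");

    // name="value", name='value' or name=value; valueless attributes are skipped.
    private static final Pattern ATTR = Pattern.compile(
        "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** One term per start tag ("table") plus one per attribute ("table@width=100%"). */
    public static List<String> tagTerms(String html) {
        List<String> terms = new ArrayList<>();
        Matcher tag = START_TAG.matcher(html);
        while (tag.find()) {
            String name = tag.group(1).toLowerCase(Locale.ROOT);
            terms.add(name);
            Matcher attr = ATTR.matcher(tag.group(2));
            while (attr.find()) {
                // Exactly one of groups 2-4 is set, depending on the quoting style.
                String value = attr.group(2) != null ? attr.group(2)
                             : attr.group(3) != null ? attr.group(3)
                             : attr.group(4);
                terms.add(name + "@" + attr.group(1).toLowerCase(Locale.ROOT) + "=" + value);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String a = "<table width=\"100%\" height=\"50\"><tr><th>test</th></tr></table>";
        String b = "<TABLE height='50' width=100%><tr><th>test</th></tr></TABLE>";
        System.out.println(tagTerms(a));
        // prints [table, table@width=100%, table@height=50, tr, th]
        System.out.println(new HashSet<>(tagTerms(a)).equals(new HashSet<>(tagTerms(b))));
        // prints true: same terms regardless of attribute order, quoting or tag case
    }
}
```

Each term could then be added as its own untokenized value, e.g. `doc.add(new Field("tag", term, Store.NO, Index.UN_TOKENIZED))` in a loop (Lucene allows several values under one field name), and the pattern's own terms combined as `BooleanClause.Occur.MUST` clauses of a `BooleanQuery`. Since a document's terms behave like a set, attribute order stops mattering; nesting and text content are lost, though, so the hits are only candidates.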

I need to write an application that can search the indexed pages' HTML code
using patterns like:
<table width="100%" height="50" style="border: 1px solid red;">
  *
  <th>*test*</th>
  *
</table>
This should match every document that contains such code, regardless of the
order of the tag attributes.
Is this possible with the Lucene engine?
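Lucene matches terms, not document structure, so a structural pattern like the one above is probably easiest to check outside the index, for example against the stored `body` field of candidate hits (it is indexed with Store.YES). A minimal sketch, assuming plain regular expressions are good enough for the pages involved: normalize every tag in both the pattern and the page (lower-case names, attributes sorted and double-quoted), then treat each `*` as a non-greedy "any text" wildcard. The class and method names are hypothetical, and this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Lucene): '*'-wildcard HTML patterns, attribute order ignored. */
public class HtmlPatternMatcher {
    // Start or end tag: group 1 = "/" or "", group 2 = tag name, group 3 = attribute text.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");
    // name="value", name='value' or name=value; valueless attributes are dropped.
    private static final Pattern ATTR = Pattern.compile(
        "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Lower-cases tag and attribute names, sorts attributes and re-quotes values with ". */
    public static String normalize(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            List<String> attrs = new ArrayList<>();
            Matcher attr = ATTR.matcher(tag.group(3));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2)
                             : attr.group(3) != null ? attr.group(3) : attr.group(4);
                attrs.add(attr.group(1).toLowerCase(Locale.ROOT) + "=\"" + value + "\"");
            }
            Collections.sort(attrs); // attribute order no longer matters
            StringBuilder rebuilt = new StringBuilder("<").append(tag.group(1))
                .append(tag.group(2).toLowerCase(Locale.ROOT));
            for (String a : attrs) {
                rebuilt.append(' ').append(a);
            }
            rebuilt.append('>'); // a self-closing "/" is dropped
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt.toString()));
        }
        tag.appendTail(out);
        return out.toString();
    }

    /** Compiles a pattern in which '*' means "any text" into a regex over normalized HTML. */
    public static Pattern compile(String htmlPattern) {
        StringBuilder regex = new StringBuilder();
        String[] pieces = normalize(htmlPattern).split("\\*", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*?"); // the wildcard: non-greedy, may span lines (DOTALL)
            }
            String piece = pieces[i].trim();
            if (piece.isEmpty()) {
                continue;
            }
            // Literal text; any run of whitespace matches any run of whitespace.
            String[] words = piece.split("\\s+");
            for (int w = 0; w < words.length; w++) {
                if (w > 0) {
                    regex.append("\\s+");
                }
                regex.append(Pattern.quote(words[w]));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static boolean matches(String htmlPattern, String html) {
        return compile(htmlPattern).matcher(normalize(html)).find();
    }

    public static void main(String[] args) {
        String pattern = "<table width=\"100%\" height=\"50\" style=\"border: 1px solid red;\">\n"
                       + "  *\n  <th>*test*</th>\n  *\n</table>";
        String page = "<TABLE style='border: 1px solid red;' height=50 width=\"100%\">\n"
                    + "<tr><th>A test header</th></tr>\n</TABLE>";
        System.out.println(matches(pattern, page));                                   // true
        System.out.println(matches(pattern, page.replace("height=50", "height=60"))); // false
    }
}
```

Because `.*?` can run past a closing tag, this over-matches: a `<th>` in a later, unrelated table would also satisfy the pattern. Exact structural matching would need a real HTML parser and a tree walk; the regex version is only a cheap filter.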

Thanks!
