lucene-java-user mailing list archives

From Daniel Valdivia <h...@danielvaldivia.com>
Subject How to escape URL at indexing time
Date Sun, 27 Dec 2015 20:53:26 GMT
Hi

I'm trying to index documents that have a URL in one of their fields; however, as soon as I try to index
a URL like "http://yahoo.com", I get this error:

org.apache.lucene.queryparser.classic.ParseException: Cannot parse 'id:'http://www.yahoo.com'':
Encountered " ":" ": "" at line 1, column 8.

I assume I need to escape the URL, but I'm not sure whether URL-encoding it is the right way to go.
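The exception is thrown by the classic `QueryParser` (note the `org.apache.lucene.queryparser.classic` package in the trace). That means it happens when a search string is parsed, not when the document is added, because `StringField` stores the URL as-is. Single quotes have no special meaning in the classic query syntax, so the parser reads `'http` as a term and then stops at the next `:`. Backslash-escaping the syntax characters avoids this. The sketch below uses a hypothetical helper, `escapeQueryText`, that escapes the same characters as the static `QueryParser.escape` method in Lucene 5.x. In real code, calling `QueryParser.escape(url)` directly is the simpler choice.

```java
// Hypothetical demo class: mirrors classic QueryParser escaping for illustration only.
public class QueryEscapeSketch {

    // Characters the classic QueryParser treats as syntax (the set escaped by
    // QueryParser.escape in Lucene 5.x; '/' delimits regular expressions).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Hypothetical helper: prefixes every query-syntax character with a backslash.
    static String escapeQueryText(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.yahoo.com";
        // Escape the value instead of wrapping it in single quotes.
        System.out.println("id:" + escapeQueryText(url)); // prints id:http\:\/\/www.yahoo.com
    }
}
```

Escaping only gets the string past the parser. The unescaped term text is still run through the parser's analyzer, so whether the resulting query matches a `StringField` depends on which analyzer the parser was given.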

My indexing code:

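import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

Document doc = new Document();
// StringField: indexed as one un-analyzed token (exact-match fields)
doc.add(new StringField("id", url, Field.Store.YES));
doc.add(new StringField("domain", domain, Field.Store.NO));
doc.add(new StringField("title", pageTitle, Field.Store.NO));
// TextField: tokenized by the analyzer for full-text search
doc.add(new TextField("body", pageBody, Field.Store.NO));
w.addDocument(doc);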
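Because `id` is a `StringField`, the URL is indexed as a single, un-analyzed term, so an exact lookup does not need the query parser at all. A `TermQuery` built from `new Term("id", url)` matches the stored value directly, with no escaping. This is a minimal sketch, assuming Lucene 5.4 with the core and analyzers-common jars on the classpath. The class and method names are illustrative, and `RAMDirectory` is used only to keep the demo in memory.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical demo class (names are illustrative, not from the original post).
public class UrlLookupSketch {

    // Builds a throwaway in-memory index holding one document keyed by URL.
    static Directory buildDemoIndex() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter w = new IndexWriter(dir, config)) {
            Document doc = new Document();
            // Same field type as the indexing code above: one exact token.
            doc.add(new StringField("id", "http://www.yahoo.com", Field.Store.YES));
            w.addDocument(doc);
        } // closing the writer commits the document
        return dir;
    }

    // Counts documents whose "id" term equals the URL exactly.
    static int countById(Directory dir, String url) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // No query syntax and no analyzer involved, so ':' and '/' need no escaping.
            TermQuery query = new TermQuery(new Term("id", url));
            return searcher.search(query, 10).totalHits; // an int in Lucene 5.x
        }
    }

    public static void main(String[] args) throws IOException {
        Directory dir = buildDemoIndex();
        System.out.println(countById(dir, "http://www.yahoo.com")); // prints 1
        System.out.println(countById(dir, "http://yahoo.com"));     // prints 0
    }
}
```

The second lookup returns nothing because a `TermQuery` is an exact, whole-value match: "http://yahoo.com" and "http://www.yahoo.com" are different terms.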

Any ideas on how I can avoid the parsing issue?

I’m using Lucene 5.4.0