lucene-java-user mailing list archives

From KR <Keith.Rhodes....@siemens.com>
Subject Re: Create and populate a field when indexing
Date Fri, 09 Nov 2007 14:24:31 GMT


Grant Ingersoll-6 wrote:
> 
> When you are indexing the file and adding the Document, you will need  
> to parse out your filename per your regular expression, and then  
> create the appropriate field:
> 
> Document doc = new Document();
> String cat = getCategoryFromFileName(inputFileName);
> doc.add(new Field("category", cat, ...));
> // do the rest of your adds
> 
> Just locate where in the demo the Document add is taking place (I  
> forget the exact spot) and then add in the appropriate stuff from  
> above.  Obviously, you need to implement the method I stubbed called  
> getCategoryFromFileName.
> 
> HTH,
> Grant
> 

Thanks, Grant. That was just the hint I needed.

I found that the fields are populated in HTMLDocument.

I added:

doc.add(new Field("category", "test", Field.Store.YES,
Field.Index.TOKENIZED));

and then used Luke to verify that this field had been added. It had.

Now I am trying to find a quick-and-dirty way to set the field based on
the filename, but I'm running into problems that I don't understand
well enough to fix quickly.

I have only very limited experience of Java programming, so I might be using
the wrong terms, but I think the problem is variable scope. I get a
compilation error:

HTMLDocument.java:86: cannot find symbol
symbol  : variable url
location: class org.apache.lucene.demo.HTMLDocument
        if (url.indexOf("-ov-") != -1) {


I thought I'd be able to use a simple mechanism based on indexOf() to check
for a short sequence of characters within the filename, for
example "-sys-". I know that this sequence, if it exists anywhere in the
full path, must be in the filename.

So I put in a series of if statements like this:

	if (url.indexOf("-sys-") != -1) {
		string category = "system";
	}

then right at the end:
doc.add(new Field("category", category, Field.Store.YES,
Field.Index.TOKENIZED));
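
For reference, a minimal sketch of the indexOf() approach described above, written as the getCategoryFromFileName method that Grant stubbed. The class name CategoryGuesser, the "unknown" fallback, and the "overview" meaning of "-ov-" are assumptions made for illustration; only "-sys-" mapping to "system" comes from this message. The variable category is declared once, before the if tests, so it is still in scope when it is returned. A variable declared inside an if block would not be visible after the closing brace.

```java
/** Hypothetical helper class; the name is illustrative, not part of the Lucene demo. */
final class CategoryGuesser {

    /**
     * Returns a category name based on a marker substring in the file name.
     * The "unknown" default is an assumption for names with no marker.
     */
    static String getCategoryFromFileName(String fileName) {
        // Declared outside the if blocks so it remains in scope below.
        String category = "unknown";
        if (fileName.indexOf("-sys-") != -1) {
            category = "system";
        } else if (fileName.indexOf("-ov-") != -1) {
            category = "overview"; // assumed meaning of "-ov-"
        }
        return category;
    }
}
```

The returned String could then be passed as the second argument of the doc.add(new Field("category", ...)) call shown above.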

Am I right in thinking that the variable url is undefined at this point in
the code? It certainly seems to be defined earlier on in the file:

  public static String uid2url(String uid) {
    String url = uid.replace('\u0000', '/');	  // replace nulls with slashes
    return url.substring(0, url.lastIndexOf('/')); // remove date from end
  }

Is there some way for me to perhaps chop down to the filename here, and make
that available later in the code?
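
On the scope question: url in uid2url is a local variable, so it exists only inside that method and cannot be seen from line 86. The sketch below shows one possible way to chop a path down to its filename with plain String methods. The class name FileNameUtil and the method name fileNameOf are hypothetical, and the sketch assumes the full path is available as a String at the point where the Document is built.

```java
/** Hypothetical utility; not part of the Lucene demo. */
final class FileNameUtil {

    /**
     * Returns the part of the path after the last '/' or '\' separator.
     * If the path contains no separator, the whole string is returned.
     */
    static String fileNameOf(String path) {
        // lastIndexOf returns -1 when the character is absent,
        // so cut + 1 is 0 and substring keeps the full string.
        int cut = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(cut + 1);
    }
}
```

If the method that builds the Document receives the file as a java.io.File (called f here, which is an assumption), f.getName() should give the same result without any string handling.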

K.

