lucene-java-user mailing list archives

From Phillip Rhodes <spamsu...@rhoderunner.com>
Subject Re: indexing and searching the document title question
Date Tue, 27 Feb 2007 22:07:44 GMT
I am confused.  I am following the FAQ, which says that indexing and searching the title of
a document will cause it to be ranked higher.
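
For context, the FAQ claim rests on length normalization: classic Lucene's DefaultSimilarity scales a field match by 1/sqrt(number of terms in the field), so a match in a short field such as NAME is weighted more than the same match in a long CONTENTS field. A minimal sketch of that arithmetic, not the Lucene code itself; the class name, method name, and the 400-token CONTENTS length are assumptions for illustration:

```java
// Sketch only: mirrors the 1/sqrt(numTerms) length normalization of classic
// Lucene's DefaultSimilarity. LengthNormSketch and lengthNorm are hypothetical names.
final class LengthNormSketch {

    /** Length normalization factor for a field that holds numTerms tokens. */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // Assumes "Color Me Mine" analyzes to 3 tokens; the 400-token
        // CONTENTS field is an invented example, not taken from the index.
        System.out.println("NAME norm:     " + lengthNorm(3));
        System.out.println("CONTENTS norm: " + lengthNorm(400));
    }
}
```

This factor is only one part of the score, so a favorable norm on NAME does not by itself guarantee the top position when many CONTENTS clauses also match.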

When I search on the title of my document (NAME in my case), the document is returned,
but it does not get ranked higher; in fact, it gets buried in the results.

I am using the StandardAnalyzer for both indexing and searching.
I added the NAME to my document as a tokenized field.
		document.add(new Field("NAME", "Color Me Mine", Field.Store.YES,
				Field.Index.TOKENIZED));
My QueryParser uses the StandardAnalyzer, and it builds the query on the correct field.
Running this query returns 127 results, with the matching document name at #40 in the
list:
NAME:"color me mine" (CONTENTS:color CONTENTS:me CONTENTS:mine)


If I run this query (NAME:"color me mine"), I will get my one match, but nothing else since
I am not searching the contents of the document, so I know the "name" query is returning a
record.


Can anyone think of anything else I can do to boost the results of a match on the NAME field?
I tried setting a boost on the "name" query, but it didn't work.  The documents were not returned
in any different order.
The query toString method returned:
NAME:"color me mine"^2.0 (CONTENTS:color CONTENTS:me CONTENTS:mine)

Thanks.

I am installing LUKE now...

The Name is definitely on my document:
Document<
  stored/uncompressed,indexed<OBJECT_ID:238173>
  stored/uncompressed<SNIPPET:Color Me Mine is a friendly place where you can create your own artwork on the pottery piece of your ...>
  stored/uncompressed,indexed<ATTRACTION_ID:238173>
  stored/uncompressed,indexed,tokenized<NAME:Color Me Mine>
  stored/uncompressed,indexed<OBJECT_TYPE:com.reffects.dmi.dbom.Attraction>
  stored/uncompressed,indexed,tokenized<CATEGORY_NAME:Arts & Entertainment>
  stored/uncompressed,indexed<CATEGORY_ID:29>
  stored/uncompressed,indexed<OBJECT_TYPE:ACTIVITY>
  stored/uncompressed,indexed<ATTRACTION_TYPE:A>
  stored/uncompressed,indexed<BLUE_RIBBON:false>
  stored/uncompressed,indexed<CABIN_FEVER:false>
  stored/uncompressed,indexed<APPALACIAN:false>
  stored/uncompressed,indexed<WILD:false>
  stored/uncompressed,indexed,tokenized<REGION_NAME:Pittsburgh and Its Countryside>
  stored/uncompressed,indexed<REGION_ID:4>
  stored/uncompressed,indexed,tokenized<CITY:PITTSBURGH>
  stored/uncompressed,indexed<ZIP_CODE:15217>
  stored/uncompressed,indexed<LATITUDE:1040.438167>
  stored/uncompressed,indexed<LONGITUDE:920.078858>
  stored/uncompressed,indexed<HANDICAP_ACCESS:N>
  stored/uncompressed,indexed<SITE_ID:16651>
  stored/uncompressed,indexed<SITE_ID:16501>
>



----- Original Message -----
From: "Erick Erickson" <erickerickson@gmail.com>
To: java-user@lucene.apache.org
Sent: Tuesday, February 27, 2007 1:13:45 PM (GMT-0500) America/New_York
Subject: Re: indexing and searching the document title question

You've probably got it right. But I'd add a couple of things....

1> by using the correct analyzer at index and query time, the
casing will be taken care of for you.

2> you don't want UN_TOKENIZED for fields you search on
in general because there's no parsing. So if you indexed
"This is a String", searching on "This" or "this" wouldn't match.

3> In your code fragment, you didn't show what Analyzer you
use. This is way more important than you think.

4> get a copy of Luke (google lucene luke). It'll let you examine
your index and save you a world of hurt. There have been some
very nice improvements lately along with 2.1 compatibility.

5> If you want searches and indexing to use different analyzers
on different fields, see PerFieldAnalyzerWrapper.

6> You'll probably find yourself storing the same data multiple
times, once for searching and once for displaying. So you'll search
on the lowercased, indexed field and display the UN_TOKENIZED
version since it'll retain the capitalization.

7> I think your underlying problem is that the syntax of the search
isn't correct. You're really searching on
NAME:color
defaultfield:me
defaultfield:mine

You want something like +NAME:color +NAME:me +NAME:mine
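
As a rough illustration of point 7, the sketch below turns free-form user input into a query string in which every word is a required clause on one field. It uses plain Java string handling and no Lucene classes; the class and method names are hypothetical, and it assumes the input contains no query-parser special characters (it does no escaping).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: RequiredTermsQuerySketch and requiredTerms are hypothetical
// names, not part of Lucene. The result is meant to be handed to a QueryParser.
final class RequiredTermsQuerySketch {

    /** Builds "+field:word1 +field:word2 ..." from whitespace-separated input. */
    static String requiredTerms(String field, String userInput) {
        List<String> clauses = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Casing is left alone here; the analyzer used by the
                // query parser is expected to normalize it.
                clauses.add("+" + field + ":" + word);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(requiredTerms("NAME", "color me mine"));
    }
}
```

Prefixing each word with the field name keeps the later words from falling through to the default field, which is what happens with NAME:color me mine.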

Best
Erick

On 2/27/07, Phillip Rhodes <spamsucks@rhoderunner.com> wrote:
>
> Hi,
> According to the FAQ, indexing the title of the document and performing
> a search against the shorter field will automatically give it a higher
> weight than matches against the document content.  That is what I am trying
> to accomplish with a "NAME" field.  If someone enters a close match of the
> name of a document (example Names: "Color Me Mine" ,"Pittsburgh and Its
> Countryside"), I want that document to get a hit.  The search is user
> entered, so I want it to be case-insensitive.  I also don't want it to have
> to be an exact match.  Search terms such as "Pittsburgh Countryside" should
> match up against a name of "Pittsburgh and Its Countryside".
>
>
> Here I am adding the name field to my document:
> String value= "Color Me Mine";
> document.add(new Field("NAME", value, Field.Store.YES,
>                                 Field.Index.TOKENIZED));
>
> Performing a search:
> NAME:color me mine ->returns no results
> NAME:color -> returns the document
>
> I tried indexing the document without the value tokenized:
> document.add(new Field("NAME", value, Field.Store.YES,
>                                 Field.Index.UN_TOKENIZED));
>
> This caused the search to be case sensitive.
>
> I am about to modify my indexing/searching code to use a secondary field,
> "name_lowercase", this field would of course contain the name of the object
> in lowercase and I would lowercase my search terms when I construct my
> TermQuery for this field.
>
> Is this a valid approach, or am I missing something?
>
> Thanks!
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



