lucene-java-user mailing list archives

From "Karthik N S" <kart...@controlnet.co.in>
Subject RE REQUEST: SPECIFIC HIT
Date Mon, 06 Jun 2005 06:40:57 GMT
Hi Guys,

Apologies.

With reference to my last mail, dated Mon, 14 Mar 2005:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3COBELINLGKPEMCIEIJJNKEEIACCAA.karthik@controlnet.co.in%3E

I would like to request some help again with search concepts.

I have indexed the documents successfully, and they are as follows:

Document 1 contains   =   ELECTRONICS  DIGITAL CAMERA
Document 2 contains   =   ELECTRONICS  DIGITAL CAMERA BATTERY ACCESSORIES
Document 3 contains   =   ELECTRONICS  DIGITAL CAMERA OPTICS
Document 4 contains   =   ELECTRONICS  DIGITAL CAMERA ACCESSORIES
Document 5 contains   =   ELECTRONICS  DIGITAL CAMERA CABEL ACCESSORIES
Document 6 contains   =   ELECTRONICS  DIGITAL CAMERA OPTICS CABEL ACCESSORIES
Document 7 contains   =   ELECTRONICS  DIGITAL CAMERA APPERAL ACCESSORIES

On a search for "Digital Camera Optics", the hit has to return the 3rd document
ONLY, and none of the other documents. [The words DIGITAL CAMERA are common to
all documents and could appear in any order.]

To solve this problem I am creating a new field called 'IGNORE WORD', and this
field would be as shown below:

Document 1 contains   =   ELECTRONICS  DIGITAL CAMERA
'IGNORE WORD = BATTERY,ACCESSORIES,OPTICS,CABEL,APPERAL

Document 2 contains   =   ELECTRONICS  DIGITAL CAMERA BATTERY ACCESSORIES
'IGNORE WORD = OPTICS,CABEL,APPERAL

Document 3 contains   =   ELECTRONICS  DIGITAL CAMERA OPTICS
'IGNORE WORD = BATTERY,ACCESSORIES,CABEL,APPERAL

Document 4 contains   =   ELECTRONICS  DIGITAL CAMERA ACCESSORIES
'IGNORE WORD = BATTERY,OPTICS,CABEL,APPERAL

Document 5 contains   =   ELECTRONICS  DIGITAL CAMERA CABEL ACCESSORIES
'IGNORE WORD = BATTERY,OPTICS,APPERAL

Document 6 contains   =   ELECTRONICS  DIGITAL CAMERA OPTICS CABEL ACCESSORIES
'IGNORE WORD = BATTERY,APPERAL

Document 7 contains   =   ELECTRONICS  DIGITAL CAMERA APPERAL ACCESSORIES
'IGNORE WORD = BATTERY,OPTICS,CABEL
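
The 'IGNORE WORD' lists above appear to follow one rule: each document's list is the set of distinguishing words minus the words that the document itself contains. Here is a minimal sketch of that rule in plain Java, with no Lucene calls. The class name `IgnoreWordSketch`, the method `ignoreWords`, and the hard-coded vocabulary are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class IgnoreWordSketch {
    // Assumed vocabulary: the words that distinguish one document from another.
    static final List<String> VOCABULARY =
            Arrays.asList("BATTERY", "ACCESSORIES", "OPTICS", "CABEL", "APPERAL");

    // Returns the vocabulary words that do NOT occur in the document text.
    static Set<String> ignoreWords(String documentText) {
        Set<String> present = new HashSet<>(Arrays.asList(
                documentText.trim().toUpperCase(Locale.ROOT).split("\\s+")));
        Set<String> ignore = new LinkedHashSet<>(VOCABULARY); // keeps vocabulary order
        ignore.removeAll(present);
        return ignore;
    }

    public static void main(String[] args) {
        // Document 3 from the list above.
        System.out.println(ignoreWords("ELECTRONICS  DIGITAL CAMERA OPTICS"));
        // prints [BATTERY, ACCESSORIES, CABEL, APPERAL]
    }
}
```

Each call makes one pass over the vocabulary per document, so the total work grows with (number of documents) x (vocabulary size).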


For every search I feed the 'IGNORE WORD' list into the query, such as:


Search  = DIGITAL CAMERA OPTICS
Query   = +KEYSRC:Digital +KEYSRC:Camera +KEYSRC:Optics -KEYSRC:(BATTERY ACCESSORIES CABEL APPERAL)

The resulting hit would be the 3rd doc only, instead of the 3rd and 6th.
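
A query of the same shape as the one above can be assembled from the search terms and an 'IGNORE WORD' list. This is only a sketch in plain Java that builds the query string and makes no Lucene calls. The names `IgnoreQuerySketch` and `buildQuery` are hypothetical, and the code assumes the terms contain no query-syntax special characters that would need escaping.

```java
import java.util.Arrays;
import java.util.List;

class IgnoreQuerySketch {
    // Builds "+FIELD:t1 +FIELD:t2 ... -FIELD:(w1 w2 ...)".
    // Assumption: terms and ignore words need no escaping.
    static String buildQuery(String field, List<String> searchTerms, List<String> ignoreWords) {
        StringBuilder query = new StringBuilder();
        for (String term : searchTerms) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(field).append(':').append(term); // required term
        }
        if (!ignoreWords.isEmpty()) {
            if (query.length() > 0) {
                query.append(' ');
            }
            // Prohibited group: a document containing any of these words is rejected.
            query.append('-').append(field).append(":(")
                 .append(String.join(" ", ignoreWords)).append(')');
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("KEYSRC",
                Arrays.asList("Digital", "Camera", "Optics"),
                Arrays.asList("BATTERY", "ACCESSORIES", "CABEL", "APPERAL")));
    }
}
```

The resulting string is meant to be handed to whatever query parser the application already uses.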


The problem here has 2 conditions:

1) The search could be DIGITAL CAMERA OPTICS, or OPTICS CAMERAS DIGITAL, or
CAMERA OPTICS, and each should retrieve the same hit results.

2) The process of creating the 'IGNORE WORD' list is very time
consuming [the documents are very large in number],
    and the permutations/combinations for the same are also very expensive.
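
One way to picture both conditions together, offered as an untested idea rather than a recommendation: if the exclusion list is derived from the search itself (vocabulary minus whatever was typed) instead of being stored per document, word order stops mattering and no per-document list has to be built. The sketch below is plain Java with no Lucene calls. `QueryTimeIgnoreSketch`, `buildQuery`, and the hard-coded vocabulary are hypothetical, and the code does not handle plurals such as CAMERAS or misspellings.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class QueryTimeIgnoreSketch {
    // Assumed vocabulary of distinguishing words, maintained once for the whole index.
    static final List<String> VOCABULARY =
            Arrays.asList("BATTERY", "ACCESSORIES", "OPTICS", "CABEL", "APPERAL");

    // Assumes a non-empty search string without query-syntax special characters.
    static String buildQuery(String field, String search) {
        // TreeSet sorts and de-duplicates, so the order of the typed words is irrelevant.
        Set<String> typed = new TreeSet<>(Arrays.asList(
                search.trim().toUpperCase(Locale.ROOT).split("\\s+")));
        Set<String> excluded = new LinkedHashSet<>(VOCABULARY);
        excluded.removeAll(typed); // exclude every vocabulary word the user did not type

        StringBuilder query = new StringBuilder();
        for (String term : typed) {
            query.append('+').append(field).append(':').append(term).append(' ');
        }
        if (!excluded.isEmpty()) {
            query.append('-').append(field).append(":(")
                 .append(String.join(" ", excluded)).append(')');
        }
        return query.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("KEYSRC", "DIGITAL CAMERA OPTICS"));
        System.out.println(buildQuery("KEYSRC", "OPTICS CAMERA DIGITAL")); // same string
    }
}
```

Against the seven sample documents, "CAMERA OPTICS" produces a shorter required part but the same exclusion group, so it would still leave only the 3rd document. Whether upper-case terms match depends on the analyzer used at index time.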

Does anybody here have an idea on how to proceed?


Thx in advance
Karthik

