From: "Karthik N S"
To: "LUCENE"
Reply-To: java-user@lucene.apache.org
Subject: RE REQUEST: SPECIFIC HIT
Date: Mon, 6 Jun 2005 12:10:57 +0530

Hi guys,

Apologies. With reference to my last mail, dated Mon, 14 Mar 2005:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200503.mbox/%3COBELINLGKPEMCIEIJJNKEEIACCAA.karthik@controlnet.co.in%3E

I would like to request some help again with search concepts.

I have indexed the following documents successfully:

Document 1 contains = ELECTRONICS DIGITAL CAMERA
Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
Document 3 contains = ELECTRONICS DIGITAL CAMERA OPTICS
Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES
Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL ACCESSORIES
Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES

On a search for "Digital Camera Optics", the hit list has to return the 3rd document ONLY and none of the other documents. [The words DIGITAL CAMERA are common to all documents, and the words could appear in any order.]
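
To make the requirement concrete, here is a minimal Java sketch of what a plain "all terms required" search does with the seven documents listed above. It does not use Lucene. The class name `AndSearchSketch`, the in-memory map, and the whitespace tokenizer are assumptions made purely for illustration, and Document 3 is assumed to read OPTICS with a letter O.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the index: document number -> indexed text.
class AndSearchSketch {
    static final Map<Integer, String> DOCS = new LinkedHashMap<>();
    static {
        DOCS.put(1, "ELECTRONICS DIGITAL CAMERA");
        DOCS.put(2, "ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES");
        DOCS.put(3, "ELECTRONICS DIGITAL CAMERA OPTICS");
        DOCS.put(4, "ELECTRONICS DIGITAL CAMERA ACCESSORIES");
        DOCS.put(5, "ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES");
        DOCS.put(6, "ELECTRONICS DIGITAL CAMERA OPTICS CABEL ACCESSORIES");
        DOCS.put(7, "ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES");
    }

    // Lower-cases and splits on whitespace, so word order and case do not matter.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(
                text.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Returns every document that contains all of the query terms (plain AND).
    static List<Integer> search(String query) {
        Set<String> wanted = terms(query);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> e : DOCS.entrySet()) {
            if (terms(e.getValue()).containsAll(wanted)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("Digital Camera Optics")); // [3, 6]
        System.out.println(search("optics camera digital")); // [3, 6]
    }
}
```

With these documents, requiring all three terms yields documents 3 and 6 regardless of word order or case, which is why some extra restriction is needed to isolate document 3.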

To solve this problem I am creating a new field called 'IGNORE WORD', populated as shown below:

Document 1 contains = ELECTRONICS DIGITAL CAMERA
    'IGNORE WORD' = BATTERY,ACCESSORIES,OPTICS,CABEL,APPERAL

Document 2 contains = ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES
    'IGNORE WORD' = OPTICS,CABEL,APPERAL

Document 3 contains = ELECTRONICS DIGITAL CAMERA OPTICS
    'IGNORE WORD' = BATTERY,ACCESSORIES,CABEL,APPERAL

Document 4 contains = ELECTRONICS DIGITAL CAMERA ACCESSORIES
    'IGNORE WORD' = BATTERY,OPTICS,CABEL,APPERAL

Document 5 contains = ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES
    'IGNORE WORD' = BATTERY,OPTICS,APPERAL

Document 6 contains = ELECTRONICS DIGITAL CAMERA OPTICS CABEL ACCESSORIES
    'IGNORE WORD' = BATTERY,APPERAL

Document 7 contains = ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES
    'IGNORE WORD' = BATTERY,OPTICS,CABEL

For every search I feed the 'IGNORE WORD' list to the query, such as:

Search = DIGITAL CAMERA OPTICS
Query  = +KEYSRC:Digital +KEYSRC:Camera +KEYSRC:Optics -KEYSRC:(BATTERY ACCESSORIES CABEL APPERAL)

The resultant hit would be the 3rd doc only, instead of the 3rd and 6th.

There are 2 problems here:

1) A search for DIGITAL CAMERA OPTICS, OPTICS CAMERAS DIGITAL, or CAMERA OPTICS should retrieve the same hit results.

2) Creating the 'IGNORE WORD' list is very time consuming [the documents are very large in number], and working out the permutations/combinations for it is very expensive.

Does anybody here have an idea of how to handle this?

Thanks in advance,
Karthik
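
The query construction described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, no Lucene API is called, terms are upper-cased, Document 3 is assumed to read OPTICS with a letter O, and the ignore list is derived at search time as "distinguishing words not in the search" rather than stored per document. Only the field name KEYSRC comes from the mail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper that builds the "+required -ignored" query string.
class IgnoreWordQuerySketch {
    static final String FIELD = "KEYSRC"; // field name taken from the mail

    static final List<String> DOCS = Arrays.asList(
            "ELECTRONICS DIGITAL CAMERA",
            "ELECTRONICS DIGITAL CAMERA BATTERY ACCESSORIES",
            "ELECTRONICS DIGITAL CAMERA OPTICS",
            "ELECTRONICS DIGITAL CAMERA ACCESSORIES",
            "ELECTRONICS DIGITAL CAMERA CABEL ACCESSORIES",
            "ELECTRONICS DIGITAL CAMERA OPTICS CABEL ACCESSORIES",
            "ELECTRONICS DIGITAL CAMERA APPERAL ACCESSORIES");

    // Upper-cases and splits on whitespace; keeps first-seen order, drops repeats.
    static Set<String> terms(String text) {
        return new LinkedHashSet<>(Arrays.asList(
                text.trim().toUpperCase(Locale.ROOT).split("\\s+")));
    }

    // All words seen in any document, minus the words present in every document.
    static Set<String> distinguishingWords(List<String> docs) {
        Set<String> all = new TreeSet<>();
        Set<String> common = null;
        for (String doc : docs) {
            Set<String> t = terms(doc);
            all.addAll(t);
            if (common == null) {
                common = new HashSet<>(t);
            } else {
                common.retainAll(t);
            }
        }
        if (common != null) {
            all.removeAll(common);
        }
        return all;
    }

    // One required clause per search term, then one prohibited group holding
    // the distinguishing words that the search did not mention.
    static String buildQuery(String search, Set<String> distinguishing) {
        Set<String> wanted = terms(search);
        Set<String> ignore = new TreeSet<>(distinguishing);
        ignore.removeAll(wanted);
        StringBuilder q = new StringBuilder();
        for (String t : wanted) {
            q.append('+').append(FIELD).append(':').append(t).append(' ');
        }
        if (!ignore.isEmpty()) {
            q.append('-').append(FIELD).append(":(")
             .append(String.join(" ", ignore)).append(')');
        }
        return q.toString().trim();
    }

    public static void main(String[] args) {
        Set<String> words = distinguishingWords(DOCS);
        // +KEYSRC:DIGITAL +KEYSRC:CAMERA +KEYSRC:OPTICS -KEYSRC:(ACCESSORIES APPERAL BATTERY CABEL)
        System.out.println(buildQuery("Digital Camera Optics", words));
    }
}
```

Because the ignore list here is computed from the search terms, the word order of the search does not change which words are excluded. Whether the case of the terms matters depends on the analyzer used at index time, which the mail does not state.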
