From: "Claudia Grieco"
Reply-To: java-user@lucene.apache.org
Subject: Re: Retrieve found keywords from document
Date: Thu, 25 Nov 2010 17:48:03 +0100
Message-ID: <000e01cb8cc0$869e1e70$93da5b50$@unisa.it>

Thanks a lot.
I used the Lucene analyzer to parse the profile and everything works :)

-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com]
Sent: Thursday, 25 November 2010 14:52
To: java-user@lucene.apache.org
Subject: Re: Retrieve found keywords from document

You could parse the output from the Lucene analyzer that you are using
to get hold of a list of terms and pick the ones that are hobbies. Or
do it outside Lucene using whatever string parsing technique you like.
Or take a look at the recent thread on this list on a similar topic:
"High frequency term for the searched query".


--
Ian.

On Thu, Nov 25, 2010 at 1:27 PM, Claudia Grieco wrote:
> What I call "profile" is free text (extracted from a PDF) and not the
> result of the user listing hobbies in a form.
> So to store hobbies in a field called "hobbies" I have to extract
> hobbies from the text first... Is it possible to do that using Lucene?
>
> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Thursday, 25 November 2010 13:01
> To: java-user@lucene.apache.org
> Subject: Re: Retrieve found keywords from document
>
> Can't you just store the hobbies as standard stored fields
> (Field.Store.YES), or as a single field, call doc.get("hobbies") and
> do what you want with them?
>
> This sounds rather like faceting - if so you might want to consider
> using Solr. http://wiki.apache.org/solr/SolrFacetingOverview
>
>
> --
> Ian.
>
> On Thu, Nov 25, 2010 at 11:48 AM, Claudia Grieco wrote:
>> Hi guys,
>>
>> I have this problem:
>>
>> I'm using Lucene to create a search engine on people profiles.
>>
>> I have a set of hobbies (let's say {"reading", "singing"} for example)
>> and I want to find people who have at least one of these hobbies AND
>> which of these hobbies they have.
>>
>> Currently I search for each one of these hobbies (e.g. one search for
>> reading, one search for singing) but since the list of hobbies is very
>> long (200+) I'd like to do the following:
>>
>> 1) Do ONE search that finds all the documents that have at least one
>>    hobby in the text (this is easily accomplished using BooleanQuery)
>>
>> 2) For each document, retrieve the keywords found.
>>
>> Do you have any ideas on how to do no. 2?
>>
>> Thank you
>>
>> Claudia
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
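
For anyone finding this thread later: Ian's second suggestion (doing it
outside Lucene with plain string parsing) could look roughly like the
sketch below. This is not the code used in the thread and it does not
call Lucene at all; the class name HobbyMatcher and the method
findHobbies are made up for illustration. It assumes every hobby is a
single word and that a simple lowercase, split-on-non-letters tokenizer
is good enough. A real Lucene analyzer may also apply stop words or
stemming, which this sketch does not imitate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: reports which known hobbies occur in free text.
final class HobbyMatcher {

    // Returns the hobbies that appear as whole words in profileText,
    // lowercased and in order of first appearance.
    // Assumption: each hobby is a single word; multi-word hobbies would need phrase matching.
    static Set<String> findHobbies(String profileText, Set<String> hobbies) {
        // Normalize the hobby list once so that lookups are case-insensitive.
        Set<String> normalized = new HashSet<>();
        for (String hobby : hobbies) {
            normalized.add(hobby.toLowerCase(Locale.ROOT));
        }

        Set<String> found = new LinkedHashSet<>();
        // Crude tokenizer: split on any run of characters that are not letters or digits.
        String[] tokens = profileText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            if (normalized.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> hobbies = new HashSet<>(Arrays.asList("reading", "singing", "chess"));
        String profile = "I enjoy Singing in a choir, reading novels, and more reading.";
        System.out.println(findHobbies(profile, hobbies)); // prints [singing, reading]
    }
}
```

Run once per document returned by the single BooleanQuery search, this
gives the "which hobbies does this person have" part (step 2) without
issuing 200+ separate queries. The hobby set is checked per token, so
the cost grows with the length of the profile text and not with the
number of hobbies.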