lucene-java-user mailing list archives

From Ryan D <ryandet...@gmail.com>
Subject Re: Using lucene to search a bunch of keywords?
Date Wed, 23 Jul 2008 20:05:25 GMT
Heh, actually I'm using Perl, but I've always associated text search
with Lucene; I'm not sure whether it's the best solution. On the
small side there are 1.6 million keywords; on the large side there are
well over 100 million, but I might find a way to break the searches
down into smaller ones (send A-G to server1, H-R to server2, etc.).

Is there another search tool that might be better suited for this?
The only thing I can relate it to is how AdWords works: a user enters
a query in the Google search box, and Google searches its database of
purchased keywords to find the appropriate ads. What I'm doing is
similar but without the payday. :-{

Currently I'm using a (huge) hash table and regular expressions
($query =~ /$keyword/), going down the list from the largest keyword
to the smallest, but I know this is not a long-term solution,
especially if I have to load the large 100 million+ list.
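
For comparison, here is a minimal sketch of one alternative to testing
every keyword against the query: break the query into word n-grams and
look each one up in a hash set, so the work per query depends on the
query length rather than on the number of keywords. This is not the
regex scan described above, and it is written in Java only because
that is this list's language. The class name KeywordMatcher, its
methods, and the crude trailing-"s" stripping (a stand-in for a real
stemmer, so that "balls" can match "ball") are all assumptions made
for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: matches stored keyword phrases inside a query string.
public class KeywordMatcher {
    private final Set<String> keywords = new HashSet<>();
    private final int maxWords; // word count of the longest stored keyword

    public KeywordMatcher(List<String> rawKeywords) {
        int longest = 1;
        for (String k : rawKeywords) {
            String[] words = tokenize(k);
            if (words.length == 0) continue;
            keywords.add(String.join(" ", words));
            longest = Math.max(longest, words.length);
        }
        maxWords = longest;
    }

    // Lowercase, split on non-alphanumerics, and drop a trailing "s" from
    // words longer than three characters. ASSUMPTION: this naive plural
    // stripping only illustrates normalization; it is not a real stemmer.
    static String[] tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String p : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (p.isEmpty()) continue;
            boolean plural = p.length() > 3 && p.endsWith("s");
            out.add(plural ? p.substring(0, p.length() - 1) : p);
        }
        return out.toArray(new String[0]);
    }

    // Returns every stored keyword (in normalized form) found in the query,
    // trying the longest n-gram first at each starting word.
    public List<String> match(String query) {
        String[] w = tokenize(query);
        List<String> hits = new ArrayList<>();
        for (int start = 0; start < w.length; start++) {
            int maxEnd = Math.min(w.length, start + maxWords);
            for (int end = maxEnd; end > start; end--) {
                String candidate =
                    String.join(" ", Arrays.copyOfRange(w, start, end));
                if (keywords.contains(candidate)) hits.add(candidate);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordMatcher m = new KeywordMatcher(
            Arrays.asList("big boy", "red ball", "computer"));
        System.out.println(m.match("the girl likes red balls")); // [red ball]
    }
}
```

Each query costs at most (words in query) x (words in longest keyword)
set lookups, however many keywords are loaded. The set itself still
has to fit in memory, so the 100 million+ list would probably still
need to be split across machines or kept in an on-disk store.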

Thanks.


On Jul 23, 2008, at 3:54 PM, Steven A Rowe wrote:

> Hi Ryan,
>
> I'm not sure Lucene's the right tool for this job.
>
> I have used regular expressions and ternary search trees in the past  
> to do similar things.
>
> Is the set of keywords too large for an in-memory solution like  
> these?  If not, consider using a tool like the Perl package  
> Regex::PreSuf <http://search.cpan.org/dist/Regex-PreSuf/> - it can  
> convert a list of strings into a compact set of alternations, which  
> you can then import into a Java program.  (I'm not aware of any  
> similar Java tools.)
>
> Steve
>
> On 07/23/2008 at 3:30 PM, Ryan Detzel wrote:
>> Everything I've read and seen about Lucene is about searching for
>> keywords in documents; I want to do the reverse. I have a huge list
>> of keywords ("big boy", "red ball", "computer") and I have phrases
>> that I want to check for those keywords. For example, using the
>> small keyword list above (stored as documents in Lucene), what's the
>> best approach to pass in a query "the girl likes red balls" and have
>> it match the keyword "red ball"?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

