lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Pratt <>
Subject Re: Searching/sorting strategy for many properties for semantic web app
Date Thu, 23 Feb 2006 02:01:38 GMT
Hi Erik. Many thanks for your reply. I'll likely see if I can find a 
list to pose a couple of questions there way. I am having fun with 
Lucene since it is new to me and I am impressed with the speed I am 
getting. I am reading anything I can get hold of and trying different 
code experiments. So far, the code is fairly straight forward so not so 
concerned about this at the moment.

I am really hoping to hear from experienced people like yourself more on 
strategically what to index, what sort of things it would be a good idea 
to store and what to do about a fairly large schema that has much 
metadata to offer. Also perhaps when sorting and filtering gets too 
expensive. I realize that just because the metadata is available doesn't 
necessarily mean you want to even put it all in an index. I think these 
issues are pretty general, however I know there are folks on this that 
would likely advise some particular path or direction because of their 
own experiences with Lucene. I would really like to hear from anyone 
that has been working with metadata particularly or anyone generally 
about these topics.


Erik Hatcher wrote:
> One very nice implementation to take a look at is the Simile project  at 
> MIT.   The Piggy Bank and Longwell projects use Lucene to index  RDF and 
> integrate full-text and structural queries nicely together.    
>     Erik
> On Feb 21, 2006, at 10:20 PM, David Pratt wrote:
>> Hi there. I am new to Lucene and I have been developing a semantic  
>> application for a while and it appears to me Lucene could help me  to 
>> get a much needed search with reasonable speed. I have some  general 
>> question to start:
>> 1) Since my app is virtually all metadata, what should I store in  the 
>> indexes if anything?
>> 2) Should I only index the most common properties that people will  
>> search and combine the rest (and index this combined text as a field)?
>> 3) I would like to sort and filter results but am concerned this  
>> could be very memory intensive
>> 4) Some general guidance on organizing indexes in an app would be  
>> appreciated.
>> My schema is fairly large but I generally expect people to search  on 
>> about 6 to 8 properties for the most part. I have the data  stored in 
>> an sql database but not in a conventional way. I am  willing to accept 
>> a slower advanced search on less common  properties (accomodating this 
>> with sql search) but I really want  some speed for the main properties 
>> with full text search.
>> Pretty much everything in the app is metadata so I am most  interested 
>> in  focussing on the 6-8 properties that people will use  to search on 
>> for the most part. I am thinking of combining the text  of the 
>> remaining properties (quite a number) into a single  description type 
>> field so that essentially all information gets  indexed and ranked. Is 
>> this a reasonable approach?
>> I see that there are advanced possibilities with the indexes to  sort 
>> and filter. How advisable is using sort for large record sets.  For 
>> example, say you have got 20000 records returned from your  search. 
>> Because this will have a web interface I will only be  showing first 
>> 20 likely so I will be batching results. Is the  sorting filtering 
>> highly memory intensive?
>> Hopefully, someone can provide some initial advice. Many thanks.
>> Regards,
>> David
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message