lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Searching/sorting strategy for many properties for semantic web app
Date Wed, 22 Feb 2006 21:10:43 GMT
One very nice implementation to take a look at is the Simile project  
at MIT.   The Piggy Bank and Longwell projects use Lucene to index  
RDF and integrate full-text and structural queries nicely together.


On Feb 21, 2006, at 10:20 PM, David Pratt wrote:

> Hi there. I am new to Lucene and I have been developing a semantic  
> application for a while and it appears to me Lucene could help me  
> to get a much needed search with reasonable speed. I have some  
> general question to start:
> 1) Since my app is virtually all metadata, what should I store in  
> the indexes if anything?
> 2) Should I only index the most common properties that people will  
> search and combine the rest (and index this combined text as a field)?
> 3) I would like to sort and filter results but am concerned this  
> could be very memory intensive
> 4) Some general guidance on organizing indexes in an app would be  
> appreciated.
> My schema is fairly large but I generally expect people to search  
> on about 6 to 8 properties for the most part. I have the data  
> stored in an sql database but not in a conventional way. I am  
> willing to accept a slower advanced search on less common  
> properties (accomodating this with sql search) but I really want  
> some speed for the main properties with full text search.
> Pretty much everything in the app is metadata so I am most  
> interested in  focussing on the 6-8 properties that people will use  
> to search on for the most part. I am thinking of combining the text  
> of the remaining properties (quite a number) into a single  
> description type field so that essentially all information gets  
> indexed and ranked. Is this a reasonable approach?
> I see that there are advanced possibilities with the indexes to  
> sort and filter. How advisable is using sort for large record sets.  
> For example, say you have got 20000 records returned from your  
> search. Because this will have a web interface I will only be  
> showing first 20 likely so I will be batching results. Is the  
> sorting filtering highly memory intensive?
> Hopefully, someone can provide some initial advice. Many thanks.
> Regards,
> David
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message