lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From emmettwalsh <emmettwa...@gmail.com>
Subject Indexing and searching help
Date Tue, 03 Jul 2007 21:03:37 GMT

Hi,

lucene documentation seems to be very confusing... here is my predicament

I have an object like the following:

public class PropertyImpl implements Property {

	String id;

	List<String> names = new ArrayList<String>();
	
	String address = "";

	String city = "";

	String street = "";

	String state = "";

	String cachedMatchingString;

...

..
public String toString() {
		// save some time
		if (cachedMatchingString == null) {

			StringBuffer buf = new StringBuffer();

			for (String name : names) {
				//buf.append(name + ",");
				buf.append(name + " ");
			}

			if (address != null) {
				//buf.append(address + ",");
				buf.append(address + " ");
			}
			if (city != null) {
				//buf.append(city + ",");
				buf.append(city + " ");
			}
			if (street != null) {
				//buf.append(street + ",");
				buf.append(street + " ");
			}
			if (state != null) {
				//buf.append(state);
				buf.append(state);
			}

			// get rid of any spaces
			//cachedMatchingString = buf.toString().replace(" ", "");
			cachedMatchingString = buf.toString();		
		}
		return cachedMatchingString;
	}


A typical instance of this object would have the following data

id=123134
names= {'Emmett Walsh', 'John Walsh'}
address="Milltown."
city="Dublin"
street="Bankside Cottages"
state="Ireland"
cachedMatchingString="Emmett Walsh John Walsh Milltown Dublin Bankside
Cottages Ireland"



I have about 8000 of these objects which I store in a hashmap for direct
access based on there id. It is the id(s) that I need lucene to return to me
so that I can retireve the desired object(s) from the hashmap.

The cachedMatchingString string is what I use when indexing the information
like following:

e.g. so for each of the 8000 objects i do the following to build the index

idxWriter.addDocument(createDocument(prop));


private static Document createDocument(Property prop) {
		Document doc = new Document();

		Field field = new Field("id", prop.getId(), Store.YES, Index.NO);
		doc.add(field);

		Field field2 = new Field("content", prop.toString(), Store.NO,
				Index.TOKENIZED);  //will tokenise our long string
		doc.add(field2);

		return doc;
	}



Searching
I have a text field in my app that typically would take in a strings like

"Main s" , which
would end up in a query like "Main AND s*" like follows

"B", which would end up in a query like "b*"

                        queryString = queryString.trim();  //the string
taken in from textbox
			
			
			String[] strings = queryString.split(" ");
			int numStrings = strings.length;
			
			StringBuffer luceneQuery = new StringBuffer();
			
			for(int i=0;i<numStrings;i++){
				String aString = strings[i];
				if(aString.equals("")){
					continue;
				}
				luceneQuery.append(aString);
				if((i+1) != numStrings){
					luceneQuery.append(" AND ");
				}else{
					luceneQuery.append("*");
				}
			}
			
			//BooleanQuery.setMaxClauseCount(2*1024);
			
			
			QueryParser parser = new QueryParser("content",
					new StandardAnalyzer());
			
			
			org.apache.lucene.search.Query query =
parser.parse(luceneQuery.toString());

			// Search for the query
			Hits hits = searcher.search(query);


but i keep getting Toomanyclausesexceptions

Any suggestions on how to index or search better ?

-- 
View this message in context: http://www.nabble.com/Indexing-and-searching-help-tf4020937.html#a11420608
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message