lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: any good idea for loading fields into memory?
Date Thu, 21 Jun 2012 07:47:38 GMT
I can't use 4.0 because it has not been released; our company requires us
to use a stable version.

So I decided to wrap an IndexSearcher together with the field values held
in memory, as in the class below. I also copied all the code of
org.apache.lucene.search.SearcherManager and replaced IndexSearcher with
my IndexSearcherWithFields.

Any suggestions for this solution?

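import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger; // assuming log4j, which matches Logger.getLogger(Class)
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

/** Wraps an IndexSearcher together with selected stored field values held in memory. */
public class IndexSearcherWithFields {
	protected static Logger logger = Logger.getLogger(IndexSearcherWithFields.class);
	private Collection<String> inMemoryFields;
	private Collection<String> inMemoryMultiValueFields;
	// field name -> array indexed by docID; each entry is a String
	// (single-valued field) or a String[] (multi-valued field)
	private Map<String,Object[]> fieldsValues=new HashMap<String,Object[]>();
	private IndexSearcher searcher;

	public IndexReader getIndexReader(){
		return searcher.getIndexReader();
	}

	public IndexSearcherWithFields(IndexSearcher searcher,
			Collection<String> inMemoryFields,
			Collection<String> inMemoryMultiValueFields) throws IOException{
		this.searcher=searcher;
		this.inMemoryFields=inMemoryFields;
		this.inMemoryMultiValueFields=inMemoryMultiValueFields;
		this.warmup();
	}

	public final IndexSearcher getSearcher(){
		return searcher;
	}

	/** Returns the values of field fn indexed by docID, or null if fn was not loaded. */
	public Object[] getField(String fn){
		return fieldsValues.get(fn);
	}

	/** Reads the stored values of the configured fields for every live document. */
	private void warmup() throws IOException{
		long start=System.currentTimeMillis();
		IndexReader reader=searcher.getIndexReader();
		int docSize=reader.maxDoc();
		for(String f:inMemoryFields){
			Object[] arr=new Object[docSize];
			fieldsValues.put(f, arr);
		}
		for(String f:inMemoryMultiValueFields){
			Object[] arr=new Object[docSize];
			fieldsValues.put(f, arr);
		}

		for(int i=0;i<docSize;i++){
			// skip deleted documents; their stored fields should not be read
			if(reader.isDeleted(i)){
				continue;
			}
			Document doc=reader.document(i);
			for(String f:inMemoryFields){
				Object[] arr=fieldsValues.get(f);
				arr[i]=doc.get(f);
			}

			for(String f:inMemoryMultiValueFields){
				Object[] arr=fieldsValues.get(f);
				arr[i]=doc.getValues(f);
			}
		}
		logger.debug("warm up fields time: "+(System.currentTimeMillis()-start)+" ms.");
	}
}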
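
The class above reads every document again each time a new searcher is
wrapped. The quoted replies below point out that FieldCache works per
segment, so only new segments are read on reopen, and that global document
ids then need care. A minimal sketch of that idea in plain Java, without
any Lucene dependency, might look like the following. SegmentSource,
ArraySegment and FieldSnapshot are hypothetical names invented for this
illustration, not Lucene classes, and the sketch assumes each segment
exposes a key that stays stable across reopens.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one index segment; not a Lucene class. */
interface SegmentSource {
    Object key();                  // identity that stays stable across reopens
    int maxDoc();                  // number of documents in this segment
    String value(int localDocId);  // field value for a segment-local doc ID
}

/** Trivial in-memory segment, used only to exercise the sketch. */
final class ArraySegment implements SegmentSource {
    private final Object key;
    private final String[] docs;
    ArraySegment(Object key, String... docs) { this.key = key; this.docs = docs; }
    public Object key() { return key; }
    public int maxDoc() { return docs.length; }
    public String value(int localDocId) { return docs[localDocId]; }
}

/** Snapshot of one field's values, stored per segment and never modified after construction. */
final class FieldSnapshot {
    private final int[] docBases;     // first global doc ID of each segment
    private final String[][] values;  // values[segment][localDocId]
    private final Map<Object, String[]> bySegmentKey = new HashMap<Object, String[]>();
    private int segmentsLoaded;       // segments actually read for this snapshot

    /** Builds a snapshot, reusing arrays from 'previous' for segments already seen. */
    FieldSnapshot(List<SegmentSource> segments, FieldSnapshot previous) {
        docBases = new int[segments.size()];
        values = new String[segments.size()][];
        int base = 0;
        for (int s = 0; s < segments.size(); s++) {
            SegmentSource seg = segments.get(s);
            String[] arr = previous == null ? null : previous.bySegmentKey.get(seg.key());
            if (arr == null) {
                // New segment: read its values once.
                arr = new String[seg.maxDoc()];
                for (int d = 0; d < arr.length; d++) {
                    arr[d] = seg.value(d);
                }
                segmentsLoaded++;
            }
            bySegmentKey.put(seg.key(), arr);
            values[s] = arr;
            docBases[s] = base;
            base += seg.maxDoc();
        }
    }

    int segmentsLoaded() { return segmentsLoaded; }

    /** Resolves a global doc ID to its segment, then to the segment-local slot. */
    String get(int globalDocId) {
        if (globalDocId < 0) {
            throw new IllegalArgumentException("negative doc ID: " + globalDocId);
        }
        int s = Arrays.binarySearch(docBases, globalDocId);
        if (s < 0) {
            s = -s - 2; // not an exact docBase: segment just before the insertion point
        }
        // Step past empty segments that share the same docBase.
        while (s + 1 < docBases.length && docBases[s + 1] == docBases[s]) {
            s++;
        }
        return values[s][globalDocId - docBases[s]];
    }
}
```

Because a snapshot is not modified after construction, it can be handed
out together with the searcher it was built for, and each reopen creates a
new snapshot that only reads the segments it has not seen before. Deleted
documents are not handled here; the sketch relies on them never being
returned as hits.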


On Wed, Jun 20, 2012 at 11:37 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Right, the field must have a single token for FieldCache.
>
> But if you are on 4.x you can use DocTermOrds
> (FieldCache.getDocTermOrds) which allows for multiple tokens per
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 20, 2012 at 9:47 AM, Li Li <fancyerii@gmail.com> wrote:
>> But as far as I can remember, in 2.9.x FieldCache can only be applied to
>> fields that are indexed but not analyzed.
>> On 2012-6-20 at 8:59 PM, "Danil ŢORIN" <torindan@gmail.com> wrote:
>>
>>> I think you are looking for FieldCache.
>>>
>>> I'm not sure of the current status in 4.x, but it worked in 2.9/3.x.
>>> Basically it's an array, so access is quite straightforward, and the
>>> best part is that IndexReader manages those for you, so on reopen only
>>> new segments are read.
>>>
>>> A small catch is that FieldCache arrays are per segment, so you need to
>>> be careful if you want to retrieve data using global document ids.
>>> However, if you are building the result set in your own Collector, using
>>> FieldCache is quite straightforward.
>>>
>>>
>>> On Wed, Jun 20, 2012 at 3:49 PM, Li Li <fancyerii@gmail.com> wrote:
>>> > Hi all,
>>> >    I need to return certain fields of all matched documents quickly.
>>> > I am currently using Document.get(field), but the performance is not
>>> > good enough. Originally I used a HashMap to store these fields. It is
>>> > much faster, but I have to maintain two storage systems. Now I am
>>> > restructuring this project and want to store everything in Lucene.
>>> >    When I use an IndexSearcher to perform a search, I should be able to
>>> > get the related fields by docID. This must be thread safe and, like the
>>> > IndexReader, it must be a snapshot of the index.
>>> >    Here are the solutions I have come up with:
>>> >    1. StringIndex
>>> >       I have considered StringIndex, but some fields need to be
>>> > tokenized. Maybe I can use two fields: one tokenized for searching,
>>> > the other indexed but not analyzed and used only for StringIndex.
>>> > If there is no better solution, I may have to use this one.
>>> >    2. Associating a Map with each IndexReader
>>> >       When the IndexReader is opened or reopened, I need to iterate
>>> > through every document of this reader and put everything into a map.
>>> > The problem is that this is slower, and I don't know whether it is
>>> > problematic with NRT.
>>> >
>>> >    Is there any better solution? Thanks.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >

