I can't use 4.0 because it has not been released yet, and our company requires a
stable version.
So I decided to wrap an IndexSearcher together with the fields' values held in
memory (see the class below).
I also copied all the code of org.apache.lucene.search.SearcherManager and
replaced IndexSearcher with my IndexSearcherWithFields.
Any suggestions on this solution?
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger; // assuming log4j 1.x (Logger.getLogger(Class), debug())
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class IndexSearcherWithFields {
    protected static Logger logger =
            Logger.getLogger(IndexSearcherWithFields.class);

    private Collection<String> inMemoryFields;
    private Collection<String> inMemoryMultiValueFields;
    // field name -> per-docID values (String for single-valued fields,
    // String[] for multi-valued fields)
    private Map<String, Object[]> fieldsValues = new HashMap<String, Object[]>();
    private IndexSearcher searcher;

    public IndexReader getIndexReader() {
        return searcher.getIndexReader();
    }

    public IndexSearcherWithFields(IndexSearcher searcher,
            Collection<String> inMemoryFields,
            Collection<String> inMemoryMultiValueFields) throws IOException {
        this.searcher = searcher;
        this.inMemoryFields = inMemoryFields;
        this.inMemoryMultiValueFields = inMemoryMultiValueFields;
        this.warmup();
    }

    public final IndexSearcher getSearcher() {
        return searcher;
    }

    // Returns the array of values for the given field, indexed by docID.
    public Object[] getField(String fn) {
        return fieldsValues.get(fn);
    }

    // Loads the stored values of all configured fields into memory.
    private void warmup() throws IOException {
        long start = System.currentTimeMillis();
        IndexReader reader = searcher.getIndexReader();
        // maxDoc() includes deleted documents, so the arrays can be
        // indexed directly by docID.
        int docSize = reader.maxDoc();
        for (String f : inMemoryFields) {
            Object[] arr = new Object[docSize];
            fieldsValues.put(f, arr);
        }
        for (String f : inMemoryMultiValueFields) {
            Object[] arr = new Object[docSize];
            fieldsValues.put(f, arr);
        }
        for (int i = 0; i < docSize; i++) {
            Document doc = reader.document(i);
            for (String f : inMemoryFields) {
                Object[] arr = fieldsValues.get(f);
                arr[i] = doc.get(f);
            }
            for (String f : inMemoryMultiValueFields) {
                Object[] arr = fieldsValues.get(f);
                arr[i] = doc.getValues(f);
            }
        }
        logger.debug("warm up fields time: "
                + (System.currentTimeMillis() - start) + " ms.");
    }
}
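
One concern with the wrapper above is that warmup() re-reads every document on
each reopen. Below is a minimal sketch of a per-segment alternative, along the
lines of the FieldCache suggestion quoted further down. It is plain Java and
does not call Lucene: Segment, segmentOf, Snapshot and loadCount are
hypothetical names made up for illustration. In Lucene 3.x the sub-readers
from IndexReader.getSequentialSubReaders() and their getCoreCacheKey() might
fill the Segment role, but that wiring is an assumption and is not shown.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: caches one field's values per segment and reuses them across reopens. */
class PerSegmentFieldCacheSketch {

    /** Hypothetical stand-in for one index segment; this is not a Lucene type. */
    public interface Segment {
        Object cacheKey();          // identity that stays the same across reopens
        int maxDoc();               // number of document slots in the segment
        String value(int localDoc); // stored value of the cached field
    }

    /** Immutable view over one set of segments, addressed by global docID. */
    public static final class Snapshot {
        private final int[] docBases;    // first global docID of each segment
        private final String[][] values; // values[segment][localDoc]
        private final int maxDoc;

        Snapshot(int[] docBases, String[][] values, int maxDoc) {
            this.docBases = docBases;
            this.values = values;
            this.maxDoc = maxDoc;
        }

        public int maxDoc() { return maxDoc; }

        public String get(int globalDoc) {
            if (globalDoc < 0 || globalDoc >= maxDoc) {
                throw new IllegalArgumentException("docID out of range: " + globalDoc);
            }
            int idx = Arrays.binarySearch(docBases, globalDoc);
            if (idx < 0) {
                idx = -idx - 2; // segment whose docBase precedes globalDoc
            }
            return values[idx][globalDoc - docBases[idx]];
        }
    }

    // Arrays already loaded, keyed by segment identity.
    private Map<Object, String[]> loaded = new HashMap<Object, String[]>();
    private int loadCount = 0; // number of segments actually read so far

    public synchronized int loadCount() { return loadCount; }

    /** Builds a snapshot, reading values only for segments not seen before. */
    public synchronized Snapshot snapshot(List<? extends Segment> segments) {
        int[] docBases = new int[segments.size()];
        String[][] values = new String[segments.size()][];
        Map<Object, String[]> stillLive = new HashMap<Object, String[]>();
        int used = 0;
        int base = 0;
        for (Segment seg : segments) {
            if (seg.maxDoc() == 0) {
                continue; // skipping empty segments keeps docBases strictly increasing
            }
            String[] arr = loaded.get(seg.cacheKey());
            if (arr == null) {
                arr = new String[seg.maxDoc()];
                for (int i = 0; i < arr.length; i++) {
                    arr[i] = seg.value(i);
                }
                loadCount++;
            }
            stillLive.put(seg.cacheKey(), arr);
            docBases[used] = base;
            values[used] = arr;
            used++;
            base += arr.length;
        }
        loaded = stillLive; // forget segments that no longer exist
        return new Snapshot(Arrays.copyOf(docBases, used), Arrays.copyOf(values, used), base);
    }

    /** Hypothetical helper for demos: a segment backed by a fixed array. */
    public static Segment segmentOf(final Object key, final String... vals) {
        return new Segment() {
            public Object cacheKey() { return key; }
            public int maxDoc() { return vals.length; }
            public String value(int localDoc) { return vals[localDoc]; }
        };
    }
}
```

Each Snapshot is immutable once built, so searching threads can read it without
locking, and an older snapshot keeps its own arrays even after a newer one has
been created. Calling snapshot() again with the old segments plus one new
segment reads only the new one.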
On Wed, Jun 20, 2012 at 11:37 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Right, the field must have a single token for FieldCache.
>
> But if you are on 4.x you can use DocTermOrds
> (FieldCache.getDocTermOrds) which allows for multiple tokens per
> field.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jun 20, 2012 at 9:47 AM, Li Li <fancyerii@gmail.com> wrote:
>> But as far as I can remember, in 2.9.x FieldCache can only be applied to
>> fields that are indexed but not analyzed.
>> On 2012-6-20 at 8:59 PM, "Danil ŢORIN" <torindan@gmail.com> wrote:
>>
>>> I think you are looking for FieldCache.
>>>
>>> I'm not sure about the current status in 4.x, but it worked in 2.9/3.x.
>>> Basically it's an array, so access is quite straightforward, and the
>>> best part is that IndexReader manages those for you, so on reopen only new
>>> segments are read.
>>>
>>> A small catch is that FieldCaches are per segment, so you need to be
>>> careful if you want to retrieve data using global document IDs.
>>> However, if you are building the result set in your own Collector, using
>>> FieldCache is quite straightforward.
>>>
>>>
>>> On Wed, Jun 20, 2012 at 3:49 PM, Li Li <fancyerii@gmail.com> wrote:
>>> > Hi all,
>>> > I need to return certain fields of all matched documents quickly.
>>> > I am now using Document.get(field), but the performance is not good
>>> > enough. Originally I used a HashMap to store these fields. It's much
>>> > faster, but I have to maintain two storage systems. Now I am
>>> > restructuring this project and want to store everything in Lucene.
>>> > When I use an IndexSearcher to perform a search, I should be able to get
>>> > the related fields by docID. It must be thread-safe and, like the
>>> > IndexReader, it should be a snapshot of the index.
>>> > Here are some solutions I can come up with:
>>> > 1. StringIndex
>>> > I have considered StringIndex, but some fields need to be tokenized.
>>> > Maybe I can use two fields: one tokenized for searching, the other
>>> > indexed but not analyzed and used only for StringIndex.
>>> > If there is no better solution, maybe I will have to use this one.
>>> > 2. Associating a Map with each IndexReader
>>> > When the IndexReader is opened or reopened, I need to iterate
>>> > through each document of this reader and put everything into a map.
>>> > The problem is that it's slower, and I don't know whether it's
>>> > problematic with NRT.
>>> >
>>> > Is there any other, better solution? Thanks.
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >