lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject RE: any good idea for loading fields into memory?
Date Sat, 23 Jun 2012 00:29:33 GMT
The 30 ms is important.
For one reason, I am restructuring a project and integrating everything
into Lucene, so it should be as fast as before.
The second reason is that the Lucene matching is just a small part; there
are many other steps. Also, 40 ms is the average time; for some queries
that match 10k documents, it takes hundreds of ms.
The reason for the slowdown is getting field values. When I use my wrapped
IndexSearcher, which loads the fields into arrays, it is as fast as before.
Why do I use Lucene instead of just a HashMap? First, I do not want to
implement an in-memory inverted index myself. Second, a HashMap would have
to be saved to disk for persistence. Third, modifying a HashMap
transactionally is difficult: while modifying, searching threads should see
either everything or nothing. Fourth, dealing with boolean queries is
convenient in Lucene.
On 2012-6-23 at 1:16 AM, "Paul Hill" <paul@metajure.com> wrote:

>
> 10 ms vs. 40 ms. I'd say: so what?
> Is your overall time noticeably affected by this 30 ms gain?  Does the end
> user notice this 30 ms gain?
> Where is the time going?  Just getting the hits?  Getting all the
> documents?  Building the result set as your app uses it?
>
> If it is the hits, have you considered searching on a hash value instead
> of the value of the field?
> If it is getting the documents, are you getting too much but only using a
> little in this particular case?
> If it is building the result set, because of the need to re-parse, I would
> look into trying a second multi-valued field with exactly (or closer to)
> what you need in it.
>
> -Paul
>
> > -----Original Message-----
> > From: Li Li [mailto:fancyerii@gmail.com]
> > Our old map implementation takes about 10 ms, while the newer one takes
> > 40 ms. The reason is that we need to return some fields of all hit
> > documents. The fields are not very long strings,
> > and the document count is less than 100k.
>
>
> > On 2012-6-22 at 5:13 PM, "Danil ŢORIN" <torindan@gmail.com> wrote:
> >
> > > If you can afford it, you could add one additional untokenized stored
> > > field that will contain the serialized(one way or another) form of the
> > > document.
> > >
> > > Add FieldCache on top of it, and return it right away.
> > >
> > > But we are getting into the area where you basically have to keep all
> > > your documents in memory.
> > >
> > > In this situation, maybe it simply doesn't make sense to
> > > overcomplicate things: just keep your index in memory (as it is right
> > > now, no additional fields or field caches), and retrieving documents
> > > would be fast enough simply because all the data is in RAM.
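The keep-it-all-in-RAM option above is close to a one-liner in Lucene 3.x: copy the on-disk index into a RAMDirectory at startup. The path below is made up for illustration.

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class InRamIndex {
    public static void main(String[] args) throws Exception {
        // Copy the whole on-disk index into the heap once at startup;
        // all subsequent reads are then served from RAM.
        Directory onDisk = FSDirectory.open(new File("/path/to/index")); // hypothetical path
        Directory inRam = new RAMDirectory(onDisk);
        IndexReader reader = IndexReader.open(inRam);
        System.out.println("docs in RAM: " + reader.numDocs());
    }
}
```

This trades heap space for I/O: the whole index must fit in memory, and the copy is made once, so updates to the on-disk index are not seen.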
> > >
> > >
> > > On Fri, Jun 22, 2012 at 3:56 AM, Li Li <fancyerii@gmail.com> wrote:
> > > > Using a Collector and a field cache is a good idea for ranking by a
> > > > certain field's value,
> > > > but I just need to return the matched documents' fields. Also, the
> > > > field cache can't store multi-valued fields, can it?
> > > > I have to store a special character like '\n' to separate them and
> > > > split the string into a string array at runtime.
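The separator workaround described above can be plain Java, on the assumption that no value ever contains the '\n' delimiter:

```java
// Pack multiple values into one cache-friendly string at index time,
// and split it back into an array at lookup time.
// Assumes no value contains the '\n' separator.
public class MultiValueCodec {
    public static String join(String[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append('\n');
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static String[] split(String packed) {
        return packed.split("\n", -1); // -1 keeps trailing empty values
    }
}
```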
> > > >
> > > > On Fri, Jun 22, 2012 at 5:11 AM, Paul Hill <paul@metajure.com>
> wrote:
> > > >> I would ask: if you want to look at the whole value of a field
> > > >> during searching, why don't you have just such a field in your
> > > >> index?
> > > >> I have an index with several fields that have two versions of the
> > > >> field, both analyzed and unanalyzed.  It works great for me in 3.x
> > > >> (not 4.x).
> > > >> Have you read about Collectors?  That is where I find myself
> > > >> working with field caches, but maybe this is not your need.  I also
> > > >> properly configured the call to search.doc(docId) with the second
> > > >> argument, so I only load the fields I will be using in my returned
> > > >> results, not any 'extra' fields used in Filters, Collectors, etc.
> > > >> If you have a special query that needs to be extra fast, you can
> > > >> change the fields to load just in the special code for that special
> > > >> query.
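The "second argument" mentioned above is a FieldSelector in Lucene 3.x. A minimal sketch, with "title" and "url" as example field names not taken from the thread:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.MapFieldSelector;
import org.apache.lucene.search.IndexSearcher;

public class LoadOnlyNeededFields {
    // Field names "title" and "url" are illustrative examples.
    private static final FieldSelector ONLY_NEEDED =
            new MapFieldSelector("title", "url");

    static String titleOf(IndexSearcher searcher, int docId) throws Exception {
        // Only the selected stored fields are decoded for this document;
        // all other stored fields are skipped.
        Document doc = searcher.doc(docId, ONLY_NEEDED);
        return doc.get("title");
    }
}
```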
> > > >>
> > > >> I hope that helps,
> > > >>
> > > >> -Paul
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Li Li [mailto:fancyerii@gmail.com]
> > > >>> But as far as I can remember, in 2.9.x FieldCache can only be
> > > >>> applied to indexed but not analyzed fields.
> > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > >
> > >
> > >
>
