lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Sorting a Lucene index
Date Wed, 25 Aug 2010 08:31:55 GMT
1 billion i.e. 1,000,000,000?

Either buy more RAM, lots more RAM, or skip lucene sorting and do your
own sorting for the top n hits.  You might also want to look into
sharding/distributing your index.


--
Ian.


On Wed, Aug 25, 2010 at 6:16 AM, Shelly_Singh <Shelly_Singh@infosys.com> wrote:
> I have 1 bln documents to sort. So, that would mean ( 8 bln bytes == 8GB RAM) bytes.
> All I have is 8 GB on my machine, so I do not think approach would work.
>
> Any other options?
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, August 19, 2010 7:18 PM
> To: java-user@lucene.apache.org
> Subject: Re: Sorting a Lucene index
>
> You haven't yet told us how many documents you're talking about here, so
> it's
> hard to have a good idea of what solutions are. That said, I'd just try
> sorting first.
> The sorting cache size will be something like (sizeof(int or long)) *
> (number of documents).
> Measure (remember to measure the response after query warmups, the first
> few will be slower because they fill up the cache) THEN fix iff there's a
> problem.
>
> And just forget the idea of inserting your documents in the correct order
> <G>. You
> stated that the documents come in in random order. Document IDs are assigned
> internally to Lucene, and monotonically increasing. I sure don't see how you
> can
> reconcile those two things...
>
> But again, just try it with sorting on the numeric field and only fix things
> if you have a
> problem. Lots of work has been put into making Lucene fast, by very bright
> people. See
> if they've already solved your problem for you...
>
> Best
> Erick.
>
>
> On Thu, Aug 19, 2010 at 1:51 AM, Shelly_Singh <Shelly_Singh@infosys.com>wrote:
>
>> Hi Anshum,
>>
>> I require sorted results for all my queries and the field on which I need
>> sorting is fixed; so this lead to me the idea of storing in sorted order to
>> avoid sorting cost with every query.
>>
>> Thanks and Regards,
>>
>> Shelly Singh
>> Center For KNowledge Driven Information Systems, Infosys
>> Email: shelly_singh@infosys.com
>> Phone: (M) 91 992 369 7200, (VoIP)2022978622
>>
>> -----Original Message-----
>> From: Anshum [mailto:anshumg@gmail.com]
>> Sent: Wednesday, August 18, 2010 5:21 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Sorting a Lucene index
>>
>> Hi Shelly,
>> The search results so returned are sorted either by relevance, index order,
>> stored field, or custom order.
>> As you are saying that you would not be able to maintain the index order,
>>  you would have to do the sort at run time.
>> Sorting on a stored field is not costly and you may use it comfortably.
>> btw,
>> are you facing any issues in sort time or is it a presumption?
>>
>> --
>> Anshum Gupta
>> http://ai-cafe.blogspot.com
>>
>>
>> On Wed, Aug 18, 2010 at 5:12 PM, Shelly_Singh <Shelly_Singh@infosys.com
>> >wrote:
>>
>> > Hi,
>> >
>> > I have a Lucene index that contains a numeric field along with certain
>> > other fields. The order of incoming documents is random and
>> un-predictable.
>> > As a result, while creating an index, I end up adding docs in random
>> order
>> > with respect to the numeric field value.
>> >
>> > For example, documents may be added in following order:
>> > 12,y,d
>> > 100,o,p
>> > 1,x,y
>> > 23,u,i
>> > 31,v,m
>> > 22,b,m
>> > 109,k,l
>> >
>> > My requirement is that at search time, I want the documents in order of
>> the
>> > numeric field.
>> > One, option is to do a score/sort on the numeric field.
>> > But, this may be a costly operation.
>> >
>> > Hence, I am trying to find if there is some way, such that, my stored
>> index
>> > is sorted by itself.
>> >
>> > Please help.
>> >
>> > Thanks and Regards,
>> >
>> > Shelly Singh
>> > Center For KNowledge Driven Information Systems, Infosys
>> > Email: shelly_singh@infosys.com<mailto:shelly_singh@infosys.com>
>> > Phone: (M) 91 992 369 7200, (VoIP)2022978622
>> >
>> >
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message