lucene-solr-user mailing list archives

From Glen Newton <>
Subject Re: Moving From Oracle Text Search To Solr
Date Tue, 16 Mar 2010 20:32:51 GMT
I've also indexed a concatenation of 50k journal articles (making a
single document of several hundred MB of text) and it did not give me
an OOM.


On 16 March 2010 15:57, Erick Erickson <> wrote:
> Why do you think you'd hit OOM errors? How big is "very large"? I've
> indexed, as a single document, a 26 volume encyclopedia of civil war
> records......
> Although as much as I like the technology, if I could get away without using
> two technologies, I would. Are you completely sure you can't get what you
> want with clever Oracle querying?
> Best
> Erick
> On Tue, Mar 16, 2010 at 3:20 PM, Neil Chaudhuri <> wrote:
>> I am working on an application that currently hits a database containing
>> millions of very large documents. I use Oracle Text Search at the moment,
>> and things work fine. However, there is a request for faceting capability,
>> and Solr seems like a technology I should look at. Suffice it to say I am new
>> to Solr, but at the moment I see two approaches, each with drawbacks:
>> 1)      Have Solr index document metadata (id, subject, date). Then use
>> Oracle Text to do a content search based on criteria. Finally, query the
>> Solr index for all documents whose ids match the set of ids returned by
>> Oracle Text. That strikes me as an unmanageable Boolean query (e.g.
>> id:4 OR id:33432323 OR ...).
>> 2)      Remove Oracle Text from the equation and use Solr to query document
>> content based on search criteria. The indexing process, though, will almost
>> certainly encounter an OutOfMemoryError given the number and size of
>> documents.
>> I am using the embedded server and Solr Java APIs to do the indexing and
>> querying.
>> I would welcome your thoughts on the best way to approach this situation.
>> Please let me know if I should provide additional information.
>> Thanks.
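
For approach 1 in the quoted message, one way to keep the id query
manageable is to split the ids returned by Oracle Text into batches and
issue one Solr query per batch. The sketch below is only an illustration,
not something proposed in this thread: the class IdQueryBatcher and the
method buildQueries are hypothetical names, it assumes the ids are plain
numeric strings that need no query escaping, and it assumes the clause
limit is 1024 (Lucene's usual default for Boolean queries, configurable
through maxBooleanClauses in solrconfig.xml). It uses only the Java
standard library and builds query strings; sending them to Solr is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IdQueryBatcher {

    // Assumed clause limit; verify against maxBooleanClauses in your solrconfig.xml.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    /**
     * Splits ids into batches of at most maxClauses and returns one query
     * string per batch, in the form id:(a OR b OR c).
     * Assumes ids contain no characters that need query escaping.
     */
    static List<String> buildQueries(List<String> ids, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be at least 1");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, ids.size());
            queries.add("id:(" + String.join(" OR ", ids.subList(start, end)) + ")");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Ids as they might come back from the Oracle Text content search.
        List<String> ids = Arrays.asList("4", "33432323", "7");
        // A batch size of 2 is used here only to make the batching visible.
        for (String query : buildQueries(ids, 2)) {
            System.out.println(query);
        }
    }
}
```

Each batch is a separate query, so facet counts would have to be summed
across batches by the caller. That extra bookkeeping is part of why
approach 1 is awkward for large result sets.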


