lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: DIH nested entities don't work
Date Mon, 29 Oct 2012 14:41:13 GMT
If your subentities are large, the default DIH cache probably isn't going to work because it
stores all the data in memory.  (This is CachedSqlEntityProcessor for Solr 3.5 or earlier;
cacheImpl="SortedMapBackedCache" for 3.6 or later.)
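
For reference, a minimal sketch of what the in-memory cache looks like on a subentity in
data-config.xml for 3.6 or later.  The data source, table, column, and entity names here are
hypothetical placeholders, not taken from the original poster's setup; cacheKey names the
child column to index the cache on, and cacheLookup names the parent field to look up with.

```xml
<!-- Sketch only: driver, url, tables, and columns are hypothetical. -->
<dataConfig>
  <dataSource driver="com.example.Driver" url="jdbc:example://dbhost/catalog"/>
  <document>
    <!-- Root entity: one Solr document per ITEM row. -->
    <entity name="item" query="SELECT ID, TITLE FROM ITEM">
      <!-- Subentity: the query runs once, its rows are held in memory
           keyed by ITEM_ID, and each parent row is joined by lookup
           instead of issuing one SQL query per parent. -->
      <entity name="feature"
              processor="SqlEntityProcessor"
              cacheImpl="SortedMapBackedCache"
              cacheKey="ITEM_ID"
              cacheLookup="item.ID"
              query="SELECT ITEM_ID, DESCRIPTION FROM FEATURE"/>
    </entity>
  </document>
</dataConfig>
```

On 3.5 or earlier, the rough equivalent would be processor="CachedSqlEntityProcessor" with
where="ITEM_ID=item.ID" in place of the three cache attributes.  Either way, the whole
subentity result set has to fit in the heap, which is why this breaks down for large subentities.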

DIH for Solr 3.6 and later supports pluggable caches (see https://issues.apache.org/jira/browse/SOLR-2382),
so you have the option of caching to disk.  Unfortunately, the only good disk-backed cache
available here uses Berkeley DB, which has an incompatible license and cannot be included
with an Apache project.  See https://issues.apache.org/jira/browse/SOLR-2613 for the code;
you'll have to download bdb-je from Oracle yourself.  We also converted from Endeca, and
needed these cache options to replace the Forge Cache feature, which we depended on heavily
for joins.  It was a lot of work to set this up with DIH and get everything working correctly,
but the end result for us is actually a lot faster (and way more flexible) than Forge ever
was.

By the way, there have been sporadic reports of unexpected behavior when using caching with 3.6.
You may want to try 4.0 if you're currently running 3.6.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: mroosendaal [mailto:mroosendaal@yahoo.com] 
Sent: Monday, October 29, 2012 5:06 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH nested entities don't work

Hi,

It seems to work without the cache option. The downside is that it takes
ages for everything to be indexed, and my test set is 20 times smaller than
the production set.

Indexing just the root item takes 3 minutes (>600K), but every subentity
takes more time. That is to be expected, but I would have hoped it would at
least be faster.

Our current search engine (Endeca) does the same thing but takes 'only'
1h20m.

How can I speed this up? The bottleneck is not the CPU or memory, but simply
the database time.

Thanks,
Maarten



--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-nested-entities-don-t-work-tp4015514p4016618.html
Sent from the Solr - User mailing list archive at Nabble.com.


