clerezza-dev mailing list archives

From Tommaso Teofili <>
Subject Re: Search in rdf.cris
Date Thu, 03 Oct 2013 11:57:48 GMT
Hi Stephane,

I don't have much time right now, but I wanted to let you know that IMHO your
list of goals / tasks sounds completely reasonable. If you need it, I may be
able to help over the coming weeks.


2013/10/2 Stephane Gamard <>

> Hi Team,
> My name's Stephane and I am currently participating in the Fusepool FP7
> project. Within this project we are using Stanbol & Clerezza as key
> architectural components. Coming from a pure full-text search and
> Information Retrieval background, I find myself in somewhat new
> territory.
> But within that new territory there is a link to my area of expertise:
> Lucene/Solr in the rdf.cris package. This package turns out to be crucial
> for our project and I would gladly participate and contribute my knowledge
> as a Lucene and Solr developer. So here, in a nutshell, is a list of "small
> contributions" to start with:
> - Abstraction Refactoring
> Currently CRIS is using Lucene as its FT engine, but we might eventually
> want to move to Solr (or Elasticsearch, for XYZ reasons). The first step
> would be to remove all Lucene dependencies from the rdf.cris package and
> push the implementation into an rdf.cris.lucene package
> - Lucene 4.x Branch
> There have been a large number of changes since the 2.x and 3.x branches
> of Lucene. I'd propose a small refactor and overhaul of the rdf.cris.lucene
> package to take advantage of Lucene's new features (Facets,
> SearcherManager, …)
> - Solr Implementation
> With production use in mind, I strongly believe Clerezza's CRIS component
> should be able to leverage established services without having to manage
> their scalability. That goes for full-text search most obviously. The idea
> is to be able to use a remote Solr server (Solr since it comes with a
> whole pseudo-REST service layer on top of Lucene).
> - Fine Grained Search
> As a logical evolution of the points above, it would then be perfect if
> Clerezza's full-text search capabilities could benefit from all the
> features of Lucene/Solr. I am especially thinking about:
> -- Field/Analyzer specialisation (we don't compare authors, dates and text
> in the same way in Lucene/Solr)
> -- Boosting (for IR, the title of a document usually carries more important
> information than its footnotes)
> -- Advanced facets (things like date-range facets and pivot facets, called
> 2nd level facets in fusepool)
> -- Geolocalised searches (big thing in the Lucene/Solr 4.x branch… would
> eventually be a nice-to-have)
> I will carry out this work over the next few weeks/months as part of the
> fusepool project, but most of all I would be pleased and interested to
> finally get a top-notch implementation of a cross RDF-text solution. Very
> much looking forward to your feedback and hopefully support ;)
> PS: Whoever initiated the GraphIndexer implementation did an excellent
> job! I will hopefully follow in their footsteps!
> Cheers,
> _Stephane
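
To make the "Abstraction Refactoring" and "Boosting" items above a little more concrete, here is a minimal, self-contained sketch of what an engine-neutral search contract might look like. Everything in it is an assumption made for illustration: FullTextIndex, InMemoryIndex, the method names and the field names are hypothetical and are not existing rdf.cris, Lucene or Solr types. The in-memory class only stands in for a real Lucene- or Solr-backed implementation, and its scoring (sum of per-field boosts for fields containing the term) is a toy, not Lucene's ranking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FullTextSketch {

    /** Hypothetical engine-neutral contract; callers never see Lucene or Solr types. */
    interface FullTextIndex {
        /** Index the given field values under a resource URI. */
        void add(String resourceUri, Map<String, String> fields);

        /** Return URIs of matching resources, best score first. */
        List<String> search(String term, Map<String, Double> fieldBoosts);
    }

    /** Toy in-memory backend standing in for a Lucene or Solr implementation. */
    static final class InMemoryIndex implements FullTextIndex {
        private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

        @Override
        public void add(String resourceUri, Map<String, String> fields) {
            docs.put(resourceUri, new HashMap<>(fields));
        }

        @Override
        public List<String> search(String term, Map<String, Double> fieldBoosts) {
            String needle = term.toLowerCase(Locale.ROOT);
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
                double score = 0.0;
                for (Map.Entry<String, String> field : doc.getValue().entrySet()) {
                    // A field that contains the term contributes its boost (default 1.0).
                    if (field.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                        score += fieldBoosts.getOrDefault(field.getKey(), 1.0);
                    }
                }
                if (score > 0.0) {
                    scores.put(doc.getKey(), score);
                }
            }
            List<String> hits = new ArrayList<>(scores.keySet());
            // Highest score first.
            hits.sort(Comparator.comparingDouble((String uri) -> scores.get(uri)).reversed());
            return hits;
        }
    }

    public static void main(String[] args) {
        FullTextIndex index = new InMemoryIndex();
        index.add("urn:doc:1", Map.of("title", "Patent law", "footnote", "misc"));
        index.add("urn:doc:2", Map.of("title", "Misc", "footnote", "see patent filing"));
        // The title match outranks the footnote match because of the 2.0 boost.
        System.out.println(index.search("patent", Map.of("title", 2.0)));
        // prints [urn:doc:1, urn:doc:2]
    }
}
```

The point of the sketch is only the shape of the boundary: if callers depend on an interface like this, a Lucene 4.x backend and a remote Solr backend could be swapped without touching them.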
