lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From fr.jur...@voila.fr
Subject Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.
Date Tue, 22 Mar 2011 15:15:53 GMT
Hi Luceners,

this is my 1st experience with ARQ, LARQ & Lucene;
everyth. went smooth so far, however the slope seems to be getting 
steeper suddenly.

The initial problem was to develop a Java app to build, then to browse 
through, a repository of RDF data. With TDB/ARQ, this is now running 
smoothly.
Then came 2 decisions:
1) localize most of the RDF litterals, & have the repository hold them all; so far 
20 Western languages + 3 Eastern, & counting;
2) allow queries to include full-text searches over the litterals.
The problem now is that I expect new languages to be added w/out prior 
warning, & I'll have to cope with most of them decently, right from 
day-zero. As with most of the 20 already identified, for that matter.

So far, Lucene has exactly the right architecture for a Java indexing 
component; LARQ seems to have more or less what it takes to both index & 
query RDF based on language info.
The only thing I need is the middle layer: a Java component extending 
Lucene, that'd pull a plausible Analyzer out of its magic hat, for every 
ISO 639-1 language tag however unlikely that turns up in the RDF input.
Not just an analyzer, mark: a *plausible* one. I mean one that'll 
generate usable indexes right out of the box in most cases; I cannot 
afford to bring the system back into dev & study the arcanes of 
automatic indexing in just every language we're working with. So 
lucene-contrib + covering the rest with StandardAnalyzer/English 
stopwords is not an option.
On the other hand, to mitigate the "however unlikely" part: the 
localized litterals are repair instructions, so I don't reckon we miss 
support for Middle-High Sumerian any time soon.

I had a quick glance at Solr; it would appear to be a fair candidate 
since it must needs have such a component nested in its server-side 
software, & it's free.
Only I would rather have the component in question offered as a separate 
release, or at least with backward compatibility guaranteed.
What do you advise; should I apple-C apple-V part of Solr? Else, what alternate freeware might
do?  

Best regards,
     François Jurain.



____________________________________________________

  Retrouvez les 10 conseils pour économiser votre carburant sur Voila :  http://actu.voila.fr/evenementiel/LeDossierEcologie/l-eco-conduite/




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message