lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: PreAnalyzed URP and SchemaRequest API
Date Fri, 13 Apr 2018 21:01:03 GMT
Hello David,

If JSON serialization is too bulky, we could also opt for SimplePreAnalyzed, right? At least
as a FieldType it is possible; if not with the URP, it just needs some work.
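
To make the size difference concrete, here is a minimal sketch that serializes the same tokens in both layouts. The `Token` holder and the `toJson`/`toSimple` helpers are hypothetical names for illustration, not Solr or Lucene classes; the key names (`v`, `str`, `tokens`, `t`, `s`, `e`, `i`) and the `1 =stored=term,s=..,e=..,i=..` layout follow the PreAnalyzedField documentation as far as I understand it. Escaping is left out, so the sketch assumes plain terms and a stored value without quotes, backslashes or `=`.

```java
import java.util.List;
import java.util.StringJoiner;

public class PreAnalyzedFormats {

    /** Hypothetical holder for one analyzed token; not a Solr or Lucene class. */
    public static final class Token {
        final String term;
        final int start;   // start offset
        final int end;     // end offset
        final int posInc;  // position increment

        public Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /** JSON layout: version, stored string, then one object per token. No escaping (assumption). */
    public static String toJson(String stored, List<Token> tokens) {
        StringJoiner items = new StringJoiner(",", "[", "]");
        for (Token t : tokens) {
            items.add("{\"t\":\"" + t.term + "\",\"s\":" + t.start
                    + ",\"e\":" + t.end + ",\"i\":" + t.posInc + "}");
        }
        return "{\"v\":\"1\",\"str\":\"" + stored + "\",\"tokens\":" + items + "}";
    }

    /** Simple layout: version, =stored=, then space-separated tokens. No escaping (assumption). */
    public static String toSimple(String stored, List<Token> tokens) {
        StringJoiner items = new StringJoiner(" ");
        for (Token t : tokens) {
            items.add(t.term + ",s=" + t.start + ",e=" + t.end + ",i=" + t.posInc);
        }
        return "1 =" + stored + "=" + items;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("hello", 0, 5, 1),
                new Token("world", 6, 11, 1));
        String json = toJson("Hello world", tokens);
        String simple = toSimple("Hello world", tokens);
        // Print both forms with their lengths to compare the per-token overhead.
        System.out.println(json.length() + " chars: " + json);
        System.out.println(simple.length() + " chars: " + simple);
    }
}
```

The per-token overhead of the simple layout is smaller because it drops the braces and quoted keys, but real input would need proper escaping in both cases before either string is sent to Solr.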

Regarding results: we haven't done it yet, and won't for some time, but we will when we reintroduce
OpenNLP in the analysis chain. We tried to introduce POS-tagging on our own two years ago,
but it wasn't suited for production because it was too heavy on the CPU. Indexing data suddenly
took eight to ten times longer in a SolrCloud environment with three replicas.

If we offload our current chains without OpenNLP, the only benefit comes when large fields pass
through a regex, and from decompounding the Germanic languages we ingest. Offloading just this
cost is a micro-optimization, but offloading the various OpenNLP char and token filters would be
really beneficial.

Regarding a dependency on Lucene core and analysis-common, it would be helpful, but we'll
manage.

Thanks again,
Markus
 
-----Original message-----
> From:David Smiley <david.w.smiley@gmail.com>
> Sent: Thursday 12th April 2018 19:16
> To: solr-user@lucene.apache.org
> Subject: Re: PreAnalyzed URP and SchemaRequest API
> 
> Ah ok.
> I've wondered how much value there is in pre-analysis.  The serialization
> of the analyzed form in JSON is bulky.  If you can share any results, I'd
> be interested to hear how it went.  It's an optimization so you should be
> able to know how much better it is.  Of course it isn't for everybody --
> only when the analysis chain is sufficiently complex.
> 
> On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma <markus.jelsma@openindex.io>
> wrote:
> 
> > Hello David,
> >
> > The remote client has everything on the class path but just calling
> > setTokenStream is not going to work. Remotely, all i get from SchemaRequest
> > API is an AnalyzerDefinition. I haven't found any Solr code that allows me
> > to transform that directly into an analyzer. If i had that, it would make
> > things easy.
> >
> > As far as i see it, i need to reconstruct a real Analyzer using
> > AnalyzerDefinition's information. It won't be a problem, but it is
> > cumbersome.
> >
> > Thanks anyway,
> > Markus
> >
> > -----Original message-----
> > > From:David Smiley <david.w.smiley@gmail.com>
> > > Sent: Thursday 5th April 2018 19:38
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: PreAnalyzed URP and SchemaRequest API
> > >
> > > Is this really a problem when you could easily enough create a TextField
> > > and call setTokenStream?
> > >
> > > Does your remote client have Solr-core and all its dependencies on the
> > > classpath?   That's one way to do it... and presumably the direction you
> > > are going because you're asking how to work with PreAnalyzedParser which is
> > > in solr-core.  *Alternatively*, only bring in Lucene core and construct
> > > things yourself in the right format.  You could copy PreAnalyzedParser into
> > > your codebase so that you don't have to reinvent any wheels, even though
> > > that's awkward.  Perhaps that ought to be in Solrj?  But no we don't want
> > > SolrJ depending on Lucene-core, though it'd make a fine "optional"
> > > dependency.
> > >
> > > On Wed, Apr 4, 2018 at 4:53 AM Markus Jelsma <markus.jelsma@openindex.io>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We intend to move to PreAnalyzed URP for analysis offloading. Browsing the
> > > > Javadocs i came across the SchemaRequest API looking for a way to get a
> > > > Field object remotely, which i seem to need for
> > > > JsonPreAnalyzedParser.toFormattedString(Field f). But all i can get from
> > > > SchemaRequest API is FieldTypeRepresentation, which offers me
> > > > getIndexAnalyzer() but won't allow me to construct a Field object.
> > > >
> > > > So, to analyze remotely i do need an index-time analyzer. I can get it,
> > > > but not turn it into a Field object, which the PreAnalyzedParser for some
> > > > reason wants.
> > > >
> > > > Any hints here? I must be looking the wrong way.
> > > >
> > > > Many thanks!
> > > > Markus
> > > >
> > > --
> > > Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> > > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> > > http://www.solrenterprisesearchserver.com
> > >
> >
> -- 
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
> 
