lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: UpdateRequestProcessor : flattened values
Date Mon, 28 May 2012 14:30:03 GMT
"... the access to individual literal fields seems (currently) very limited 
as they appear to be flattened."

That is a "feature" of SolrCell: it flattens multiple values for a
non-multiValued field into a single string concatenation of the values.

All you need to do is add multiValued="true" to the "author" field in your
schema.xml:

<field name="author" type="text_general" indexed="true" stored="true"/>

becomes

<field name="author" type="text_general" indexed="true" stored="true" 
multiValued="true"/>
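To make the difference concrete, here is a small, dependency-free model of the behaviour described above. It is only an illustration under stated assumptions: the class `LiteralFieldModel` and its `collect` method are hypothetical, not SolrCell code, and joining with a single space is assumed from the `[pdfauthor, a b c d]` log output quoted below.

```java
import java.util.ArrayList;
import java.util.List;

class LiteralFieldModel {

    /**
     * Hypothetical model of how repeated literal.* values end up in a field.
     * Not SolrCell's implementation: it only mimics the observed behaviour.
     *
     * @param values      the values sent as literal.author=...
     * @param multiValued whether the schema declares multiValued="true"
     */
    static List<String> collect(List<String> values, boolean multiValued) {
        List<String> result = new ArrayList<>();
        if (multiValued) {
            // Multi-valued field: every literal value is kept as its own entry.
            result.addAll(values);
        } else if (!values.isEmpty()) {
            // Single-valued field: values are concatenated into one string
            // (space separator assumed from the "a b c d" output in the thread).
            result.add(String.join(" ", values));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> literals = List.of("a", "b", "c d");
        System.out.println(collect(literals, false)); // [a b c d]
        System.out.println(collect(literals, true));  // [a, b, c d]
    }
}
```

With the schema change, each literal.author value would reach the processor as a separate entry instead of one concatenated string.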

-- Jack Krupansky

-----Original Message----- 
From: Raphaël
Sent: Monday, May 28, 2012 7:17 AM
To: solr-user@lucene.apache.org
Subject: Re: UpdateRequestProcessor : flattened values

On Sun, May 27, 2012 at 11:54:02PM -0400, Jack Krupansky wrote:
> You can create your own "update processor" that gets control between the
> output of Tika and the indexing of the document.
>
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor

Seems to be exactly what I was looking for, thanks a lot!

I've just started an (almost working) implementation, but I have one remark:

Let's get the values (plural) of a field:
> Collection v = doc.getFieldValues( "author" );
(in my `processAdd(AddUpdateCommand cmd)`)

and push a doc, say using:
> `curl -F content=@my.pdf -F literal.author=a -F literal.author=b -F 
> literal.author="c d"`

Then `log.warn("author: " + v + ":" + v.size());` logs:
> WARN: author: [pdfauthor, a b c d] : 2
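For readers new to the wiki page above, the processor chain can be pictured as a linked list of processors, each handling the document and then passing it on to the next one. The sketch below is a plain-Java model of that idea, not Solr code: `Processor`, `AuthorLoggingProcessor`, `IndexingProcessor` and `ChainDemo` are hypothetical names, and the document is simplified to a `Map` from field name to values. In a real plugin you would instead extend Solr's `UpdateRequestProcessor`, override `processAdd(AddUpdateCommand cmd)` as in the snippet above, and call `super.processAdd(cmd)` to hand the document to the next processor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChainDemo {
    public static void main(String[] args) {
        IndexingProcessor indexer = new IndexingProcessor();
        AuthorLoggingProcessor custom = new AuthorLoggingProcessor(indexer);

        // Field values as reported in the log output above.
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("author", List.of("pdfauthor", "a b c d"));

        custom.processAdd(doc);
        System.out.println(custom.log.get(0));       // author: [pdfauthor, a b c d]:2
        System.out.println(indexer.indexed.size());  // 1
    }
}

// Hypothetical, dependency-free model of an update processor in a chain.
abstract class Processor {
    private final Processor next;

    Processor(Processor next) {
        this.next = next;
    }

    // Default behaviour: forward the document to the next processor, if any.
    void processAdd(Map<String, List<Object>> doc) {
        if (next != null) {
            next.processAdd(doc);
        }
    }
}

// Custom step: inspects the "author" values, then forwards the document.
class AuthorLoggingProcessor extends Processor {
    final List<String> log = new ArrayList<>();

    AuthorLoggingProcessor(Processor next) {
        super(next);
    }

    @Override
    void processAdd(Map<String, List<Object>> doc) {
        List<Object> v = doc.getOrDefault("author", List.of());
        log.add("author: " + v + ":" + v.size());
        super.processAdd(doc); // hand over to the next step in the chain
    }
}

// Last step: stands in for the actual indexing.
class IndexingProcessor extends Processor {
    final List<Map<String, List<Object>>> indexed = new ArrayList<>();

    IndexingProcessor() {
        super(null);
    }

    @Override
    void processAdd(Map<String, List<Object>> doc) {
        indexed.add(doc);
    }
}
```

The custom step only sees what the earlier stage produced, which is why a value that was already flattened before the chain runs arrives as a single string.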

It's not (yet) a blocker in my personal case, but I fear it's important
enough to be noted: using a custom UpdateRequestProcessor, the access to
individual literal fields seems (currently) very limited as they appear
to be flattened. I'm fairly sure there is already a bug report about this
hidden somewhere.
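The concern can be illustrated with a tiny example: once several values are concatenated into one string, different inputs can produce the same result, so a processor running later cannot reliably split them again. This sketch assumes, as the `a b c d` output above suggests, that the values are joined with a single space; `FlatteningDemo` and `flatten` are hypothetical names, not Solr API.

```java
import java.util.Arrays;
import java.util.List;

class FlatteningDemo {

    // Hypothetical stand-in for concatenating several literal values into one
    // string (single-space separator assumed from the observed "a b c d").
    static String flatten(List<String> values) {
        return String.join(" ", values);
    }

    public static void main(String[] args) {
        List<String> sent = List.of("a", "b", "c d");      // three literal.author values
        List<String> other = List.of("a", "b", "c", "d");  // four different values

        String f1 = flatten(sent);
        String f2 = flatten(other);
        System.out.println(f1.equals(f2)); // true: both become "a b c d"

        // Splitting on spaces recovers four values, not the three that were sent.
        List<String> recovered = Arrays.asList(f1.split(" "));
        System.out.println(recovered);              // [a, b, c, d]
        System.out.println(recovered.equals(sent)); // false
    }
}
```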


Other than that, and unless I hit some other unexpected issue, this way
of customizing the request processor suits my needs perfectly.


Thanks!

