lucene-general mailing list archives

From Jan <jan.rich...@dsto.defence.gov.au>
Subject Populating a custom Solr field with text extracted from document
Date Thu, 17 Nov 2011 05:42:28 GMT
Hi all,

I am a new Solr user, and would like to create a new custom field that is
then populated with text extracted from each document when I crawl my file
system.

For example, I have a bunch of text documents whose content follows this
format:
text text text... Received : 04 Jan 2002 17:31:40 ...text text text

I would like to store the value "04 Jan 2002 17:31:40" in my custom field.
The new field has already been created in schema.xml as follows:
<field indexed="true" multiValued="false" name="received" omitNorms="true"
omitPositions="true" omitTermFreqAndPositions="true" stored="true"
termVectors="false" type="text_en" />

I am unsure how to populate this field during the crawl of my data sources.
I looked into analyzers and tokenizers
(http://wiki.apache.org/solr/SolrPlugins#Fields), and also looked at
SolrCell/ExtractingRequestHandler, but from what I understand neither of
these is the correct solution.
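
To make the goal concrete, here is a rough sketch of the extraction step done on the client side, before a document is sent to Solr. It is only an illustration, not a confirmed Solr mechanism: the function names `extract_received` and `build_solr_doc` are hypothetical, the `id` and `text` field names are assumptions (only `received` comes from the schema above), and the pattern assumes the timestamp always looks like "04 Jan 2002 17:31:40".

```python
import re
from typing import Optional

# Matches e.g. "Received : 04 Jan 2002 17:31:40" and captures the timestamp.
# Assumption: day, three-letter month, four-digit year, then HH:MM:SS.
RECEIVED_RE = re.compile(
    r"Received\s*:\s*(\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2})"
)


def extract_received(text: str) -> Optional[str]:
    """Return the first 'Received' timestamp found in text, or None."""
    match = RECEIVED_RE.search(text)
    return match.group(1) if match else None


def build_solr_doc(doc_id: str, text: str) -> dict:
    """Build a dict of field values for one document.

    The 'id' and 'text' field names are assumed; 'received' is the
    custom field defined in schema.xml.
    """
    doc = {"id": doc_id, "text": text}
    received = extract_received(text)
    if received is not None:
        # Only set the field when the marker is actually present.
        doc["received"] = received
    return doc
```

Each resulting dict could then be handed to whatever client posts documents to the index; documents without a "Received" marker simply omit the field.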

Can anyone give me some guidance? Apologies if this has already been
answered elsewhere.

Thank you,
Jan

--
View this message in context: http://lucene.472066.n3.nabble.com/Populating-a-custom-Solr-field-with-text-extracted-from-document-tp3514857p3514857.html
Sent from the Lucene - General mailing list archive at Nabble.com.
