lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-284) Parsing Rich Document Types
Date Sat, 27 Jun 2009 16:58:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724871#action_12724871 ]

Yonik Seeley commented on SOLR-284:
-----------------------------------

Apologies for not reviewing this sooner after it was committed, but this is the last and best
chance to improve the interface before 1.4 is released (and this is very important new functionality).

Since the "ext." prefix seems unnecessary and removing it is already a name change, we might as well
revisit the names themselves.  Here are my first thoughts:
{code}
//////// generic type stuff that could be reused by other update handlers
boost.myfield=2.3
literal.myfield=Hello
map.origfield=newfield
uprefix=attr_ 
  // map any unknown fields using a standard prefix... good for
  // dynamic field mapping.

//////// more solr cell specific
capture.target_field=div
  // does capture + field-map in single step... avoids name clashes
xpath=xpath_expr
  // future: could do xpath.targetfield=xpath_expr
extract_only=true  // periods aren't word separators, but scoping operators
 // in the future, this could be replaced with a generic update operation
 // to return the document(s) instead of indexing them.
resource.name=test.pdf

New idea:
  nicenames=true // Last-Modified -> last_modified


REMOVED:
ext.ignore.und.fl 
  // throwing an exception when a field-type doesn't exist is generic
  // and not needed.  we should never silently ignore.
ext.idx.attr
  // do we ever want this to be false?  we can ignore all attributes
  // with field mappings if we want to
ext.metadata.prefix
  // seems like we only want to map unknown fields, not all fields
ext.def.fl 
  // we can use a standard field name for indexing main content
  // and use map to move it if desired. "content"? 
{code}
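
To make the proposed semantics concrete, here is a minimal sketch of how the generic parameters above (literal.*, map.*, uprefix, nicenames) might be applied to a set of extracted fields. This is not Solr code: the function names, the order in which the rules are applied, and the exact normalization done by nicenames are all assumptions made for illustration only.

```python
import re


def nice_name(name):
    # Assumed normalization for nicenames=true: lowercase the name and
    # replace runs of non-alphanumeric characters with "_",
    # e.g. Last-Modified -> last_modified.
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def apply_params(extracted, params, schema_fields):
    """Hypothetical interpretation of the proposed parameters.

    extracted:     dict of field name -> value produced by the extractor
    params:        dict of request parameters, e.g. {"map.origfield": "newfield"}
    schema_fields: set of field names the schema knows about
    """
    nicenames = params.get("nicenames") == "true"
    uprefix = params.get("uprefix")
    doc = {}
    for name, value in extracted.items():
        if nicenames:
            name = nice_name(name)
        # map.<orig>=<new> renames a field explicitly.
        name = params.get("map." + name, name)
        # uprefix is applied only to fields the schema does not know.
        # Without uprefix the name is left unchanged; this sketch does
        # not model schema validation errors.
        if name not in schema_fields and uprefix is not None:
            name = uprefix + name
        doc[name] = value
    # literal.<field>=<value> adds a fixed value to the document.
    for key, value in params.items():
        if key.startswith("literal."):
            doc[key[len("literal."):]] = value
    return doc


extracted = {"Last-Modified": "2009-06-27", "title": "Solr Cell", "origfield": "x"}
params = {
    "nicenames": "true",
    "uprefix": "attr_",
    "map.origfield": "newfield",
    "literal.myfield": "Hello",
}
schema_fields = {"title", "newfield", "myfield"}
doc = apply_params(extracted, params, schema_fields)
```

With these inputs, Last-Modified is unknown to the schema and ends up as attr_last_modified, origfield is moved to newfield, title is kept as-is, and myfield is set to the literal value. Whether nicenames should run before or after map.* is an open design question that the sketch simply decides one way.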

Do people view this as an improvement?

> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, PowerPoint, or Excel document into Solr.
> There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

