lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (SOLR-284) Parsing Rich Document Types
Date Sat, 27 Jun 2009 16:58:47 GMT


Yonik Seeley commented on SOLR-284:

Apologies for not reviewing this sooner after it was committed - but this is the last/best
chance to improve the interface before 1.4 is released (and this is very important new functionality).

Since the "ext." prefix seems unnecessary, and removing it is already a name change, we might
as well revisit the names themselves.  Here are my first thoughts on it:
//////// generic type stuff that could be reused by other update handlers
  // map any unknown fields using a standard prefix... good for
  // dynamic field mapping.
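
A minimal sketch of the unknown-field prefix idea, in Python for brevity rather than as Solr code. The function name, the "attr_" prefix, and the set-based schema lookup are all hypothetical, chosen only to illustrate the mapping; they are not part of the patch.

```python
def map_unknown_fields(doc, schema_fields, prefix):
    """Return a copy of doc where fields missing from the schema get a prefix.

    Hypothetical helper: a prefixed name such as "attr_foo" could then be
    caught by a dynamic field pattern like "attr_*" in the schema.
    """
    mapped = {}
    for name, value in doc.items():
        # Known fields keep their name; unknown ones are prefixed.
        target = name if name in schema_fields else prefix + name
        mapped[target] = value
    return mapped


# Illustrative data only: "title" is in the schema, "Last-Modified" is not.
doc = {"title": "Report", "Last-Modified": "2009-06-27"}
result = map_unknown_fields(doc, {"title"}, "attr_")
```

Only fields absent from the schema are renamed, so explicitly declared fields are never touched by the prefix.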

//////// more solr cell specific
  // does capture + field-map in single step... avoids name clashes
  // future: could do xpath.targetfield=xpath_expr
extract_only=true  // periods aren't word separators, but scoping operators
 // in the future, this could be replaced with a generic update operation
 // to return the document(s) instead of indexing them.

New idea:
  nicenames=true // Last-Modified -> last_modified
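
A rough sketch of what nicenames could do, again in Python and not taken from the patch. The only behavior given above is Last-Modified -> last_modified; the rule used here (lowercase, then replace each run of non-alphanumeric characters with one underscore) is an assumption, and nice_name is a hypothetical name.

```python
import re


def nice_name(name):
    """Normalize a metadata name into a field-friendly form.

    Assumed rule: lowercase the name, then collapse every run of
    characters outside a-z and 0-9 into a single underscore.
    """
    return re.sub(r"[^a-z0-9]+", "_", name.lower())


normalized = nice_name("Last-Modified")
```

Applying this before any field mapping would let the mappings refer to a single predictable spelling of each metadata name.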

  // throwing an exception when a field-type doesn't exist is generic
  // and not needed.  we should never silently ignore.
  // do we ever want this to be false?  we can ignore all attributes
  // with field mappings if we want to
  // seems like we only want to map unknown fields, not all fields
  // we can use a standard field name for indexing main content
  // and use map to move it if desired. "content"? 

Do people view this as an improvement?

> Parsing Rich Document Types
> ---------------------------
>                 Key: SOLR-284
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>         Attachments:, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch,
rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch,
SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf,,,,, un-hardcode-id.diff
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports
streaming a PDF, Word, PowerPoint, or Excel document into Solr.
> There is a wiki page with information here:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
