lucene-solr-dev mailing list archives

From "Ryan McKinley (JIRA)" <>
Subject [jira] Commented: (SOLR-477) AnalysisRequestHandler
Date Tue, 12 Feb 2008 05:22:08 GMT


Ryan McKinley commented on SOLR-477:

> I admit I don't fully understand the interplay between the other writers (JSON, etc.) so help
> would be appreciated there.

Essentially, any type supported by TextResponseWriter is automatically supported by the standard
writers.  See writeVal() at line 109 in:

> As for a SearchComponent piece, I'd like to hear more.  Does the SearchComponent piece handle
> ContentStreams?  That is, could I just send my <add>...</add> to it and it would
> spit out the tokens?  On the query side of things, I think it would be useful to see how the
> query is analyzed, so that makes sense in a SearchComponent.  Perhaps we can find common code?

No ContentStreams in the version I'm working with.  I am analyzing stored fields so the client
can link directly to a valid 'filter'.  To see it in action, check:

Note how the subject line gets split into linkable tokens.  In particular, the stored content "Mass."
actually links to "/subject:Massachusetts/"

I've also found this really useful for debugging what tokens exist for given fields -- of
course it only works for stored fields.
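
To make the "linkable tokens" idea concrete, here is a minimal sketch in plain Python. It is not
the SearchComponent code and uses no Solr API. The function name, the whitespace splitting and the
one-entry synonym table are all assumptions for illustration; only the "/subject:Massachusetts/"
link form comes from the example above.

```python
import re

# Hypothetical stand-in for whatever the field's analyzer does at index
# time (for example a synonym filter). Not taken from any real schema.
SYNONYMS = {"Mass.": "Massachusetts"}


def linkable_tokens(field, stored_value):
    """Pair each surface token of a stored value with a filter link.

    The link uses the analyzed term, so the text shown to the user can
    differ from the term that the filter actually matches.
    """
    links = []
    for match in re.finditer(r"\S+", stored_value):
        surface = match.group()
        # Use the synonym if one exists, else strip trailing punctuation.
        term = SYNONYMS.get(surface, surface.strip(".,;:"))
        links.append((surface, f"/{field}:{term}/"))
    return links
```

Calling linkable_tokens("subject", "Boston, Mass.") pairs "Mass." with "/subject:Massachusetts/",
which is the behaviour described above: the displayed text stays as stored while the link points
at a term that exists in the index.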

After you finish the handler version, I'll see what can be shared.

> AnalysisRequestHandler
> ----------------------
>                 Key: SOLR-477
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: SOLR-477.patch
> Being able to programmatically access tokenization information can be quite useful not
> only in Solr, but in other NLP applications where token vectors are necessary.
> The patch to follow creates an AnalysisRequestHandler which processes a document through
> the analysis process and returns a response filled with tokens, their offsets, position inc.,
> type and value.
> Patch also adds some character array processing to Xml and adds Token handling to XMLWriter.
> I only implemented Xml output, as I don't know JSON or the other types.  If someone else
> is so motivated, they can add those.
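
For readers unfamiliar with the token attributes named in the description (offsets, position
increment, type and value), the following is a rough sketch in plain Python. It does not use the
Lucene or Solr API and is not the patch's implementation. The Token class, the stopword set and
the type labels are assumptions chosen only to show what each attribute means.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    value: str               # the analyzed term text
    start: int               # start offset in the original text
    end: int                 # end offset (exclusive) in the original text
    position_increment: int  # positions advanced since the previous token
    type: str                # coarse token type label


# Hypothetical stopword list; a real analyzer chain is configured per field.
STOPWORDS = {"the", "a", "of"}


def analyze(text):
    """Lowercase, split on word characters, and drop stopwords.

    A removed stopword leaves a gap, which shows up as a position
    increment greater than 1 on the next emitted token.
    """
    tokens = []
    skipped = 0
    for match in re.finditer(r"\w+", text):
        term = match.group().lower()
        if term in STOPWORDS:
            skipped += 1
            continue
        token_type = "<NUM>" if term.isdigit() else "<ALPHANUM>"
        tokens.append(
            Token(term, match.start(), match.end(), 1 + skipped, token_type)
        )
        skipped = 0
    return tokens
```

For the input "The state of Maine" this yields two tokens, "state" at offsets 4-9 and "maine" at
offsets 13-18, each with a position increment of 2 because a stopword was dropped before it.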

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
