Thanks, I will try that.
There is also a different way to do this entirely - there is a path attribute you can send as metadata to Solr. Just include the entire path, and put it into a different field that you declare in your schema. See "path attribute" in the end-user documentation for the JCIFS connector.
On Wed, May 1, 2013 at 8:52 AM, Karl Wright <email@example.com> wrote:
The only way to change the encoding of the IRI is to modify the JCIFS connector code. But please bear in mind that unless you can show your modifications will work across a wide variety of browsers, we are unlikely to accept these changes back into the code base.
IE 6 is extremely old and I believe we developed for IE 7 at a minimum (there were two different versions with different functionality we had to support there), and made further changes for IE 8 when it came out. I have no idea what IE 9 or IE 10 do.
The alternative is, since the encoding IS deterministic and reversible, you could readily write a Tika plugin that would modify at least the URL field in the manner you desire. But you could not modify the ID field since ManifoldCF uses this to delete documents that have disappeared.
Karl
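Because the escaping is reversible, a readable path can be recovered from the indexed URL for display purposes. Here is a minimal Java sketch, assuming the escapes are percent-encoded UTF-8 (which is consistent with the "%D7%91" sequences reported in this thread). The class name, method name, and sample path are hypothetical, and this is not ManifoldCF or Tika code. It needs Java 10 or later for the `URLDecoder.decode(String, Charset)` overload.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class IriDecodeSketch {
    // Hypothetical helper: turns percent-encoded UTF-8 escapes back into readable text.
    static String decodeForDisplay(String iri) {
        // URLDecoder follows form-encoding rules and would turn '+' into a space,
        // so escape any literal '+' before decoding to keep it intact.
        String plusProtected = iri.replace("+", "%2B");
        return URLDecoder.decode(plusProtected, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample path; the escaped segment is a Hebrew directory name.
        String indexed = "file://///server.domain/share/%D7%91%D7%93%D7%99%D7%A7%D7%94/a+b.txt";
        System.out.println(decodeForDisplay(indexed));
    }
}
```

Apply this only to a display copy of the URL. The ID field has to stay exactly as ManifoldCF generated it, for the reason given above.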
On Wed, May 1, 2013 at 8:45 AM, Yossi Nachum <firstname.lastname@example.org> wrote:
The IRI is not working in my IE. I am using an old version of IE, V6 SP3.
But what I really want is to display the correct name of the path with Hebrew characters.
If I understand you right, then I need to change the representation of the IRI. How can I do that?
On May 1, 2013 3:14 PM, "Karl Wright" <email@example.com> wrote:
Right, that is exactly what I would expect.
ManifoldCF uses a URL (which is constructed by the connector) as the primary key for every document as indexed in the search engine. The URL has two purposes: first, it is supposed to be unique, and second, it is supposed to allow someone who browses to that result to locate the document. In the case of JCIFS, the environment is presumed to be the local Active Directory domain(s), and the "URL" generated is really a file IRI, usually of the form "file://///server.domain/path/filename". You thus should be able to paste the "URL" of the document from Solr into a browser on a machine in the domain, and see the document load.
As I said before, however, there are already certain problems with this because each version of IE differs somewhat in how it deals with non-ASCII characters. IRI legal character rules are somewhat different from URL rules, but IRIs are nevertheless still escaped in various ways. There are also multiple equivalent ways of representing the same file path with different IRIs.
It is not typical that the ID and URL fields of a document are presented to the user in any meaningful way, so your question is academic in most settings. If you have a problem with the IRIs not actually working in a browser, that's of more immediate interest. Please let us know if that's the case.
Thanks,
Karl
On Wed, May 1, 2013 at 8:04 AM, Yossi Nachum <firstname.lastname@example.org> wrote:
Thanks for your response.
I am seeing these characters in Solr when I search for these files.
I am using the Solr example site, and these characters show up in the ID field and URL field.
BTW, I am running Solr and MCF on a Linux server.
On May 1, 2013 1:11 PM, "Karl Wright" <email@example.com> wrote:
Where are you seeing these characters? Are you talking about the file IRIs that the JCIFS connector generates? Those IRIs are supposed to be constructed so that your browser would find them if you paste them into the browser URL window. Unfortunately, there is no good standard, and people follow IE's behavior, and IE has changed multiple times in how it deals with non-Latin-1 characters.
Please provide a bit more information so that we can provide a better answer.
Karl
On Wed, May 1, 2013 at 3:11 AM, Yossi Nachum <firstname.lastname@example.org> wrote:
I installed a search server with Solr and ManifoldCF.
I want to index my NetApp files over CIFS, and I have a problem with Hebrew files and directories.
When I search for these files in Solr, I see "%D7%91%D7%..." instead of the directory path that contains Hebrew characters.
I tried to run the Java process with "-Djcifs.encoding=cp1255", but it didn't help.
Can anyone help and tell me how I can index directories/files in Hebrew?
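For context, "%D7%91" is what the Hebrew letter bet (U+05D1) looks like when its UTF-8 bytes (0xD7 0x91) are percent-escaped. The Java sketch below reproduces that escaping for non-ASCII bytes only. It is an illustration with hypothetical names, not the JCIFS connector's actual code, which may escape additional characters.

```java
import java.nio.charset.StandardCharsets;

class PercentEscapeSketch {
    private static final String HEX = "0123456789ABCDEF";

    // Hypothetical helper: escapes every non-ASCII UTF-8 byte as %XX and
    // copies ASCII bytes through unchanged.
    static String percentEncodeUtf8(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+05D1 is the Hebrew letter bet.
        System.out.println(percentEncodeUtf8("\u05D1")); // prints %D7%91
    }
}
```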