manifoldcf-user mailing list archives

From Shigeki Kobayashi <shigeki.kobayas...@g.softbank.co.jp>
Subject Re: Rules of excluding specific files in Windows file server are not recognized
Date Wed, 12 Sep 2012 02:11:50 GMT
I found the reason why the MCF job does not recognize the file name to
exclude from crawling.
You need to put a slash character in front of the file name.

I obtained the log below. This time the root directory was
//xxxxx/SharePrjG2/xxxxx/sug/, and I placed a file named "phs.txt" in it.
In the job settings, I entered "phs.txt" to exclude the file from crawling,
so the crawling rule became the following:

  1. Exclude file(s) matching phs.txt

DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Matching
startpoint 'smb://xxxxx/SharePrjG2/xxxxx/sug/' against actual
'smb://xxxxx/SharePrjG2/xxxxx/sug/'
DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Startpoint
found!
DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: Checking
'phs.txt' against '/phs.txt'
DEBUG 2012-09-12 09:56:35,829 (Worker thread '31') - JCIFS: No match!


The third log entry above shows that the rule "phs.txt" is checked against
"/phs.txt" and does not match.
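
To make the mismatch concrete, here is a minimal Python sketch of the
comparison. It is not the connector's actual code: the helper names
relative_path and rule_matches are hypothetical, Python's fnmatch is only a
stand-in for the connector's wildcard matching, and the leading slash on the
relative path is an assumption inferred from the '/phs.txt' in the log.

```python
from fnmatch import fnmatchcase


def relative_path(startpoint: str, file_url: str) -> str:
    """Return the file's path below the startpoint, with a leading slash.

    Assumption: this mirrors the '/phs.txt' form seen in the debug log.
    """
    if not file_url.startswith(startpoint):
        raise ValueError("file is not under the startpoint")
    return "/" + file_url[len(startpoint):].lstrip("/")


def rule_matches(pattern: str, path: str) -> bool:
    """Case-sensitive wildcard match of a rule pattern against a path.

    fnmatch is a stand-in here, not the connector's real matcher.
    """
    return fnmatchcase(path, pattern)


startpoint = "smb://xxxxx/SharePrjG2/xxxxx/sug/"
path = relative_path(startpoint, startpoint + "phs.txt")

# Compare the rule as originally entered with two alternatives.
for pattern in ("phs.txt", "/phs.txt", "*phs.txt"):
    print(pattern, rule_matches(pattern, path))
```

In this stand-in, only the bare "phs.txt" fails, because the path being
checked starts with a slash. The wildcard form is the one suggested earlier in
the thread; this sketch does not confirm how the connector itself treats it.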

Well, I think it is hard for users to discover that a leading slash is needed.
If this behavior is going to be kept, I think it would be nice to
describe this rule in the user documentation.

Thanks for your help.


Regards,


Shigeki

2012/9/11 Karl Wright <daddywri@gmail.com>

> I am wondering if there might be another locale-specific toLowerCase()
> issue like we saw in Turkey...
>
> I've asked Shigeki to turn on connector debugging and send us the log.
>  That should demonstrate if the rule is not matching due to case
> reasons.
>
> Karl
>
> On Tue, Sep 11, 2012 at 7:44 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
> > Hi Shigeki
> >
> > Can you try entering "*text.txt" in the text box?
> >
> > Ahmet
> > --- On Tue, 9/11/12, Shigeki Kobayashi <
> shigeki.kobayashi3@g.softbank.co.jp> wrote:
> >
> > From: Shigeki Kobayashi <shigeki.kobayashi3@g.softbank.co.jp>
> > Subject: Rules of excluding specific files in Windows file server are
> not recognized
> > To: user@manifoldcf.apache.org
> > Date: Tuesday, September 11, 2012, 1:46 PM
> >
> > Hi guys.
> > I need some help in excluding specific files from crawling.
> > I am trying to crawl Windows file server using Windows shares connector
> to index to Solr.
> >
> > There are some files I do not want to index so I set paths to exclude
> them from crawling, but the job crawls them.
> > For example, I do NOT want to index "text.txt" in a directory D which is
> a root path.
> >
> >
> > In "Paths" tab: - Set D as the root path.  - To create crawling rules,
> from pulldown, chose "exclude" and "file", and enter "text.txt" in a text
> box.
> >
> > - The list of crawling rules is created as following:
> >   1. Exclude file(s) matching text.txt   2. Include indexable file(s)
> matching *  3. Include directory(s) matching *
> >
> >
> > - Save the job setting
> > As a result, the job still tries to crawl the file. I wonder why
> "text.txt" does not match in the crawling rule.
> >
> >
> > Anyone knows what I did wrong?
> > Version:  MCF 0.5  Solr 3.5  MySql 5.5
> >
> > Regards,
> > Shigeki
> >
> >
>



-- 
*~~~~~~~~~~~~~~~~~~~~**~~~~*
 SoftBank Mobile Corp.
 Information Systems Division
 System Services Business Unit
 Service Planning Department

 Shigeki Kobayashi
 shigeki.kobayashi3@g.softbank.co.jp
*~~~~~~~~~~~~~~~~~~~~**~~~~*
