manifoldcf-user mailing list archives

From "Ian Zapczynski" <>
Subject Need examples of expressions used to specify multiple folders to index
Date Thu, 19 Mar 2015 19:07:20 GMT
Hello all. I am using ManifoldCF to index a Windows share containing well over 160,000 files
(.xls, .pdf, .doc). I keep getting memory errors when I try to index the whole folder at
once, and I have not been able to resolve this by throwing memory and CPU at Tomcat and the VM,
so I thought I'd try a different approach.
What I'd like to do now is break what was a single job into multiple jobs. Each job should
index all indexable files under a parent folder: one job for folders whose names
begin with the letters A-G, including all subfolders and files within them, another job for H-M,
also with all subfolders and files, and so on. My problem is that I can't figure
out what expression to use to get it to index what I want.
In the Job settings under Paths, I have specified the parent folder, and within there I've tried:
1.  Include file(s) or directory(s) matching *  (this works, but indexes every file in every
folder within the parent, eventually causing unresolvable GC memory overhead errors)
2.  Include file(s) or directory(s) matching ^(?i)[A-G]*  (this does not work; the job supposedly
indexes one file and then quits)
3.  Include file(s) or directory(s) matching A*  (this does not work; the job supposedly indexes
one file and then quits, even though there are many folders directly under the parent that begin with A)
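For comparison, here is a minimal Python sketch of how expressions like the ones above behave under generic regular-expression and shell-wildcard semantics. It does not reproduce ManifoldCF's own path matching, which may well differ. The folder names and helper functions are hypothetical, and case-insensitivity is passed as a flag rather than as an inline (?i).

```python
import fnmatch
import re

# Hypothetical top-level folder names, for illustration only.
folders = ["Accounts", "alpha", "Budget", "Henry", "Zebra"]


def regex_prefix_matches(pattern, names):
    """Names where the regex matches at the start of the name (may match zero characters)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.match(n)]


def regex_full_matches(pattern, names):
    """Names where the regex matches the entire name."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.fullmatch(n)]


def glob_matches(pattern, names):
    """Names matching a shell-style wildcard, compared case-sensitively."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


# As a regex, [A-G]* means "zero or more letters A-G", so it also
# matches the empty prefix of every name and filters nothing out.
print(regex_prefix_matches(r"[A-G]*", folders))

# Requiring one letter A-G followed by anything selects the intended folders.
print(regex_full_matches(r"[A-G].*", folders))

# As shell-style wildcards, * means "any run of characters".
print(glob_matches("A*", folders))
print(glob_matches("[A-Ga-g]*", folders))
```

Under these generic semantics, a pattern such as A* selects only the top-level folder names themselves. Whether files and subfolders beneath a matched folder are also included depends on how the connector applies the rule at each level, which this sketch does not model.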
Can anyone help confirm what type of expression I should use in the paths to accomplish what
I want? 
Alternatively, if you think I should be able to index 160,000+ files in one job without getting
GC memory overhead errors, I'm open to hearing your suggestions on resolving those. All I know
to do is increase the maximum memory in Tomcat as well as on the OS, and that didn't help
at all.
Thanks much!
