tomcat-users mailing list archives

From Paul Taylor <>
Subject Re: Where can I store data files in a tomcat war
Date Wed, 02 Jul 2014 20:28:58 GMT
On 02/07/2014 16:34, Christopher Schultz wrote:
> Paul,
> On 7/2/14, 6:49 AM, Paul Taylor wrote:
>> [L]et me explain it a bit further. I'm trying to deploy an
>> application that serves results from a lucene index in response to
>> user requests. Deploying it manually to my own server is fine,
>> first of all I just copy the index files to a location on the disk,
>> then I deploy my application, and within its web.xml I have a
>> servlet parameter that defines where the indexes are, so within the
>> servlet's init() method I initialize the indexes. The problem is that
>> I'm trying to deploy my application to Amazon Web Services using
>> autoscaled Elastic Beanstalk, this means that the application has
>> to be able to be initialized and created based on what is in the war
>> because Elastic Beanstalk will automatically start new servers as
>> required due to load and terminate those instances when not
>> required.
>> I do seem to have a solution, but I detail it here because it
>> doesn't seem quite right and might be useful to others.
>> Short Answer: Originally I first tried putting the index files
>> (unzipped) into the src/main/resources folder of my maven project,
>> and referred to the WEB-INF/classes/index_dir location in my
>> web.xml and Tomcat didn't start. It didn't seem right for non-Java
>> classes to be in that folder anyway so I discarded that idea.
>> However, I've just tried it again locally and it worked, so if it
>> works on EB that is the solution I'm going to use for now unless
>> there are any better suggestions. It does mean that the resulting
>> .war file is rather large, far too large to upload from my local
>> machine, but as I build the code and indexes from another AWS EC2
>> instance I can just dump it into S3 and deploy from S3 to EB. You
>> don't seem able to redeploy from S3, but I've realised that when I
>> need to redeploy I would do it to a new EB configuration and then
>> swap the DNS from EB1 to EB2 to minimize downtime, so that is not
>> really a problem.
>> A supplementary question: Is there a system property I can use to
>> refer to the WEB-INF as a relative directory rather than full path?
> Don't use paths. Use the ClassLoader if Lucene can really load a file
> in that way.
> The problem is that you can't rely on EB to expand your WAR file on
> the disk. If EB suddenly changes its deployment model to stop
> expanding your WAR file, then you are hosed and your application won't
> work at all.
Lucene works on files and does low-level I/O memory mapping, so I do need
to use paths. But it doesn't matter anyway because, as described in my
last post, EB doesn't allow me to have a war file big enough to hold the
index files.
> Instead, you need to work around the problem. Let me restate the
> problem so the solution makes more sense:
> 1. Amazon Elastic Beanstalk requires a WAR file to deploy to a cluster
> 2. Lucene can't read an index out of a WAR file
> The solution is that the web application, packaged in a WAR file,
> needs to unpack the Lucene indexes onto the disk when it starts up.
> You can do this with a ServletContextListener.
That is what I do within the init() method of my servlet, but EB doesn't
wait for the init() method to finish before declaring the application
ready. Do you think it would wait for code using a ServletContextListener,
or fail in the same way it does for init()?
> Since you expand the files, you decide where to put them. The servlet
> spec guarantees a temporary directory available using
> application.getAttribute("javax.servlet.context.tempdir"). This
> returns a java.io.File object pointing to the temporary directory for
> the application. Dump your files in there (a subdirectory would be a
> good idea) and then point Lucene at that place on the disk.
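
The unpack-on-startup idea above can be sketched roughly as follows. This is a minimal sketch, not a tested deployment recipe: it assumes the index is shipped as a zip archive, and the class name IndexUnpacker and the method unpack are hypothetical. In a webapp, a ServletContextListener could call it from contextInitialized(), passing the directory obtained from the "javax.servlet.context.tempdir" attribute; the servlet wiring is left out so the example depends only on the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: streams a zipped index into a directory on disk.
public class IndexUnpacker {

    /**
     * Extracts the zip stream into targetDir one entry at a time, so no
     * intermediate copy of the archive is written to disk.
     */
    public static void unpack(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the zip stream stays open.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Lucene could then be pointed at a subdirectory of the temporary directory. Whether Elastic Beanstalk waits for such startup code to finish is a separate question that this sketch does not address.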
>> Long Answer: Since originally posting this question I have looked
>> at a few other possible solutions but none were satisfactory.
>> 1. Deploy war without indexes but in my servlet init() method write
>> code to grab the compressed indexes from S3 and unzip to location
>> specified in web.xml.
> That would work, too, but you'll have to "pay" for download time for
> each member of the cluster. If you pack the indexes in the WAR file,
> they are already available when the webapp initializes.
See my later posts: it doesn't work because EB doesn't wait for init()
to finish, and I can't pack the indexes into the WAR because that
breaks Amazon's maximum WAR size of 1/2 GB.

>> 2. Deploy war without indexes and use AWS .ebextensions files to
>> grab and unzip the indexes. This might work but I really dislike
>> having to write custom deployment code/configurations as a general
>> rule. And because the size of the disk provided by the AWS
>> instance is limited, unzipping is not so simple. For example,
>> instead of creating a tar.gz file, I had to gzip the files first
>> and then tar them, so that when untarred I could decompress one
>> file at a time, which required less temporary space. This would
>> make the EB code more complex.
> Neither tar nor gzip take very much of anything: they are both
> block-oriented. What procedure were you using to decompress the
> tarballs? Decompressing the entire tarball and then tearing it apart
> is a mistake: you should chain the processes together so you read from
> the tarball and write individual, uncompressed files to the disk.
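
Chaining decompression and extraction, as described above, can also be done in Java without an intermediate .tar file. The following is a minimal sketch using only the JDK; the class name StreamingUntar is hypothetical, and the hand-written tar reader is a simplification that assumes plain entries with names under 100 characters and octal size fields (no GNU long-name or large-file extensions). A maintained tar library would be the safer choice in production.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: gunzips and untars in one pass, writing files directly.
public class StreamingUntar {

    private static final int BLOCK = 512; // tar works in 512-byte blocks

    public static void extract(InputStream tarGz, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        byte[] header = new byte[BLOCK];
        try (InputStream in = new GZIPInputStream(tarGz)) {
            // An all-zero header block marks the end of the archive.
            while (readBlock(in, header) && header[0] != 0) {
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12).trim(), 8);
                char type = (char) header[156];
                Path out = root.resolve(name).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                if (type == '5') {                        // directory entry
                    Files.createDirectories(out);
                } else if (type == '0' || type == '\0') { // regular file
                    Files.createDirectories(out.getParent());
                    try (OutputStream os = Files.newOutputStream(out)) {
                        copy(in, os, size);
                    }
                } else {                                  // unsupported type: skip its data
                    copy(in, null, size);
                }
                // Entry data is padded up to the next 512-byte boundary.
                copy(in, null, (BLOCK - size % BLOCK) % BLOCK);
            }
        }
    }

    // Reads one full block; returns false on a clean end of stream.
    private static boolean readBlock(InputStream in, byte[] block) throws IOException {
        int off = 0;
        while (off < block.length) {
            int n = in.read(block, off, block.length - off);
            if (n < 0) {
                if (off == 0) return false;
                throw new EOFException("Truncated tar header");
            }
            off += n;
        }
        return true;
    }

    // Extracts a NUL-terminated ASCII field from the header.
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    // Copies exactly count bytes to out, or discards them when out is null.
    private static void copy(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buf = new byte[8192];
        while (count > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, count));
            if (n < 0) throw new EOFException("Truncated tar entry");
            if (out != null) out.write(buf, 0, n);
            count -= n;
        }
    }
}
```

Because each file is written as soon as its bytes are decompressed, the only extra disk space needed is for the extracted files themselves.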
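With the Java solution I was using

import java.io.File;
import org.rauschig.jarchivelib.Archiver;
import org.rauschig.jarchivelib.ArchiverFactory;

// indexDirParent (a path String) and largeFile (a File) are defined elsewhere
File indexDirFile = new File(indexDirParent).getAbsoluteFile();
// createArchiver(File) infers the archive format from the file name
Archiver archiver = ArchiverFactory.createArchiver(largeFile);
archiver.extract(largeFile, indexDirFile);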

which is a library around Apache Commons Compress, and that did create a temporary tar file.

But maybe if I use Linux commands directly I won't hit the problem. I think using .ebextensions
is now my best chance of getting something working.


>> 3. Create a custom Amazon Image that can be used by EB, this seems
>> theoretically possible but quickly got very messy and seemed very
>> much a hack.
> It's a huge amount of work and the point is to give a WAR to AEB and
> "just do it".

>> 4. Use Docker, AWS now supports the docker framework. This might be
>> a good solution, but having spent far too much time on
>> understanding AWS I wasn't keen to spend more time on yet another
>> framework to solve one problem.
> I don't know anything about docker but it seems to me the problem is
> the availability of the index and no other product/framework is going
> to help you with that.
I thought it might allow you to define indexes as part of the Docker 
image, but I don't want to open this can of worms.
> There is another option: stick the master index on an EBS store and
> mount the EBS store on the target machine. IIRC, EBS volumes can't be
> shared (which is a big pain IMO) so you can't mount that disk on all
> of your Lucene servers... you might have to mount the EBS store, copy
> the indexes, and then unmount the store. You'd only have to do this
> once each time you wanted to launch an additional instance or update
> the index.
But the whole point of autoscaled EB deployments is that Amazon
automatically starts additional servers if load gets heavy and
terminates them if underused. I don't have to consciously make those
decisions or be around, which is very useful if (as I suspect) I'm going
to have busy and quiet times during each 24-hour period. Maybe I could
have four EBS stores loaded and ready (the default maximum number of
servers is 4), and then when a server starts have some code in my init()
method to mount the next available (not mounted) EBS volume and use it.
But I think this does mean paying for four EBS stores all the time, and
I don't know how to code for this because usually, AFAIK, the volumes
have to be assigned to an EC2 instance before the instance can mount them.

> Or, you could look into Solr which I believe understands clustering.
> Then, you load the index onto the cluster and do whatever you want
> with it.
I don't think Solr clustering would work with EB autoscaling; instead I
would have to work directly with EC2 and forgo all the advantages of EB
autoscaling. Also, I already have my code written and working, and I have
no desire (or time) to convert to Solr (or Elasticsearch for that matter).

