tomcat-users mailing list archives

From Christopher Schultz
Subject Re: Where can I store data files in a tomcat war
Date Wed, 02 Jul 2014 15:34:49 GMT


On 7/2/14, 6:49 AM, Paul Taylor wrote:
> [L]et me explain it a bit further. I'm trying to deploy an 
> application that serves results from a lucene index in response to
> user requests. Deploying it manually to my own server is fine,
> first of all I just copy the index files to a location on the disk,
> then I deploy my application, and within its web.xml I have a
> servlet parameter that defines where the indexes are, so within the
> servlet's init() method I initialize the indexes. The problem is that
> I'm trying to deploy my application to Amazon Web Services using
> autoscaled Elastic Beanstalk, this means that the application has
> to be able to be initialized and created based on what is in the war
> because Elastic Beanstalk will automatically start new servers as
> required due to load and terminate those instances when not
> required.
> I do seem to have a solution, but I detail it here because it
> doesn't seem quite right and might be useful to others.
> Short Answer: Originally I tried putting the index files
> (unzipped) into the src/main/resources folder of my Maven project,
> and referred to the WEB-INF/classes/index_dir location in my
> web.xml, and Tomcat didn't start. It didn't seem right for files
> that are not Java classes to be in that folder anyway, so I
> discarded that idea. However, I've just tried it again locally and
> it worked, so if it works on EB that is the solution I'm going to
> use for now, unless there are any better suggestions. It does mean
> that the resulting .war file is rather large, far too large to
> upload from my local machine, but as I build the code and indexes
> from another AWS EC2 instance I can just dump it into S3 and deploy
> from S3 to EB. You don't seem able to redeploy from S3, but I've
> realised that when I need to redeploy I would do it to a new EB
> configuration and then swap the DNS from EB1 to EB2 to minimize
> downtime, so that is not really a problem.
> A supplementary question: Is there a system property I can use to
> refer to WEB-INF as a relative directory rather than a full path?

Don't use paths. Use the ClassLoader if Lucene can really load a file
in that way.
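
As a rough sketch of what "use the ClassLoader" looks like in practice:
the class name ResourceReader and any resource name passed to it are made
up for illustration, and this only helps when the consumer accepts a
stream or byte array rather than a directory on disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads a resource via a class loader, not a file path. */
final class ResourceReader {

    private ResourceReader() {}

    /** Returns the full contents of the named classpath resource. */
    static byte[] read(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream signals "not found" with null, not an exception
                throw new IOException("Resource not found on classpath: " + name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Inside a webapp you would typically pass
Thread.currentThread().getContextClassLoader(); resources under
WEB-INF/classes are visible to it whether or not the WAR is expanded.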

The problem is that you can't rely on EB to expand your WAR file on
the disk. If EB suddenly changes its deployment model to stop
expanding your WAR file, then you are hosed and your application won't
work at all.

Instead, you need to work around the problem. Let me restate the
problem so the solution makes more sense:

1. Amazon Elastic Beanstalk requires a WAR file to deploy to a cluster
2. Lucene can't read an index out of a WAR file

The solution is that the web application, packaged in a WAR file,
needs to unpack the Lucene indexes onto the disk when it starts up.
You can do this with a ServletContextListener.

Since you expand the files, you decide where to put them. The servlet
spec guarantees a temporary directory available using
application.getAttribute("javax.servlet.context.tempdir"). This
returns a java.io.File object pointing to the temporary directory for
the application. Dump your files in there (a subdirectory would be a
good idea) and then point Lucene at that place on the disk.
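
Something along these lines could do the unpacking step. It is a sketch
only: it assumes the index is packed as a single zip inside the WAR, and
the names IndexUnpacker, "index.zip" and "lucene-index" are invented for
the example. The listener wiring is shown as a comment so the block
compiles without the servlet API on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: unpacks a zipped index that ships inside the WAR. */
final class IndexUnpacker {

    private IndexUnpacker() {}

    /**
     * Streams each zip entry into targetDir, one entry at a time, so only a
     * small copy buffer is held in memory regardless of index size.
     */
    static void unpack(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside targetDir
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies up to the end of the current entry; does not close zip
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}

// Possible use from ServletContextListener.contextInitialized(sce)
// ("index.zip" and "lucene-index" are assumed names):
//
//   File tmp = (File) sce.getServletContext()
//                        .getAttribute("javax.servlet.context.tempdir");
//   Path indexDir = tmp.toPath().resolve("lucene-index");
//   try (InputStream in = Thread.currentThread().getContextClassLoader()
//                               .getResourceAsStream("index.zip")) {
//       IndexUnpacker.unpack(in, indexDir);
//   }
//   // then point Lucene at indexDir
```

Unpacking into a subdirectory keeps the index separate from anything
else the container puts in the temp directory.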

> Long Answer: Since originally posting this question I have looked
> at a few other possible solutions but none were satisfactory.
> 1. Deploy war without indexes but in my servlet init() method write
> code to grab the compressed indexes from S3 and unzip to a location
> specified in web.xml.

That would work, too, but you'll have to "pay" for download time for
each member of the cluster. If you pack the indexes in the WAR file,
they are already available when the webapp initializes.

> 2. Deploy war without indexes and use AWS .ebextensions files to
> grab and unzip the indexes. This might work but I really dislike
> having to write custom deployment code/configurations as a general
> rule. And because the size of the disk provided by the AWS
> instance is limited, unzipping is not so simple. For example,
> instead of creating a tar.gz file, I had to gzip the files first
> and then tar, so when untarred I could decompress one file at a
> time, which required less temporary space. This would make the EB
> code more complex.

Neither tar nor gzip needs much memory or scratch space: they are both
block-oriented. What procedure were you using to decompress the
tarballs? Decompressing the entire tarball and then tearing it apart
is a mistake: you should chain the processes together so you read from
the tarball and write individual, uncompressed files to the disk.
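
To illustrate the chaining idea in Java rather than at the shell, here
is a sketch of the gzip half only. The JDK has no tar reader, so this
assumes each file was gzipped individually; StreamingGunzip is a made-up
name. The point is that the compressed input is never fully expanded
anywhere except in the final target file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: decompresses a gzip stream straight to its target file. */
final class StreamingGunzip {

    private StreamingGunzip() {}

    /**
     * Chains the compressed source through GZIPInputStream into the target,
     * so no intermediate uncompressed copy is written. Returns bytes written.
     */
    static long gunzip(InputStream compressed, Path target) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```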

> 3. Create a custom Amazon Image that can be used by EB, this seems 
> theoretically possible but quickly got very messy and seemed very
> much a hack.

It's a huge amount of work, and the whole point of EB is to hand it a
WAR and have it "just do it".

> 4. Use Docker, AWS now supports the Docker framework. This might be
> a good solution but having spent far too much time on
> understanding AWS I wasn't keen to spend more time on yet another
> framework to solve one problem

I don't know anything about docker but it seems to me the problem is
the availability of the index and no other product/framework is going
to help you with that.

There is another option: stick the master index on an EBS store and
mount the EBS store on the target machine. IIRC, EBS volumes can't be
shared (which is a big pain IMO) so you can't mount that disk on all
of your Lucene servers... you might have to mount the EBS store, copy
the indexes, and then unmount the store. You'd only have to do this
once each time you wanted to launch an additional instance or update
the index.

Or, you could look into Solr which I believe understands clustering.
Then, you load the index onto the cluster and do whatever you want
with it.

-chris