tomcat-users mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Where can I store data files in a tomcat war
Date Wed, 02 Jul 2014 15:34:49 GMT

Paul,

On 7/2/14, 6:49 AM, Paul Taylor wrote:
> [L]et me explain it a bit further. I'm trying to deploy an 
> application that serves results from a lucene index in response to
> user requests. Deploying it manually to my own server is fine,
> first of all I just copy the index files to a location on the disk,
> then I deploy my application, and within its web.xml I have a
> servlet parameter that defines where the indexes are, so within the
> servlet's init() method I initialize the indexes. The problem is that
> I'm trying to deploy my application to Amazon Web Services using
> autoscaled Elastic Beanstalk, this means that the application has
> to be able to be initialized and created based on what is in the war
> because Elastic Beanstalk will automatically start new servers as
> required due to load and terminate those instances when not
> required.
> 
> I do seem to have a solution, but I detail it here because it
> doesn't seem quite right and might be useful to others.
> 
> Short Answer: Originally I first tried putting the index files
> (unzipped) into the src/main/resources folder of my maven project,
> and referred to the WEB-INF/classes/index_dir location in my
> web.xml and tomcat didn't start. It didn't seem right for non-Java
> classes to be in that folder anyway so I discarded that idea;
> however, I've just tried it again locally and it worked, so if it
> works on EB that is the solution I'm going to use for now unless
> there are any better suggestions. It does mean that the resulting
> .war file is rather large, far too large to upload from my local
> machine, but as I build the code and indexes from another AWS EC2
> instance I can just dump it into S3 and deploy from S3 to EB. If I
> need to redeploy, you don't seem able to redeploy from S3, but I've
> realised that when I need to redeploy I would do it to a new EB
> configuration and then swap the DNS from EB1 to EB2 to minimize
> downtime, so that is not really a problem.
> 
> A supplementary question: Is there a system property I can use to
> refer to the WEB-INF as a relative directory rather than a full path?

Don't use paths. Use the ClassLoader if Lucene can really load a file
in that way.

The problem is that you can't rely on EB to expand your WAR file on
the disk. If EB suddenly changes its deployment model to stop
expanding your WAR file, then you are hosed and your application won't
work at all.

Instead, you need to work around the problem. Let me restate the
problem so the solution makes more sense:

1. Amazon Elastic Beanstalk requires a WAR file to deploy to a cluster
2. Lucene can't read an index out of a WAR file

The solution is that the web application, packaged in a WAR file,
needs to unpack the Lucene indexes onto the disk when it starts up.
You can do this with a ServletContextListener.

Since you expand the files, you decide where to put them. The servlet
spec guarantees a temporary directory, available via
application.getAttribute("javax.servlet.context.tempdir"), where
"application" is the ServletContext. This returns a java.io.File
object pointing to the temporary directory for the application. Dump
your files in there (a subdirectory would be a good idea) and then
point Lucene at that place on the disk.
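
A minimal sketch of the unpacking step, using only the JDK. It assumes
the index is packed into the WAR as a single zip file, which is not
something the thread specifies. The class name IndexUnpacker is
hypothetical. In a real webapp, a ServletContextListener's
contextInitialized() could call it with a stream obtained from
ServletContext.getResourceAsStream() (or from the ClassLoader) and
with a subdirectory of the javax.servlet.context.tempdir File.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: expands a zipped index onto the local disk. */
public final class IndexUnpacker {
    private IndexUnpacker() {}

    /**
     * Extracts every entry of the zip stream beneath targetDir and
     * returns the number of regular files written.
     */
    public static int unpack(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        int files = 0;
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Files.copy reads up to the end of the current entry only.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```

Lucene can then be pointed at the directory that was passed as
targetDir. Because the files are streamed entry by entry, no second
copy of the archive is needed on disk.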

> Long Answer: Since originally posting this question I have looked
> at a few other possible solutions but none were satisfactory.
> 
> 1. Deploy war without indexes but in my servlet init() method write
> code to grab the compressed indexes from S3 and unzip to location
> specified in web.xml.

That would work, too, but you'll have to "pay" for download time for
each member of the cluster. If you pack the indexes in the WAR file,
they are already available when the webapp initializes.

> 2. Deploy war without indexes and use AWS .ebextensions files to
> grab and unzip the indexes. This might work but I really dislike
> having to write custom deployment code/configurations as a general
> rule. And because the size of the disk provided by the AWS
> instance is limited, unzipping is not so simple. For example,
> instead of creating a tar.gz file, I had to gzip the files first
> and then tar them so that when untarred I could decompress one file
> at a time, which required less temporary space; this would make the
> EB code more complex.

Neither tar nor gzip needs much temporary space: they are both
block-oriented. What procedure were you using to decompress the
tarballs? Decompressing the entire tarball and then tearing it apart
is a mistake: you should chain the processes together so you read from
the tarball and write individual, uncompressed files to the disk.
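
To illustrate the chaining, here is a rough JDK-only sketch that reads
a .tar.gz as one stream and writes each file straight to disk, so the
uncompressed tarball never exists as a whole. The JDK has no tar
support, so the header parsing below is a deliberately minimal
assumption: it handles plain regular files and directories with names
of up to 100 characters, and ignores checksums, the ustar prefix
field, and GNU or pax extensions. The class name StreamingUntar is
hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: gunzip and untar in a single streaming pass. */
public final class StreamingUntar {
    private static final int BLOCK = 512; // tar works in 512-byte blocks

    private StreamingUntar() {}

    /** Extracts a .tar.gz stream beneath targetDir; returns the number of files written. */
    public static int extract(InputStream tarGz, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        byte[] header = new byte[BLOCK];
        byte[] buffer = new byte[8192];
        int files = 0;
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(tarGz))) {
            while (true) {
                in.readFully(header);
                if (header[0] == 0) {
                    break; // simplification: an empty name marks the end-of-archive block
                }
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal ASCII
                byte type = header[156];
                long padding = (BLOCK - size % BLOCK) % BLOCK;
                Path out = root.resolve(name).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + name);
                }
                if (type == '5') { // directory
                    Files.createDirectories(out);
                    skipFully(in, size + padding);
                } else if (type == '0' || type == 0) { // regular file
                    Files.createDirectories(out.getParent());
                    try (OutputStream os = Files.newOutputStream(out)) {
                        long remaining = size;
                        while (remaining > 0) {
                            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                            if (n < 0) throw new EOFException("Truncated archive");
                            os.write(buffer, 0, n);
                            remaining -= n;
                        }
                    }
                    skipFully(in, padding);
                    files++;
                } else {
                    skipFully(in, size + padding); // unsupported entry type: skip its data
                }
            }
        }
        return files;
    }

    /** Reads a NUL-terminated ASCII header field. */
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.US_ASCII);
    }

    private static void skipFully(InputStream in, long count) throws IOException {
        byte[] scratch = new byte[BLOCK];
        while (count > 0) {
            int n = in.read(scratch, 0, (int) Math.min(scratch.length, count));
            if (n < 0) throw new EOFException("Truncated archive");
            count -= n;
        }
    }
}
```

Memory use is bounded by the two small buffers, and disk use by the
extracted files themselves, whatever the size of the archive.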

> 3. Create a custom Amazon Image that can be used by EB, this seems 
> theoretically possible but quickly got very messy and seemed very
> much a hack.

It's a huge amount of work, and the whole point is to give a WAR to
AEB and have it "just do it".

> 4. Use Docker, AWS now supports the docker framework. This might be
> a good solution but having spent far too much time on
> understanding AWS I wasn't keen to spend more time on yet another
> framework to solve one problem.

I don't know anything about docker but it seems to me the problem is
the availability of the index and no other product/framework is going
to help you with that.

There is another option: stick the master index on an EBS store and
mount the EBS store on the target machine. IIRC, EBS volumes can't be
shared (which is a big pain IMO) so you can't mount that disk on all
of your Lucene servers... you might have to mount the EBS store, copy
the indexes, and then unmount the store. You'd only have to do this
once each time you wanted to launch an additional instance or update
the index.

Or, you could look into Solr which I believe understands clustering.
Then, you load the index onto the cluster and do whatever you want
with it.

-chris