lucene-dev mailing list archives

From "Andrew Lundgren (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10981) Allow update to load gzip files
Date Wed, 18 Jul 2018 19:10:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lundgren updated SOLR-10981:
-----------------------------------
    Description: 
We currently import large CSV files. We store them in gzip files as they compress at around
80%.

To import them we must gunzip them and then import them. After that we no longer need the
decompressed files.

This patch allows gzipped content to be opened directly, whether from a URL or a local file.

For URLs, the content is treated as gzipped if the Content-Encoding is "gzip" or if the
file name ends in ".gz".

For files, a name ending in ".gz" is taken to mean the file is gzipped.

I have tested the patch with 4.10.4, 6.6.0, 7.0.1 and master from git.

  was:
We currently import large CSV files.  We store them in gzip files as they compress at around
80%.

To import them we must gunzip them and then import them.  After that we no longer need the
decompressed files.

This patch allows gzipped content to be opened directly, whether from a URL or a local file.

For URLs, the content is treated as gzipped if the Content-Encoding is "gzip" or if the
file name ends in ".gz".

For files, a name ending in ".gz" is taken to mean the file is gzipped.

I have tested the patch with 4.10.4, 6.6.0 and master from git.


> Allow update to load gzip files 
> --------------------------------
>
>                 Key: SOLR-10981
>                 URL: https://issues.apache.org/jira/browse/SOLR-10981
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 6.6
>            Reporter: Andrew Lundgren
>            Priority: Major
>              Labels: patch
>             Fix For: master (8.0), 7.5
>
>         Attachments: SOLR-10981.patch, SOLR-10981.patch, SOLR-10981.patch, SOLR-10981.patch
>
>
> We currently import large CSV files. We store them in gzip files as they compress at
> around 80%.
> To import them we must gunzip them and then import them. After that we no longer need
> the decompressed files.
> This patch allows gzipped content to be opened directly, whether from a URL or a local file.
> For URLs, the content is treated as gzipped if the Content-Encoding is "gzip" or if the
> file name ends in ".gz".
> For files, a name ending in ".gz" is taken to mean the file is gzipped.
> I have tested the patch with 4.10.4, 6.6.0, 7.0.1 and master from git.
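
The detection rules in the description can be sketched roughly as below. This is only an
illustration and not the code from the attached SOLR-10981.patch: the class name GzipDetect
and its three methods are hypothetical, and the exact, case-sensitive matching of "gzip" and
".gz" is an assumption taken from the wording of the description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper illustrating the rules; not taken from the patch. */
final class GzipDetect {

    /** File rule: a name ending in ".gz" is assumed to be gzipped. */
    static boolean fileLooksGzipped(String name) {
        return name != null && name.endsWith(".gz");
    }

    /** URL rule: Content-Encoding is "gzip", or the file name ends in ".gz". */
    static boolean urlLooksGzipped(String contentEncoding, String path) {
        return "gzip".equals(contentEncoding) || fileLooksGzipped(path);
    }

    /** Wraps the raw stream in a GZIPInputStream only when it was judged gzipped. */
    static InputStream maybeDecompress(InputStream raw, boolean gzipped) throws IOException {
        return gzipped ? new GZIPInputStream(raw) : raw;
    }
}
```

A caller would presumably evaluate one of the two checks first and then read from the stream
returned by maybeDecompress, so the CSV never has to be unpacked to disk. Because the file
rule relies on the name alone, a file that ends in ".gz" but is not actually gzipped would
fail when GZIPInputStream reads its header.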



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

