Preface

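Sometimes a system needs to let users download content to their local machine. That content may already sit in one place on the server or be scattered across it, and the files vary in format and size. This article works through these problems step by step.

Environment used in this article: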

Python 2.7.10
Django 1.11

zip-file

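Installation

zipfile is part of the Python standard library, so it ships with Python and needs no installation. In particular, there is no need to run pip install zipfile.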

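Packing an entire directory

The code below packs a whole directory into abcd.zip.
Storing the archive inside the directory being packed causes a problem: the archive ends up containing itself as a 0-byte entry. It is therefore better to put the archive in a different directory.

File zips.py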

import os
import sys
import zipfile

def gen_zip_with_zipfile(path_to_zip, target_filename):
    f = None
    try:
        f = zipfile.ZipFile(target_filename, 'w', zipfile.ZIP_DEFLATED)
        for root, dirs, files in os.walk(path_to_zip):
            for filename in files:
                f.write(os.path.join(root, filename))
            # add an explicit entry for directories that contain no files
            if len(files) == 0:
                zif = zipfile.ZipInfo(root + '/')
                f.writestr(zif, "")
    except (IOError, OSError, zipfile.BadZipfile) as message:
        print(message)
        sys.exit(1)
    finally:
        # f.close()  # left open on purpose; the caller closes it
        pass
    # The ZIP end record is only written by close(), so checks such as
    # zipfile.is_zipfile(target_filename) are meaningful only after the
    # caller has closed the archive.
    print("Packing to: " + os.path.abspath(target_filename))

    return f

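The packing logic is wrapped in the function above. I commented out the call that closes the file, because otherwise some later operations cannot be performed. Close the file once those operations are finished.

import os
import zipfile
from zips import gen_zip_with_zipfile

tmpPath = ".\\document"
f = gen_zip_with_zipfile(tmpPath, '..\\abcd.zip')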
# print f.namelist()
# print f.fp
f.close()

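The snippet above is a usage example. It packs the document folder in the current directory, including its subfolders, into abcd.zip and stores the archive in the parent directory.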
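As a minimal sketch of one way to avoid the "archive contains itself" problem, the variant below skips the output file while walking and stores paths relative to the packed directory. The function name zip_directory and the demo file names are assumptions made for illustration; they are not part of the original zips.py.

```python
import os
import tempfile
import zipfile


def zip_directory(source_dir, target_filename):
    """Pack source_dir into target_filename with paths relative to source_dir."""
    target_abs = os.path.abspath(target_filename)
    with zipfile.ZipFile(target_filename, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Skip the archive itself if it lives inside source_dir.
                if os.path.abspath(full_path) == target_abs:
                    continue
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
    return target_filename


# Demo on a throwaway directory (hypothetical file names).
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "sub"))
for relative in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(demo_dir, relative), "w") as handle:
        handle.write("hello")

zip_path = zip_directory(demo_dir, os.path.join(demo_dir, "abcd.zip"))
with zipfile.ZipFile(zip_path) as archive:
    packed_names = sorted(archive.namelist())
print(packed_names)
```

Unlike the function above, this sketch closes the archive itself (through the with block), so the returned path can be read back immediately.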

Adding the zip file to an HttpResponse

import os
import zipfile
from django.http import HttpResponse
from zips import gen_zip_with_zipfile

def post(self, *args, **kwargs):
    path_to = ".\\document"
    f = gen_zip_with_zipfile(path_to, '..\\abcd.zip')
    f.close()
    # set response: read the finished archive back from disk
    # response = HttpResponse(f.fp, content_type='application/zip')
    with open(f.filename, "rb") as fread:
        response = HttpResponse(fread.read(), content_type='application/zip')
    response['Content-Disposition'] = 'attachment;filename="{0}"'.format("download.zip")
    return response

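I am not sure whether the content can be read directly from f (the ZipFile) to avoid opening the file a second time. I tried f.fp, but that is a write-only handle.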

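Using a temporary file instead of a fixed file

In the steps above we always create a file on the server. That data is not worth keeping and only adds clutter. Is there a way around this? In this section we try a temporary file.

FileWrapper moved to a different location after Django 1.9, so remember to update the import.

When I first ran this, the downloaded file was empty. A likely cause is that temp.seek(0) was called only after the response had been created, so the file position was still at the end of the archive when its content was read. The version below records the size and rewinds the file before building the response.

def post(self, *args, **kwargs):
    # from django.core.servers.basehttp import FileWrapper
    from wsgiref.util import FileWrapper
    import tempfile
    temp = tempfile.TemporaryFile()
    archive = zipfile.ZipFile(temp, 'w', zipfile.ZIP_DEFLATED)
    archive.write(".\\document\\ckeditor.md")
    archive.close()
    # record the size, then rewind before the response reads the file
    size = temp.tell()
    temp.seek(0)
    wrapper = FileWrapper(temp)
    response = HttpResponse(wrapper, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=test.zip'
    response['Content-Length'] = size
    return response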
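The effect of the file position can be checked without Django. The following is a small sketch, assuming only the standard library: it writes an archive into a temporary file, then compares what read() returns before and after rewinding. The member name and its content are placeholders.

```python
import io
import tempfile
import zipfile

temp = tempfile.TemporaryFile()
archive = zipfile.ZipFile(temp, "w", zipfile.ZIP_DEFLATED)
archive.writestr("ckeditor.md", "# placeholder content")
archive.close()  # writes the central directory; temp itself stays open

size = temp.tell()         # position is at the end, so this equals the size
read_at_end = temp.read()  # nothing is left to read from this position
temp.seek(0)
payload = temp.read()      # the complete archive
temp.close()

names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(size, len(read_at_end), len(payload), names)
```

Reading at the end yields no bytes, which matches the empty download described above, while reading after seek(0) yields the whole archive.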

Using memory instead of a file

According to the definition of ZipFile in the Python documentation https://docs.python.org/3/library/zipfile.html#zipfile-objects

class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True)
Open a ZIP file, where file can be a path to a file (a string), a file-like object or a path-like object.

So we can use BytesIO or StringIO in place of a fixed file.

import os
import zipfile
import StringIO
from django.http import HttpResponse
def post(self, *args, **kwargs):    
    # Files (local path) to put in the .zip
    # FIXME: Change this (get paths from DB etc)
    filenames = [".\\document\\ckeditor.md",]
    # Folder name in ZIP archive which contains the above files
    # E.g [thearchive.zip]/somefiles/file2.txt
    # FIXME: Set this to something better
    zip_subdir = "zipfolder"
    zip_filename = "%s.zip" % zip_subdir
    # Open StringIO to grab in-memory ZIP contents
    s = StringIO.StringIO()
    # The zip compressor
    zf = zipfile.ZipFile(s, "w")
    # 
    for fpath in filenames:
        # Calculate path for file in zip
        fdir, fname = os.path.split(fpath)
        zip_path = os.path.join(zip_subdir, fname)
        # Add file, at correct path
        zf.write(fpath, zip_path)
    # Must close zip for all contents to be written
    zf.close()
    # Grab ZIP file from in-memory, make response with correct MIME-type
    resp = HttpResponse(s.getvalue(), content_type='application/zip') 
    # ..and correct content-disposition
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    return resp

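Notes on the code above:

filenames lists the files to be packed. This is only a simple example; in a real project you can pack a whole folder or pick individual files scattered across the disk as needed.

Setting zip_subdir is optional. If it is set, files are placed in the archive under that directory; otherwise they keep the structure of the source. In the example above ckeditor.md lives in the document folder, and by default the archive has the same structure. If, as in the example, zipfolder is specified, the archive structure becomes zipfolder/ckeditor.md.

Some examples elsewhere pass mimetype = "application/x-zip-compressed" to HttpResponse. After Django 1.5 this parameter was changed to content_type.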

https://stackoverflow.com/questions/2463770/python-in-memory-zip-library
https://stackoverflow.com/questions/12881294/django-create-a-zip-of-multiple-files-and-make-it-downloadable
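The in-memory approach and the effect of zip_subdir can be tried outside Django. Below is a minimal sketch that uses io.BytesIO (available in both Python 2.7 and Python 3) rather than StringIO. The helper name build_zip_bytes is an assumption for illustration, and when no zip_subdir is given this sketch simply stores the bare file name.

```python
import io
import os
import posixpath
import tempfile
import zipfile


def build_zip_bytes(file_paths, zip_subdir=None):
    """Return a ZIP archive as bytes, optionally placing files under zip_subdir."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in file_paths:
            name = os.path.basename(file_path)
            # ZIP member names always use forward slashes.
            arcname = posixpath.join(zip_subdir, name) if zip_subdir else name
            archive.write(file_path, arcname)
    # The archive is complete only after the with block has closed it.
    return buffer.getvalue()


# Demo with a throwaway file standing in for document/ckeditor.md.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "ckeditor.md")
with open(demo_file, "w") as handle:
    handle.write("# notes")

zip_bytes = build_zip_bytes([demo_file], zip_subdir="zipfolder")
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    member_names = archive.namelist()
print(member_names)
```

In a view, the returned bytes could be passed to HttpResponse in the same way as s.getvalue() above.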

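In the next snippet, the files are chosen by a queryset and read from disk, and the path inside the archive is the file's path relative to MEDIA_ROOT.

import os
import zipfile
import StringIO
from django.conf import settings
from django.http import HttpResponse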
def post(self, *args, **kwargs):    
    s = StringIO.StringIO()
    zf = zipfile.ZipFile(s, "w") 
    zip_subdir = "media"
    qs = self.get_queryset()
    f = self.filter_class(self.request.GET, queryset=qs)
    for obj in f.qs:
        path = obj.image_before.path
        fdir, fname = os.path.split(path)
        zip_path = os.path.join(zip_subdir, path[len(settings.MEDIA_ROOT)+1:]) 
        zf.write(path, zip_path)
    zf.close()
    resp = HttpResponse(s.getvalue(), content_type='application/zip') 
    resp['Content-Disposition'] = 'attachment; filename=%s' % "daily_inspection_export.zip"
    return resp
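Slicing with len(settings.MEDIA_ROOT)+1 assumes that MEDIA_ROOT has no trailing separator. As a hedged alternative, os.path.relpath computes the same relative path either way. The helper archive_name and the /srv/media paths below are hypothetical and only illustrate the idea.

```python
import os
import posixpath


def archive_name(file_path, media_root, zip_subdir="media"):
    """Map a file under media_root to its member name inside the archive."""
    relative = os.path.relpath(file_path, media_root)
    # Rebuild the path with forward slashes, as ZIP member names expect.
    return posixpath.join(zip_subdir, *relative.split(os.sep))


media_root = os.path.join(os.sep, "srv", "media")  # stand-in for settings.MEDIA_ROOT
example = archive_name(os.path.join(media_root, "inspection", "before.jpg"), media_root)
with_trailing_sep = archive_name(os.path.join(media_root, "a.png"), media_root + os.sep)
print(example, with_trailing_sep)
```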

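Adding a file generated on the fly

The earlier examples all write files that already exist on disk into the archive. But what if some files are generated on the fly?

The code below builds on the previous zip example and adds a daily_inspection_export.csv file to the archive. This file does not exist on disk; it is generated from the data at request time.

import os
import tempfile
import zipfile
from django.conf import settings
from django.db import models
from django.http import HttpResponse
from .utils import gen_csv_file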
def post(self, *args, **kwargs):
    import StringIO
    s = StringIO.StringIO()
    zf = zipfile.ZipFile(s, "w") 
    zip_subdir = "media"
    qs = self.get_queryset()
    f = self.filter_class(self.request.GET, queryset=qs)
    for obj in f.qs:
        path = obj.image_before.path
        fdir, fname = os.path.split(path)
        zip_path = os.path.join(zip_subdir, path[len(settings.MEDIA_ROOT)+1:]) 
        zf.write(path, zip_path)
    # temp file
    temp = tempfile.NamedTemporaryFile()
    temp.close()
    # generate file to be zip in
    fields_display = [ "category", "rectification_status", "location" ]
    fields_fk = ["inspector", ]
    fields_datetime = ["due_date","created", "updated","completed_time"]
    excludes = [field.name for field in self.model._meta.get_fields() if isinstance(field, models.ManyToOneRel) or field.name.lower()=="id"]
    fields_multiple = ["impact",] 
    gen_csv_file(self.model, f.qs, temp.name, fields_display, fields_fk, fields_datetime, excludes, fields_multiple)
    # write this temp file to zip file with specific name
    zf.write(temp.name, "daily_inspection_export.csv")
    os.remove(temp.name)
    zf.close()
    resp = HttpResponse(s.getvalue(), content_type='application/zip') 
    resp['Content-Disposition'] = 'attachment; filename=%s' % "daily_inspection_export.zip"
    return resp

File generation function

def gen_csv_file(model, qs, filename, fields_display, fields_fk, fields_datetime, excludes, fields_multiple=None):
    import codecs
    import csv
    with open(filename, 'wb') as csvfile:
        writer = csv.writer(csvfile, dialect='excel')
        csvfile.write(codecs.BOM_UTF8) 
        row = []
        for field in model._meta.get_fields():
            if field.name in excludes:
                continue
            row.append(field.verbose_name)
        writer.writerow(row)

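First create a temporary file on disk, using either tempfile.NamedTemporaryFile() or tempfile.TemporaryFile(). The difference is that the former has a file name and the latter does not. Both worked for the example in this article, but since the code operates on file.name, NamedTemporaryFile is the better choice. Either way, the file is deleted automatically when it is closed. Another option is tempfile.mkdtemp(), which creates a temporary directory that you must delete yourself (for example with shutil.rmtree, or os.removedirs if it is empty). For details see https://docs.python.org/2/library/tempfile.html

If the program opens the temporary file again right after creating it, it fails with this error: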

[Errno 13] Permission denied: 'c:\users\admini~1\appdata\local\temp\tmph_mdma'

The official documentation defines it as follows:

tempfile.NamedTemporaryFile([mode='w+b'[, bufsize=-1[, suffix=''[, prefix='tmp'[, dir=None[, delete=True]]]]]])
This function operates exactly as TemporaryFile() does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the name attribute of the returned file-like object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later). If delete is true (the default), the file is deleted as soon as it is closed. The returned object is always a file-like object whose file attribute is the underlying true file object. This file-like object can be used in a with statement, just like a normal file.

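Note the description: on Windows, a file that is still open cannot be opened a second time, so the temporary file must be closed before it is reopened. Because delete defaults to true, closing it removes the file; gen_csv_file then creates a new file at the same path, and that file is not cleaned up automatically. This is why os.remove() is called afterwards to delete it.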
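One way to make this close-then-reopen pattern explicit is to pass delete=False and remove the file yourself. The sketch below assumes that approach; the function write_then_reopen is hypothetical, and its two open() calls stand in for gen_csv_file and zf.write.

```python
import os
import tempfile


def write_then_reopen(text):
    """Reserve a temp path, write to it by name, read it back, then clean up."""
    temp = tempfile.NamedTemporaryFile(suffix=".csv", delete=False)
    temp.close()  # release the handle so the path can be reopened on Windows
    try:
        with open(temp.name, "w") as handle:  # stands in for gen_csv_file
            handle.write(text)
        with open(temp.name) as handle:       # stands in for zf.write(temp.name, ...)
            return handle.read(), temp.name
    finally:
        os.remove(temp.name)                  # explicit cleanup, since delete=False


content, used_path = write_then_reopen("category,location\n")
print(repr(content), os.path.exists(used_path))
```

With delete=False the path stays reserved between close() and the second open(), so no other process can claim the same name in between.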

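CSV format

I ran into two problems when working with the CSV file.

1. Excel reports that the file format does not match and detects the file as SYLK

The cause is that my CSV content starts with ID. This is a known Microsoft Excel issue; for details see https://www.alunr.com/excel-csv-import-returns-an-sylk-file-format-error/

2. After changing the leading ID, the file opens normally, but all Chinese text is garbled

Solution: write the BOM at the start of the file with csvfile.write(codecs.BOM_UTF8). I have not looked into the exact reason (most likely Excel relies on the BOM to recognize the file as UTF-8), but this approach works both for a local file and for an HttpResponse.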

HttpResponse approach
import codecs
import csv
from django.http import HttpResponse
response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename={0}'.format(filename)
response.write(codecs.BOM_UTF8) # add bom header
writer = csv.writer(response)
Disk file approach
import codecs
import csv
with open(filename, 'wb') as csvfile:
    csvfile.write(codecs.BOM_UTF8)
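For readers on Python 3 (an assumption, since this article targets Python 2.7), the same BOM can be produced by opening the file with the utf-8-sig encoding instead of writing codecs.BOM_UTF8 by hand. The rows and the file name below are made-up sample data.

```python
import codecs
import csv
import os
import tempfile

rows = [["category", "location"], ["安全", "仓库"]]  # sample data with Chinese text

csv_path = os.path.join(tempfile.mkdtemp(), "export.csv")
# "utf-8-sig" writes the UTF-8 BOM before the first character.
# newline="" lets the csv module control line endings itself.
with open(csv_path, "w", encoding="utf-8-sig", newline="") as csvfile:
    csv.writer(csvfile, dialect="excel").writerows(rows)

with open(csv_path, "rb") as raw:
    raw_bytes = raw.read()
has_bom = raw_bytes.startswith(codecs.BOM_UTF8)
decoded_lines = raw_bytes[len(codecs.BOM_UTF8):].decode("utf-8").splitlines()
print(has_bom, decoded_lines)
```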

zip-stream

There are many versions of zip-stream; the one used here is https://github.com/allanlei/python-zipstream

Installation

Install it with pip

pip install zipstream

Writing to a file

import zipstream
z = zipstream.ZipFile()
z.write('static\\css\\inspection.css')

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

Web Response

The basic approach is as follows

import zipfile
import zipstream
from django.http import StreamingHttpResponse

def zipball(request):
    z = zipstream.ZipFile(mode='w', compression= zipfile.ZIP_DEFLATED)
    z.write('/path/to/file')
    response = StreamingHttpResponse(z, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

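Reimplementing the earlier example this way simplifies both building the archive and returning it. Note that neither the zip object nor the temporary file may be closed or deleted before the response has been sent, otherwise writing fails, because zipstream only reads the source files while the response is being streamed.

import os
import tempfile
import zipfile
import zipstream
from django.conf import settings
from django.db import models
from django.http import StreamingHttpResponse
from .utils import gen_csv_file

def post(self, *args, **kwargs):
    zf = zipstream.ZipFile(mode='w', compression=zipfile.ZIP_DEFLATED)
    zip_subdir = "media"
    qs = self.get_queryset()
    f = self.filter_class(self.request.GET, queryset=qs)
    for obj in f.qs: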
        path = obj.image_before.path
        fdir, fname = os.path.split(path)
        zip_path = os.path.join(zip_subdir, path[len(settings.MEDIA_ROOT)+1:]) 
        zf.write(path, zip_path)
        # write zip file
        if obj.image_after:
            path = obj.image_after.path
            fdir, fname = os.path.split(path)
            zip_path = os.path.join(zip_subdir, path[len(settings.MEDIA_ROOT)+1:]) 
            zf.write(path, zip_path)
    # create temp file
    temp = tempfile.NamedTemporaryFile()
    temp.close()
    # zip temp file
    fields_display = [ "category", "rectification_status", "location" ]
    fields_fk = ["inspector", ]
    fields_datetime = ["due_date","created", "updated","completed_time"]
    excludes = [field.name for field in self.model._meta.get_fields() if isinstance(field, models.ManyToOneRel)]
    fields_multiple = ["impact",] 
    gen_csv_file(self.model, f.qs, temp.name, fields_display, fields_fk, fields_datetime, excludes, fields_multiple)
    zf.write(temp.name, "daily_inspection_export.csv") 
    # set response
    response = StreamingHttpResponse(zf, content_type='application/zip') 
    response['Content-Disposition'] = 'attachment; filename={}'.format('daily_inspection_export.zip')
    # zf.close()
    # os.remove(temp.name)
    return response

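Large file downloads

To avoid holding too much content on disk or in memory, zipstream works with generators and offers an iterative way to add file content. The official demo is shown below. Iterative and non-iterative writes can be mixed. I will not go into more detail here.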

def iterable():
    for _ in xrange(10):
        yield b'this is a byte string\x01\n'

z = zipstream.ZipFile()
z.write_iter('my_archive_iter', iterable())
z.write('path/to/files', 'my_archive_files')

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)
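A generator that yields a large file in fixed-size chunks fits this iterative style. The sketch below uses only the standard library; iter_file_chunks, the 64 KiB chunk size and the demo file are assumptions for illustration, and the zipstream call is left as a comment because it depends on the third-party package.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file at path piece by piece so it is never fully in memory."""
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Demo file of 150000 bytes (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(demo_path, "wb") as handle:
    handle.write(b"x" * 150000)

chunk_sizes = [len(chunk) for chunk in iter_file_chunks(demo_path)]
print(chunk_sizes)

# With python-zipstream, such a generator could be handed to write_iter
# in the same way as iterable() in the demo above:
# z.write_iter('big.bin', iter_file_chunks(demo_path))
```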
