您的位置:首页技术文章
文章详情页

Python requests上传文件实现步骤

【字号: 日期:2022-07-11 10:53:26浏览:2作者:猪猪

官方文档:https://2.python-requests.org//en/master/

工作中涉及到一个功能,需要上传附件到一个接口,接口参数如下:

使用http post提交附件 multipart/form-data 格式,url : http://test.com/flow/upload,

字段列表:md5: //md5加密(随机值_当时时间戳)filesize: //文件大小file: //文件内容(须含文件名)返回值:{'success':true,'uploadName':'tmp.xml','uploadPath':'uploads/201311/758e875fb7c7a508feef6b5036119b9f'}

由于工作中主要用python,并且项目中已有使用requests库的地方,所以计划使用requests来实现,本来以为是很简单的一个小功能,结果花费了大量的时间,requests官方的例子只提到了上传文件,并不需要传额外的参数:

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

>>> url = ’https://httpbin.org/post’>>> files = {’file’: (’report.xls’, open(’report.xls’, ’rb’), ’application/vnd.ms-excel’, {’Expires’: ’0’})}>>> r = requests.post(url, files=files)>>> r.text{ ... 'files': { 'file': '<censored...binary...data>' }, ...}

但是如果涉及到了参数的传递时,其实就要用到requests的两个参数:data、files,将要上传的文件传入files,将其他参数传入data,request库会将两者合并到一起做一个multi part,然后发送给服务器。

最终实现的代码是这样的:

with open(file_name) as f:content = f.read()request_data = { ’md5’:md5.md5(’%d_%d’ % (0, int(time.time()))).hexdigest(), ’filesize’:len(content),}files = {’file’:(file_name, open(file_name, ’rb’))}MyLogger().getlogger().info(’url:%s’ % (request_url))resp = requests.post(request_url, data=request_data, files=files)

虽然最终代码可能看起来很简单,但是其实我费了好大功夫才确认这样是OK的,中间还翻了requests的源码,下面记录一下翻阅源码的过程:

首先,找到post方法的实现,在requests.api.py中:

def post(url, data=None, json=None, **kwargs): r'''Sends a POST request. :param url: URL for the new :class:`Request` object. :param data: (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the :class:`Request`. :param json: (optional) json data to send in the body of the :class:`Request`. :param **kwargs: Optional arguments that ``request`` takes. :return: :class:`Response <Response>` object :rtype: requests.Response ''' return request(’post’, url, data=data, json=json, **kwargs)

这里可以看到它调用了request方法,咱们继续跟进request方法,在requests.api.py中:

def request(method, url, **kwargs): '''Constructs and sends a :class:`Request <Request>`. :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``. :param url: URL for the new :class:`Request` object. :param params: (optional) Dictionary, list of tuples or bytes to send in the query string for the :class:`Request`. :param data: (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the :class:`Request`. :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`. :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`. :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`. :param files: (optional) Dictionary of ``’name’: file-like-objects`` (or ``{’name’: file-tuple}``) for multipart encoding upload. ``file-tuple`` can be a 2-tuple ``(’filename’, fileobj)``, 3-tuple ``(’filename’, fileobj, ’content_type’)`` or a 4-tuple ``(’filename’, fileobj, ’content_type’, custom_headers)``, where ``’content-type’`` is a string defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers to add for the file. :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth. :param timeout: (optional) How many seconds to wait for the server to send data before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple. :type timeout: float or tuple :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``. :type allow_redirects: bool :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy. :param verify: (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to ``True``. :param stream: (optional) if ``False``, the response content will be immediately downloaded. :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, (’cert’, ’key’) pair. :return: :class:`Response <Response>` object :rtype: requests.Response Usage:: >>> import requests >>> req = requests.request(’GET’, ’https://httpbin.org/get’) <Response [200]> ''' # By using the ’with’ statement we are sure the session is closed, thus we # avoid leaving sockets open which can trigger a ResourceWarning in some # cases, and look like a memory leak in others. with sessions.Session() as session: return session.request(method=method, url=url, **kwargs)

这个方法的注释比较多,从注释里其实已经可以看到files参数使用传送文件,但是还是无法知道当需要同时传递参数和文件时该如何处理,继续跟进session.request方法,在requests.session.py中:

def request(self, method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies=None, hooks=None, stream=None, verify=None, cert=None, json=None): '''Constructs a :class:`Request <Request>`, prepares it and sends it. Returns :class:`Response <Response>` object. :param method: method for the new :class:`Request` object. :param url: URL for the new :class:`Request` object. :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`. :param data: (optional) Dictionary, list of tuples, bytes, or file-like object to send in the body of the :class:`Request`. :param json: (optional) json to send in the body of the :class:`Request`. :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`. :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`. :param files: (optional) Dictionary of ``’filename’: file-like-objects`` for multipart encoding upload. :param auth: (optional) Auth tuple or callable to enable Basic/Digest/Custom HTTP Auth. :param timeout: (optional) How long to wait for the server to send data before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple. :type timeout: float or tuple :param allow_redirects: (optional) Set to True by default. :type allow_redirects: bool :param proxies: (optional) Dictionary mapping protocol or protocol and hostname to the URL of the proxy. :param stream: (optional) whether to immediately download the response content. Defaults to ``False``. :param verify: (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use. Defaults to ``True``. :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, (’cert’, ’key’) pair. :rtype: requests.Response ''' # Create the Request. req = Request( method=method.upper(), url=url, headers=headers, files=files, data=data or {}, json=json, params=params or {}, auth=auth, cookies=cookies, hooks=hooks, ) prep = self.prepare_request(req) proxies = proxies or {} settings = self.merge_environment_settings( prep.url, proxies, stream, verify, cert ) # Send the request. send_kwargs = { ’timeout’: timeout, ’allow_redirects’: allow_redirects, } send_kwargs.update(settings) resp = self.send(prep, **send_kwargs) return resp

先大概看一下这个方法,先是准备request,最后一步是调用send,推测应该是发送请求了,所以我们需要跟进到prepare_request方法中,在requests.session.py中:

def prepare_request(self, request): '''Constructs a :class:`PreparedRequest <PreparedRequest>` for transmission and returns it. The :class:`PreparedRequest` has settings merged from the :class:`Request <Request>` instance and those of the :class:`Session`. :param request: :class:`Request` instance to prepare with this session’s settings. :rtype: requests.PreparedRequest ''' cookies = request.cookies or {} # Bootstrap CookieJar. if not isinstance(cookies, cookielib.CookieJar): cookies = cookiejar_from_dict(cookies) # Merge with session cookies merged_cookies = merge_cookies( merge_cookies(RequestsCookieJar(), self.cookies), cookies) # Set environment’s basic authentication if not explicitly set. auth = request.auth if self.trust_env and not auth and not self.auth: auth = get_netrc_auth(request.url) p = PreparedRequest() p.prepare( method=request.method.upper(), url=request.url, files=request.files, data=request.data, json=request.json, headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict), params=merge_setting(request.params, self.params), auth=merge_setting(auth, self.auth), cookies=merged_cookies, hooks=merge_hooks(request.hooks, self.hooks), ) return p

在prepare_request中,生成了一个PreparedRequest对象,并调用其prepare方法,跟进到prepare方法中,在requests.models.py中:

def prepare(self, method=None, url=None, headers=None, files=None, data=None, params=None, auth=None, cookies=None, hooks=None, json=None): '''Prepares the entire request with the given parameters.''' self.prepare_method(method) self.prepare_url(url, params) self.prepare_headers(headers) self.prepare_cookies(cookies) self.prepare_body(data, files, json) self.prepare_auth(auth, url) # Note that prepare_auth must be last to enable authentication schemes # such as OAuth to work on a fully prepared request. # This MUST go after prepare_auth. Authenticators could add a hook self.prepare_hooks(hooks)

这里调用许多prepare_xx方法,这里我们只关心处理了data、files、json的方法,跟进到prepare_body中,在requests.models.py中:

def prepare_body(self, data, files, json=None): '''Prepares the given HTTP body data.''' # Check if file, fo, generator, iterator. # If not, run through normal process. # Nottin’ on you. body = None content_type = None if not data and json is not None: # urllib3 requires a bytes-like body. Python 2’s json.dumps # provides this natively, but Python 3 gives a Unicode string. content_type = ’application/json’ body = complexjson.dumps(json) if not isinstance(body, bytes):body = body.encode(’utf-8’) is_stream = all([ hasattr(data, ’__iter__’), not isinstance(data, (basestring, list, tuple, Mapping)) ]) try: length = super_len(data) except (TypeError, AttributeError, UnsupportedOperation): length = None if is_stream: body = data if getattr(body, ’tell’, None) is not None:# Record the current file position before reading.# This will allow us to rewind a file in the event# of a redirect.try: self._body_position = body.tell()except (IOError, OSError): # This differentiates from None, allowing us to catch # a failed `tell()` later when trying to rewind the body self._body_position = object() if files:raise NotImplementedError(’Streamed bodies and files are mutually exclusive.’) if length:self.headers[’Content-Length’] = builtin_str(length) else:self.headers[’Transfer-Encoding’] = ’chunked’ else: # Multi-part file uploads. if files:(body, content_type) = self._encode_files(files, data) else:if data: body = self._encode_params(data) if isinstance(data, basestring) or hasattr(data, ’read’): content_type = None else: content_type = ’application/x-www-form-urlencoded’ self.prepare_content_length(body) # Add content-type if it wasn’t explicitly provided. if content_type and (’content-type’ not in self.headers):self.headers[’Content-Type’] = content_type self.body = body

这个函数比较长,需要重点关注L52,这里调用了_encode_files方法,我们跟进这个方法:

def _encode_files(files, data): '''Build the body for a multipart/form-data request. Will successfully encode files when passed as a dict or a list of tuples. Order is retained if data is a list of tuples but arbitrary if parameters are supplied as a dict. The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype) or 4-tuples (filename, fileobj, contentype, custom_headers). ''' if (not files): raise ValueError('Files must be provided.') elif isinstance(data, basestring): raise ValueError('Data must not be a string.') new_fields = [] fields = to_key_val_list(data or {}) files = to_key_val_list(files or {}) for field, val in fields: if isinstance(val, basestring) or not hasattr(val, ’__iter__’):val = [val] for v in val:if v is not None: # Don’t call str() on bytestrings: in Py3 it all goes wrong. if not isinstance(v, bytes): v = str(v) new_fields.append( (field.decode(’utf-8’) if isinstance(field, bytes) else field, v.encode(’utf-8’) if isinstance(v, str) else v)) for (k, v) in files: # support for explicit filename ft = None fh = None if isinstance(v, (tuple, list)):if len(v) == 2: fn, fp = velif len(v) == 3: fn, fp, ft = velse: fn, fp, ft, fh = v else:fn = guess_filename(v) or kfp = v if isinstance(fp, (str, bytes, bytearray)):fdata = fp elif hasattr(fp, ’read’):fdata = fp.read() elif fp is None:continue else:fdata = fp rf = RequestField(name=k, data=fdata, filename=fn, headers=fh) rf.make_multipart(content_type=ft) new_fields.append(rf) body, content_type = encode_multipart_formdata(new_fields) return body, content_type

OK,到此为止,仔细阅读完这个段代码,就可以搞明白requests.post方法传入的data、files两个参数的作用了,其实requests在这里把它俩合并在一起了,作为post的body。

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持好吧啦网。

标签: Python 编程
相关文章: