【Python小技巧】openpyxl读取xlsx文件报错the workbook source files contain some invalid XML的解决办法
问题描述
通过openpyxl读写xlsx文件,发现报错:“This is most probably because the workbook source files contain some invalid XML”,具体如下:
wb = openpyxl.load_workbook(filename,data_only=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ProgramData\Anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 346, in load_workbook
reader.read()
File "D:\ProgramData\Anaconda3\Lib\site-packages\openpyxl\reader\excel.py", line 307, in read
raise ValueError(
ValueError: Unable to read workbook: could not read worksheets from D:\abc.xlsx.
This is most probably because the workbook source files contain some invalid XML.
Please see the exception for more details.
原因分析:
查看excel文件,发现其中有筛选,去除筛选后再次执行程序,一切OK。
一个文件这么去掉筛选没什么,但文件sheet较多,一个一个做有些折腾!
另外,这个问题openpyxl应该也发现了吧。
解决方案:
检查openpyxl版本,执行如下命令pip show openpyxl,版本为3.1.2
(base) C:\Users\Administrator>pip show openpyxl
Name: openpyxl
Version: 3.1.2
Summary: A Python library to read/write Excel 2010 xlsx/xlsm files
Home-page: https://openpyxl.readthedocs.io
Author: See AUTHORS
Author-email: charlie.clark@clark-consulting.eu
License: MIT
Location: d:\ProgramData\anaconda3\Lib\site-packages
Requires: et-xmlfile
Required-by:
执行pip install openpyxl==1000查看可安装的版本清单
(base) C:\Users\Administrator>pip install openpyxl==1000
ERROR: Ignored the following yanked versions: 3.2.0b1
ERROR: Could not find a version that satisfies the requirement openpyxl==1000 (from versions: 1.1.0, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.2.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.6.1, 1.6.2, 1.7.0, 1.8.0, 1.8.1, 1.8.2, 1.8.3, 1.8.4, 1.8.5, 1.8.6, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.2.6, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.3.5, 2.4.0, 2.4.1, 2.4.2, 2.4.4, 2.4.5, 2.4.7, 2.4.8, 2.4.9, 2.4.10, 2.4.11, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.5.4, 2.5.5, 2.5.6, 2.5.7, 2.5.8, 2.5.9, 2.5.10, 2.5.11, 2.5.12, 2.5.14, 2.6.0a1, 2.6.0b1, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.0.6, 3.0.7, 3.0.9, 3.0.10, 3.1.0, 3.1.1, 3.1.2, 3.1.3, 3.1.4, 3.1.5)
ERROR: No matching distribution found for openpyxl==1000
目前最终版本是3.1.5,显然大于现在的3.1.2,果断使用pip install openpyxl -U升级openpyxl。
(base) C:\Users\Administrator>pip install openpyxl -U
Requirement already satisfied: openpyxl in d:\programdata\anaconda3\lib\site-packages (3.1.2)
Collecting openpyxl
Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: et-xmlfile in d:\programdata\anaconda3\lib\site-packages (from openpyxl) (1.1.0)
Downloading openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
---------------------------------------- 250.9/250.9 kB 48.3 kB/s eta 0:00:00
Installing collected packages: openpyxl
Attempting uninstall: openpyxl
Found existing installation: openpyxl 3.1.2
Uninstalling openpyxl-3.1.2:
Successfully uninstalled openpyxl-3.1.2
Successfully installed openpyxl-3.1.5
(base) C:\Users\Administrator>
升级完成后,再次程序,无论xlsx文件是否存在筛选,读取数据都不再报错且数据读取也没问题。至此,问题完美解决!
作者:IT里的交易员