A Deep Dive into the python-pptx Library: Exploring PPT Content Layer by Layer, with Practical Techniques

Mind-Map Structure

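PPTX file structure (using `prs.slides[0]` as the example)
├── Presentation (prs)
│   ├── slides → collection of all slides
│   │   └── Slide (one page of the deck)
│   │       ├── shapes → all shapes on the slide (text boxes, pictures, tables, etc.)
│   │       │   └── Shape (one shape object)
│   │       │       ├── text_frame → the shape's text container (only if has_text_frame is True)
│   │       │       │   └── TextFrame → paragraphs and text-frame formatting
│   │       │       │       ├── paragraphs → list of paragraphs
│   │       │       │       │   └── Paragraph → paragraph content and formatting
│   │       │       │       │       ├── text → text content
│   │       │       │       │       └── font → font properties (color, size, etc.)
│   │       │       ├── table → the table (only if has_table is True)
│   │       │       │   └── Table → row and column data
│   │       │       │       ├── rows → table rows
│   │       │       │       └── columns → table columns
│   │       │       ├── image → the embedded image (picture shapes only)
│   │       │       │   └── Image → binary image data, format, etc.
│   │       │       └── placeholder_format → placeholder details (only if is_placeholder is True)
│   │       │           └── idx, type → placeholder index and type (title, body, etc.)
│   │       └── notes_slide → the slide's notes page
│   │           └── notes_text_frame → TextFrame holding the notes text
│   └── slide_layouts → available slide layouts (definitions of the title and content areas)
│       └── SlideLayout → layout name and placeholder types
│           └── placeholders → placeholders defined by the layout
└── Other advanced objects (charts, group shapes, etc.)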

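Layer-by-Layer Walkthrough: Properties and Methods

1. The Presentation Object

Purpose: represents the entire PPT file.

Key methods/properties:

prs.slides: the collection of all slides.

prs.slide_layouts: all available slide layouts (such as "Title and Content" and "Title Only").

prs.save(file): saves the presentation to the given path or file-like object.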

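2. The Slide Object (one page of the deck)

Purpose: represents a single slide.

Key methods/properties:

slide.shapes: all shapes on the slide (text boxes, pictures, tables, etc.).

slide.placeholders: the slide's placeholders (such as the title and content areas).

slide.notes_slide: the slide's notes page (created on first access if the slide has none).

slide.slide_layout: the layout the slide is based on.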

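3. The Shape Object (text boxes, pictures, and so on)

Purpose: represents a single element on a slide (text box, picture, table, etc.).

Key methods/properties:

Text boxes:

shape.has_text_frame: whether the shape can hold text.

shape.text_frame: the shape's TextFrame object.

shape.text: the shape's full text as a string (a shortcut, available only on shapes that have a text frame).

text_frame.paragraphs: the list of paragraphs.

paragraph.text: the paragraph's text.

paragraph.font: the paragraph's default font properties (color, size, etc.); font.size is None when the size is inherited rather than set explicitly.

Tables:

shape.has_table: whether the shape contains a table.

shape.table: the Table object.

table.rows: the table's rows.

table.columns: the table's columns.

table.cell(row_idx, col_idx): the cell object at the given zero-based position.

Pictures:

shape.shape_type == MSO_SHAPE_TYPE.PICTURE: whether the shape is a picture.

shape.image: the picture's Image object.

image.blob: the image's binary data.

image.filename: the image's file name (if one is available).

Placeholders:

shape.is_placeholder: whether the shape is a placeholder.

shape.placeholder_format: the placeholder's details (idx and type).

placeholder_format.type: the placeholder type (such as PP_PLACEHOLDER.TITLE).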

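4. The TextFrame Object (text box content)

Purpose: manages the text content and formatting of a text box.

Key methods/properties:

text_frame.clear(): removes the text, leaving a single empty paragraph.

text_frame.paragraphs: the list of paragraphs.

text_frame.word_wrap: whether text wraps automatically.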

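5. The Table Object (table content)

Purpose: manages a table's rows, columns, and cells.

Key methods/properties:

table.cell(row_idx, col_idx): the cell object at the given zero-based position.

cell.text: the cell's text.

cell.merge(other_cell): merges the rectangular range of cells whose opposite corners are cell and other_cell.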

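6. The Image Object (picture content)

Purpose: manages a picture's binary data and properties.

Key methods/properties:

image.blob: the image's binary data.

image.content_type: the image's MIME type (such as image/png).

image.ext: the file extension that matches the image format (such as png).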

Content Extraction Examples

Extracting content from a single slide

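from pptx import Presentation
from pptx.enum.dml import MSO_COLOR_TYPE
from pptx.enum.shapes import MSO_SHAPE_TYPE

prs = Presentation("your_presentation.pptx")
slide = prs.slides[0]  # first slide

# Extract text content
for shape in slide.shapes:
    if shape.has_text_frame:
        for paragraph in shape.text_frame.paragraphs:
            font = paragraph.font
            print("Text:", paragraph.text)
            # Size is None and color is not RGB when they come from the theme or layout
            has_rgb = font.color.type == MSO_COLOR_TYPE.RGB
            print("Font color:", font.color.rgb if has_rgb else "inherited or theme color")
            print("Font size:", font.size.pt if font.size is not None else "inherited")

# Extract table content
for shape in slide.shapes:
    if shape.has_table:
        table = shape.table
        for row in table.rows:
            row_data = [cell.text for cell in row.cells]
            print("Table row:", row_data)

# Extract pictures
for index, shape in enumerate(slide.shapes):
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        image = shape.image
        print("Image format:", image.content_type)
        # Use the real extension and a per-shape name so files are not overwritten
        with open(f"extracted_image_{index}.{image.ext}", "wb") as f:
            f.write(image.blob)  # save the binary image data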

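Summary of Key Operations

Iterating over all shapes:

from pptx.enum.shapes import MSO_SHAPE_TYPE

for shape in slide.shapes:
    if shape.has_text_frame:
        pass  # handle text
    elif shape.has_table:
        pass  # handle a table
    elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        pass  # handle a picture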

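Reading placeholder content:

from pptx.enum.shapes import PP_PLACEHOLDER

for placeholder in slide.placeholders:
    if not placeholder.has_text_frame:
        continue  # e.g. a placeholder already filled with a picture or table
    if placeholder.placeholder_format.type in (PP_PLACEHOLDER.TITLE, PP_PLACEHOLDER.CENTER_TITLE):
        print("Title:", placeholder.text)
    else:
        print("Content placeholder:", placeholder.text)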

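Extracting notes-page content:

# slide.notes_slide creates a notes page when none exists, so check first
if slide.has_notes_slide:
    notes_frame = slide.notes_slide.notes_text_frame
    if notes_frame is not None:
        print("Notes:", notes_frame.text)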

Common Error Handling

Checking the shape type:

if shape.has_text_frame:
    text = shape.text
else:
    print("This shape contains no text")

Matching the placeholder type:

from pptx.enum.shapes import PP_PLACEHOLDER
if shape.is_placeholder and shape.placeholder_format.type == PP_PLACEHOLDER.TITLE:
    print("This is a title placeholder:", shape.text)

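With the layered walkthrough above, you can systematically manipulate and extract the text, tables, pictures, and other elements in a PPT file to automate content generation or data extraction.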

Author: 东方佑
