周伟奇 / pdf_to_img
Commit 94794bd5 authored 2020-08-06 15:21:27 +0800 by 周伟奇
prune extract model
1 parent ff70b617
Showing 3 changed files with 2 additions and 16 deletions
README.md
pdf_to_img.py
requirements.txt
README.md
# PDF-to-image script
## 2 conversion modes
## Conversion modes
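- Save the whole page as a PNG image
- Extract the image objects in the PDF page
  - 0 image objects (e.g. an electronic bill): save the whole page as a PNG image
  - 1 image object
    - Large image: save the image object
    - Small image (e.g. the stamp on an electronic bill): save the whole page as a PNG image
  - More than 1 image object
    - Several whole images: save the image objects
    - Several fragments: group them at the positions where width or height changes abruptly, stitch each group together, then save
  - Other special cases: save the whole page as a PNG image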
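The branching above can be sketched in a few lines of Python. This is only an illustration of the decision list, not the code in `pdf_to_img.py` (whose diff is collapsed here): the names `is_whole`, `group_fragments`, and `choose_action`, the action labels, and the 500-pixel thresholds are all hypothetical, and grouping is simplified to "start a new group whenever the width changes".

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) of one image object, in pixels

# Hypothetical thresholds; the README only states that width/height thresholds are used.
MIN_WHOLE_WIDTH = 500
MIN_WHOLE_HEIGHT = 500


def is_whole(size: Size) -> bool:
    """Treat an image object as a whole image when both sides reach the threshold."""
    width, height = size
    return width >= MIN_WHOLE_WIDTH and height >= MIN_WHOLE_HEIGHT


def group_fragments(sizes: List[Size]) -> List[List[Size]]:
    """Split consecutive fragments into groups wherever the width changes (simplified)."""
    groups: List[List[Size]] = []
    for size in sizes:
        if groups and groups[-1][-1][0] == size[0]:
            groups[-1].append(size)  # same width as the previous fragment: same group
        else:
            groups.append([size])    # abrupt width change: start a new group
    return groups


def choose_action(sizes: List[Size]) -> str:
    """Return a label for the branch of the list above that applies to one page."""
    if not sizes:
        return "render_page"         # no image objects
    if len(sizes) == 1:
        return "save_objects" if is_whole(sizes[0]) else "render_page"
    if all(is_whole(s) for s in sizes):
        return "save_objects"        # several whole images
    if not any(is_whole(s) for s in sizes):
        return "merge_fragments"     # several fragments: group, stitch, save
    return "render_page"             # mixed sizes and other special cases
```

For example, `choose_action([(1200, 40), (1200, 40), (800, 40)])` selects the fragment branch, and `group_fragments` on the same input yields two groups, split where the width drops from 1200 to 800.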
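## Known issues
- In image-object extraction mode, whole images and fragments are told apart by width/height thresholds, which cannot suit every PDF. In some PDFs, a very small whole image is treated as a fragment and merged, and a very large fragment is treated as a whole image and left unmerged.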
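A small sketch makes this failure mode concrete. The function name `looks_whole`, the 500-pixel threshold, and the sample sizes are assumptions chosen for illustration; the script's actual values are not shown on this page.

```python
# Hypothetical fixed threshold in pixels; the script's real values are not shown.
THRESHOLD = 500


def looks_whole(width: int, height: int, threshold: int = THRESHOLD) -> bool:
    """Classify an image object as a whole image when both sides reach the threshold."""
    return width >= threshold and height >= threshold


small_scan = (400, 300)   # a genuine whole image that happens to be small
big_strip = (2400, 600)   # one strip of a page that was sliced into large fragments

print(looks_whole(*small_scan))  # False: would be merged as if it were a fragment
print(looks_whole(*big_strip))   # True: would be saved alone instead of merged
```

Any single fixed threshold has cases like these on one side or the other, which is why the limitation cannot be removed by tuning the value.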
## Usage
- Python 3.6+
- `pip install -r requirements.txt`
- `python pdf_to_img.py [-h] -i INPUT [-o OUTPUT] [-e]`
- `python pdf_to_img.py [-h] -i INPUT [-o OUTPUT]`
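```
Optional arguments:
-h, --help                   show help and exit
-i INPUT, --input INPUT      path to a PDF file or a directory; required
-o OUTPUT, --output OUTPUT   path where output images are saved; optional, defaults to the PDF file path
-e, --extract                by default each whole page is saved as a PNG image; add this option to convert by extracting image objects instead
```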
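The option list above maps naturally onto `argparse`. The following is a minimal sketch of such a parser, assuming the flags exactly as listed (including `-e`); the function name `build_parser` and the help strings are illustrative and not taken from `pdf_to_img.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="pdf_to_img.py")
    parser.add_argument("-i", "--input", required=True,
                        help="path to a PDF file or a directory (required)")
    # None here stands for "fall back to the PDF file path", resolved later by the caller.
    parser.add_argument("-o", "--output", default=None,
                        help="where to save output images (optional)")
    parser.add_argument("-e", "--extract", action="store_true",
                        help="extract image objects instead of saving whole pages")
    return parser


args = build_parser().parse_args(["-i", "bill.pdf", "-e"])
print(args.input, args.output, args.extract)  # bill.pdf None True
```

Using `store_true` keeps whole-page rendering as the default, matching the description of `-e` as an opt-in switch.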
\ No newline at end of file
pdf_to_img.py
This diff is collapsed.
requirements.txt
Pillow==7.2.0
PyMuPDF==1.17.0
\ No newline at end of file