prune extract model

周伟奇
Showing 3 changed files with 2 additions and 16 deletions
README.md
pdf_to_img.py
requirements.txt
--- a/README.md
View file @94794bd
+++ b/README.md
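 # PDF-to-image conversion script
-## Two conversion modes
+## Conversion mode
 - Save each whole page as a PNG image
- Extract the image objects embedded in each PDF page
-  - No image objects (e.g. an electronic bill): save the whole page as a PNG image
-  - Exactly one image object
-      - Large image: save the image object
-      - Small image (e.g. the stamp on an electronic bill): save the whole page as a PNG image
-  - More than one image object
-      - Several complete images: save each image object
-      - Several image fragments: group them at the positions where width or height changes abruptly, stitch each group together, and save the result
-  - Any other special case: save the whole page as a PNG image
-## Known issues
- In image-extraction mode, complete images and fragments are told apart by width and height thresholds, which does not work for every PDF. In some PDFs, a very small complete image is treated as a fragment and merged, while a very large fragment is treated as a complete image and left unmerged
 ## Usage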
 - python3.6+
 - `pip install -r requirements.txt`
- - `python pdf_to_img.py [-h] -i INPUT [-o OUTPUT] [-e]`
+ - `python pdf_to_img.py [-h] -i INPUT [-o OUTPUT]`
    ```
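    Optional arguments:
      -h, --help                  show help information and exit
      -i INPUT, --input INPUT     path to a PDF file or directory; required
      -o OUTPUT, --output OUTPUT  path where output images are saved; optional, defaults to the PDF file's path
-      -e, --extract               by default each whole page is saved as a PNG image; add this option to convert by extracting image objects instead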
    ```
\ No newline at end of file
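For reference, the usage line after this change (`[-h] -i INPUT [-o OUTPUT]`) can be sketched with `argparse`. This is a minimal illustration, not the code in `pdf_to_img.py`, whose diff is not shown here. `build_parser` and `resolve_output` are hypothetical names, and the fallback used when `-o` is omitted is an assumption based on the README's statement that the default is the PDF file's path.

```python
import argparse
import os


def build_parser():
    """Build a parser matching the usage line: [-h] -i INPUT [-o OUTPUT]."""
    parser = argparse.ArgumentParser(prog="pdf_to_img.py")
    parser.add_argument("-i", "--input", required=True,
                        help="path to a PDF file or a directory of PDFs")
    parser.add_argument("-o", "--output", default=None,
                        help="where to save the images (optional)")
    return parser


def resolve_output(input_path, output_path=None):
    """Hypothetical helper: choose the directory that receives the images.

    Assumption: when -o is omitted, images are written next to the PDF,
    i.e. into the input directory itself or the directory holding the file.
    """
    if output_path is not None:
        return output_path
    if input_path.lower().endswith(".pdf"):
        # dirname is empty for a bare file name, so fall back to "."
        return os.path.dirname(input_path) or "."
    return input_path
```

With this sketch, `build_parser().parse_args(["-i", "docs/a.pdf"])` leaves `output` unset, and `resolve_output("docs/a.pdf")` falls back to `docs`. Passing the removed `-e` flag makes `argparse` exit with a usage error, which matches the intent of dropping the option.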
--- a/pdf_to_img.py
+++ b/pdf_to_img.py
--- a/requirements.txt
+++ b/requirements.txt
-Pillow==7.2.0
 PyMuPDF==1.17.0
\ No newline at end of file