PDFMiner.six is a fork of PDFMiner that uses six for Python 2 and 3
compatibility.

PDFMiner is a tool for extracting information from PDF documents. Unlike other
PDF-related tools, it focuses entirely on getting and analyzing text data.

PDFMiner can obtain the exact location of text on a page, as well as other
information such as fonts or lines. It includes a PDF converter that can
transform PDF files into other text formats (such as HTML), and an extensible
PDF parser that can be used for purposes other than text analysis.

Features:

- Parse, analyze, and convert PDF documents.
- PDF-1.7 specification support (well, almost).
- CJK languages and vertical writing scripts support.
- Various font types (Type1, TrueType, Type3, and CID) support.
- Basic encryption (RC4) support.
- Outline (TOC) extraction.
- Tagged contents extraction.
- Automatic layout analysis.

WWW: https://github.com/pdfminer/pdfminer.six |