Package Details: ocrmypdf 16.8.0-1

Git Clone URL: https://aur.archlinux.org/ocrmypdf.git (read-only)
Package Base: ocrmypdf
Description: A tool to add an OCR text layer to scanned PDF files, allowing them to be searched
Upstream URL: https://github.com/ocrmypdf/OCRmyPDF
Licenses: MPL2
Submitter: dreuter
Maintainer: fbrennan (pigmonkey)
Last Packager: pigmonkey
Votes: 125
Popularity: 3.28
First Submitted: 2014-01-27 11:36 (UTC)
Last Updated: 2025-01-07 20:27 (UTC)

Pinned Comments

fbrennan commented on 2023-05-12 22:54 (UTC)

The flag was invalid and has been removed with no action taken as no new version was released. There's nothing to do for this package; no new release has been made. Rebuild, as @eclairevoyant has said.

Latest Comments

jorges commented on 2020-07-29 11:18 (UTC) (edited on 2020-07-29 11:22 (UTC) by jorges)

I was getting the traceback shown below with python-pdfminer. I was able to solve the problem by removing that package and installing python-pdfminer.six. If other people can confirm this, maybe the package dependencies have to be changed?

$ ocrmypdf 
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==10.3.1', 'console_scripts', 'ocrmypdf')())
  File "/usr/bin/ocrmypdf", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/lib/python3.8/site-packages/ocrmypdf/__init__.py", line 21, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/__init__.py", line 19, in <module>
    from ocrmypdf.pdfinfo.info import Colorspace, Encoding, PdfInfo
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/info.py", line 37, in <module>
    from ocrmypdf.pdfinfo.layout import get_page_analysis, get_text_boxes
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/layout.py", line 29, in <module>
    from pdfminer.pdfdocument import PDFTextExtractionNotAllowed
ImportError: cannot import name 'PDFTextExtractionNotAllowed' from 'pdfminer.pdfdocument' (/usr/lib/python3.8/site-packages/pdfminer/pdfdocument.py)
(ins)[jscandal@lhasa .aur_bb]$ python
Python 3.8.5 (default, Jul 27 2020, 08:42:51) 
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(ins)>>> from pdfminer.pdfdocument import PDFTextExtractionNotAllowed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'PDFTextExtractionNotAllowed' from 'pdfminer.pdfdocument' (/usr/lib/python3.8/site-packages/pdfminer/pdfdocument.py)
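
One way to check which situation applies on a given system is to test whether the name from the last traceback line is importable at all. The following is a minimal sketch, not part of ocrmypdf or of this package; `can_import_name` is a hypothetical helper introduced only for illustration.

```python
import importlib


def can_import_name(module_name: str, attr: str) -> bool:
    """Roughly report whether `from module_name import attr` would succeed.

    Hypothetical helper: it only checks for an attribute on the imported
    module, so it does not cover the case where `attr` is a submodule
    that has not been loaded yet.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself (or one of its parents) is missing or broken.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # The module and name are taken from the traceback above.
    ok = can_import_name("pdfminer.pdfdocument", "PDFTextExtractionNotAllowed")
    print("PDFTextExtractionNotAllowed importable:", ok)
```

If this prints `False` while a pdfminer package is installed, the installed variant does not provide the name that ocrmypdf imports.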

bsdice commented on 2020-07-23 12:03 (UTC) (edited on 2020-07-23 12:03 (UTC) by bsdice)

Anybody else getting tracebacks when using --threshold?

An exception occurred while executing the pipeline
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/site-packages/ocrmypdf/_sync.py", line 195, in exec_page_sync
    ocr_image_out = create_ocr_image(ocr_image, page_context)
  File "/usr/lib/python3.8/site-packages/ocrmypdf/_pipeline.py", line 544, in create_ocr_image
    dpi = tuple(round(coord) for coord in im.info['dpi'])
KeyError: 'dpi'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/ocrmypdf/_sync.py", line 356, in run_pipeline
    exec_concurrent(context)
  File "/usr/lib/python3.8/site-packages/ocrmypdf/_sync.py", line 267, in exec_concurrent
    exec_progress_pool(
  File "/usr/lib/python3.8/site-packages/ocrmypdf/_concurrent.py", line 108, in exec_progress_pool
    result = results.next()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
KeyError: 'dpi'
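
The failing line reads `im.info['dpi']` directly, so an image whose metadata carries no DPI entry raises `KeyError`. A minimal sketch of a more defensive lookup follows. This is not upstream's fix, and both the helper name `get_dpi` and the fallback value are assumptions made for illustration. A plain dict stands in for Pillow's `Image.info` mapping.

```python
from typing import Mapping, Tuple


def get_dpi(info: Mapping, fallback: Tuple[float, float] = (300, 300)) -> Tuple[int, int]:
    """Return a rounded (x, y) DPI pair from an image info mapping.

    `fallback` is an assumed default, used only when the mapping has
    no 'dpi' key; choose a value that suits the input scans.
    """
    dpi = info.get("dpi", fallback)
    # Same rounding as the line shown in the traceback.
    return tuple(round(coord) for coord in dpi)


if __name__ == "__main__":
    print(get_dpi({"dpi": (299.9, 300.2)}))  # DPI present in the metadata
    print(get_dpi({}))                       # DPI missing, fallback used
```

Whether falling back silently is appropriate depends on the workflow; raising a clearer error that names the offending page may be the better choice.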

marlemion commented on 2020-07-16 06:47 (UTC) (edited on 2020-07-16 12:30 (UTC) by marlemion)

Never mind the below. For some reason, some files were missing from my system.

Fully updated arch and updated ocrmypdf to the latest via AUR:

ocrmypdf
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 33, in <module>
    sys.exit(load_entry_point('ocrmypdf==10.2.0', 'console_scripts', 'ocrmypdf')())
  File "/usr/bin/ocrmypdf", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/lib/python3.8/site-packages/ocrmypdf/__init__.py", line 21, in <module>
    from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/__init__.py", line 19, in <module>
    from ocrmypdf.pdfinfo.info import Colorspace, Encoding, PdfInfo
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/info.py", line 37, in <module>
    from ocrmypdf.pdfinfo.layout import get_page_analysis, get_text_boxes
  File "/usr/lib/python3.8/site-packages/ocrmypdf/pdfinfo/layout.py", line 24, in <module>
    import pdfminer.encodingdb
ModuleNotFoundError: No module named 'pdfminer.encodingdb'

Packages:

pakku -Ss pdfminer
community/python-pdfminer 20200517-1 [installed]
    Python PDF Parser
aur/pdfminer 20191125-1 [20 / 0.157511]
    python3 utils to extract, analyze text data of PDF files. Includes pdf2txt, dumppdf, and latin2ascii
aur/pdfminer-git r480.14fd0fd-1 [3 / 0.000000]
    python utils to extract& analyze text data of PDF files.
aur/pdfminer3k 1.3.1-1 [0 / 0.000000]
    A python3 port of pdfminer
aur/python-pdfminer.six 20200124-1 [6 / 0.013772]
    PDF parser and analyzer for Python

What is the problem?

xuanruiqi commented on 2020-07-03 02:28 (UTC)

Now that python-pillow in community has been updated to 7.2.0, the block on updating this package should no longer exist.

pigmonkey commented on 2020-06-14 18:58 (UTC)

I pinged the python-pillow packager. The package had simply fallen through the cracks and he will be updating it today, but 7.0 introduced some API breakage so the upgraded package will probably hang out in the testing repo for a bit.

fbrennan commented on 2020-06-13 01:03 (UTC)

It might make sense to put in an orphan request for python-pillow-git, then update that, then temporarily require it, @pigmonkey, given how long the community package has been out of date. It's of course up to you, though, as it might be too much work.

jbarlow commented on 2020-06-13 00:41 (UTC)

Upstream here. I noticed python-pillow in AUR is quite old so this could be a blocker for some time.

ocrmypdf does work with pillow 6.2.1, with all tests passing. You could override the requirement and permit the earlier pillow. (I'd rather not change this upstream, so that upstream reflects the configuration that is being tested.)

On another note, I strongly doubt that pillow-simd would yield any measurable change in performance so it would not be worth the effort to integrate this.

pigmonkey commented on 2020-06-12 21:45 (UTC)

This package is stuck on 9.8.2 until the community python-pillow package is upgraded to >=7.0.0.

pigmonkey commented on 2020-05-28 22:17 (UTC)

Thanks for identifying the issue. It looks like v9.8.1 fixes this and is in the process of being pushed to PyPI.

brianmercer commented on 2020-05-28 21:18 (UTC)

Temporary workaround is to roll back python-pdfminer to the prior version:

pacman -U /var/cache/pacman/pkg/python-pdfminer-20200402-1-any.pkg.tar.zst

and optionally add

IgnorePkg = python-pdfminer

to the /etc/pacman.conf file to keep it from upgrading for now.