Version 2.1-stable is out. An update would be great, as the current version doesn't work with the latest version of Tesseract anymore.
Package Details: ocrmypdf 16.7.0-1
Git Clone URL: | https://aur.archlinux.org/ocrmypdf.git (read-only) |
---|---|
Package Base: | ocrmypdf |
Description: | A tool to add an OCR text layer to scanned PDF files, allowing them to be searched |
Upstream URL: | https://github.com/ocrmypdf/OCRmyPDF |
Licenses: | MPL2 |
Submitter: | dreuter |
Maintainer: | fbrennan (pigmonkey) |
Last Packager: | pigmonkey |
Votes: | 126 |
Popularity: | 3.87 |
First Submitted: | 2014-01-27 11:36 (UTC) |
Last Updated: | 2024-12-10 05:10 (UTC) |
Dependencies (21)
- ghostscript
- img2pdf (img2pdf-git [AUR])
- pngquant
- python (python37 [AUR], python311 [AUR], python310 [AUR])
- python-deprecation
- python-importlib_resources
- python-packaging
- python-pdfminer
- python-pikepdf
- python-pillow
- python-pluggy
- python-reportlab
- python-rich
- python-tqdm
- tesseract (tesseract-git [AUR])
- unpaper (unpaper-git [AUR])
- python-build (make)
- python-hatch-vcs (make)
- python-installer (make)
- python-wheel (make)
- jbig2enc [AUR] (jbig2enc [AUR], jbig2enc-git [AUR]) (optional) – Better compression algorithm; results in smaller PDF files
Required by (6)
- docspell-joex (optional)
- dpsprep-git (optional)
- phoronix-test-suite-git (optional)
- python-ocrmypdf-papermerge
- riven-original-soundtrack (make)
- stirling-pdf-bin
Sources (1)
dbrgn commented on 2014-09-23 12:59 (UTC)
sagittarius commented on 2014-09-17 23:43 (UTC)
Since the upgrade of python2-reportlab, several issues have kept OCRmyPDF from working properly.
I had to:
- downgrade to python2-reportlab v2.7 (PKGBUILD here https://projects.archlinux.org/svntogit/community.git/plain/trunk/PKGBUILD?h=packages/python-reportlab&id=5c04a255e9c0f352dee3282f2a308d375926ed30).
- replace in /usr/lib/ocrmypdf/src/ocrPage.sh:
mv "$curHocr.html" "$curHocr" with mv "$curHocr.hocr" "$curHocr"
- overwrite /usr/lib/ocrmypdf/src/hocrTransform.py with the original file from GitHub: https://github.com/fritz-hh/OCRmyPDF/tree/v2.x/src
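For anyone who prefers not to edit ocrPage.sh by hand, the mv change above could be scripted roughly like this. It is only a sketch: the function name fix_hocr_suffix is made up for illustration, and it assumes GNU sed and that the line appears in the file exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): switch the suffix that
# ocrPage.sh expects from ".html" to ".hocr" in the mv line quoted above.
# A backup of the untouched file is kept next to it with a .bak suffix.
fix_hocr_suffix() {
    # Single quotes keep the shell from expanding $curHocr;
    # \$ and \. match the literal characters in the script text.
    sed -i.bak 's/"\$curHocr\.html"/"$curHocr.hocr"/' "$1"
}

# Usage, from a root shell since the file lives under /usr/lib:
#   fix_hocr_suffix /usr/lib/ocrmypdf/src/ocrPage.sh
```

The next package upgrade will overwrite the file again, so this is a stopgap rather than a permanent fix.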
I also made a KDE service menu named OCRmyPDF.desktop in: /usr/share/kde4/services/ServiceMenus/
with these contents:
[Desktop Entry]
Type=Service
ServiceTypes=KonqPopupMenu/Plugin
MimeType=application/pdf;
Icon=application-postscript
TryExe=OCRmyPDF.sh
Actions=OCRmyPDFclean;OCRmyPDFnoclean
[Desktop Action OCRmyPDFclean]
Name=OCR -> PDF clean
Icon=application-postscript
Exec=OCRmyPDF.sh -l eng -d -c -i %f "`echo %f | perl -pe 's/\.[^.]+$//'`-ocr.pdf";kdialog --passivepopup "Done" 3; echo
[Desktop Action OCRmyPDFnoclean]
Name=OCR -> PDF noclean
Icon=application-postscript
Exec=OCRmyPDF.sh -l eng -d -c %f "`echo %f | perl -pe 's/\.[^.]+$//'`-ocr.pdf";kdialog --passivepopup "Done" 3; echo
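The perl substitution in the Exec lines strips the last extension from the input path before "-ocr.pdf" is appended. A rough equivalent in plain POSIX shell, as a sketch with a made-up function name, could look like this. Note that the parameter expansion is not byte-for-byte identical to the regex: it also removes a bare trailing dot.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): derive the output name
# used by the service menu, e.g. scan.pdf -> scan-ocr.pdf.
ocr_output_name() {
    # ${1%.*} drops the shortest trailing ".something" from the first argument;
    # a name without any dot is left unchanged.
    printf '%s\n' "${1%.*}-ocr.pdf"
}

# Example call: ocr_output_name /tmp/scan.pdf
```

This avoids spawning perl for each file, which is a matter of taste in a desktop action.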
dreuter commented on 2014-09-16 17:31 (UTC)
@Chais: Could you provide a minimal (not) working example?
So just some (sample) files to ocr and the commandline options you used.
Chais commented on 2014-09-16 12:56 (UTC)
When trying to ocr a pdf I'm getting these errors: http://sprunge.us/dKNQ
No idea what to make of it.
dreuter commented on 2014-03-09 17:44 (UTC)
I just fixed it. Thanks.
I also wrote a pull request to change it in the code (but maybe it does not affect Ubuntu users, so it may not be merged soon).
p3t3r commented on 2014-03-05 17:39 (UTC)
Thanks for this package, it's a great tool for quick and decent OCR.
While it worked at first, it unfortunately stopped soon afterwards. But there's a fix, since it's just a module/function in Python that has since been renamed:
In /usr/lib/ocrmypdf/src/hocrTransform.py replace
_AsciiBase85Encode
with
asciiBase85Encode
and everything is fine again.
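The rename above can also be done with a one-line sed call instead of a manual edit. This is only a sketch: the function name fix_base85_name is invented for illustration, and it assumes GNU sed and that the old name appears nowhere else in a form that should be kept.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): replace every occurrence
# of the old function name with the new one in the given file.
# The original file is kept alongside it with a .bak suffix.
fix_base85_name() {
    sed -i.bak 's/_AsciiBase85Encode/asciiBase85Encode/g' "$1"
}

# Usage, from a root shell since the file lives under /usr/lib:
#   fix_base85_name /usr/lib/ocrmypdf/src/hocrTransform.py
```

If something goes wrong, the .bak copy can simply be moved back over the edited file.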
Pinned Comments
fbrennan commented on 2023-05-12 22:54 (UTC)
The flag was invalid and has been removed with no action taken as no new version was released. There's nothing to do for this package; no new release has been made. Rebuild, as @eclairevoyant has said.