Tag: PDF

How to convert JPG to PDF

The article “How to convert PDF to JPG using command line in Linux” shows how to split a PDF file into separate pages while converting them to images.

But what if you want to do the opposite and assemble JPG images into a PDF file? This article shows how to create a single PDF document from JPG files.

The convert utility from the ImageMagick package does a great job of combining images (JPG and other formats) into PDF.

For details on installing ImageMagick, including which dependencies need to be installed to support the maximum number of formats, see the article: ImageMagick guide: installing, using, and troubleshooting.

To convert a single image to PDF, run the following command:

convert PICTURE.jpg RESULT.pdf

Example:

convert PL48536179-5.jpg out.pdf

You can specify multiple input .jpg files at once, for example:

convert PL48536179-5.jpg PL48536179-6.jpg PL48536179-7.jpg out.pdf

They are added to the generated PDF file in the order given, one image per page.

If there are many files and they have a common prefix, then you can use the wildcard character * to add several files at once:

convert PL48536179* out.pdf

Or like this:

convert PL*.jpg out.pdf

The following command will create a PDF file from all JPG files in the current directory:

convert *.jpg out.pdf
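One thing to keep in mind with wildcards: the shell expands them in lexicographic order, so page-10.jpg ends up before page-2.jpg. The following is a minimal sketch of one way to get natural page order. It assumes GNU sort (for the -V option), an xargs that supports -0, and file names without newlines. The names list_jpgs_sorted and jpgs_to_pdf are hypothetical helpers, not part of ImageMagick.

```shell
# Print the .jpg files of a directory in natural (version) order, one per line.
# Assumes GNU sort for -V and file names that contain no newlines.
list_jpgs_sorted() {
    dir=${1:-.}
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        printf '%s\n' "$f"
    done | sort -V
}

# Combine the files into one PDF in that order.
# Usage: jpgs_to_pdf out.pdf [directory]
jpgs_to_pdf() {
    out=$1
    dir=${2:-.}
    # Inside sh -c, "$0" is the output name and "$@" are the sorted inputs.
    list_jpgs_sorted "$dir" | tr '\n' '\0' |
        xargs -0 sh -c 'convert "$@" "$0"' "$out"
}
```

If the files are already zero-padded (page-01.jpg, page-02.jpg, ...), the plain wildcard gives the same order and this helper is not needed.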

By default, a PDF file is created in the highest quality. If you want to reduce the size of the output file, then specify the -quality option with a value less than 100, for example:

convert -quality 70 PL*.jpg out2.pdf

The resulting PDF file is smaller than the one created with the default settings.
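To check the effect on your own files, you can compare the sizes of the two PDF files. This is a small sketch that uses only standard tools; print_sizes is a hypothetical helper name, and out.pdf and out2.pdf are the files from the commands above.

```shell
# Print the size in bytes of each given file, followed by its name.
print_sizes() {
    for f in "$@"; do
        # wc -c may pad its output with spaces on some systems; strip them.
        printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done
}

# Usage, after running the convert commands above:
#   print_sizes out.pdf out2.pdf
```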

Online JPG to PDF conversion service

If you are a Windows user, or you do not want to install new utilities and deal with the command line to convert JPG to PDF, then you can collect JPG files into one PDF document on the page of the Online service for converting JPG to PDF: https://suip.biz/?act=convert-jpg-to-pdf

Brief instructions for use are given there. The main point is that if there are several files, they must be placed in a ZIP archive before uploading.

How to convert PDF to JPG using command line in Linux (SOLVED)

Most programs used to open PDF files make it hard to split them into image files. However, there are several command line utilities for this. This article will show you how to convert PDF to JPEG on the Linux command line.

ImageMagick (convert)

To convert PDF to individual image files, let's start with the ImageMagick utility.

For details on installing ImageMagick, including which dependencies need to be installed to support the maximum number of formats, see the article: ImageMagick guide: installing, using, and troubleshooting.

Use convert like this:

convert input.pdf output.jpg

For good quality, use these options:

convert -density 300 -quality 100 in.pdf out.jpg
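For a multi-page PDF, convert writes one image per page and appends the page index to the output name (out-0.jpg, out-1.jpg, and so on). A printf-style pattern such as %03d in the output name gives zero-padded numbers that sort correctly. The sketch below wraps this in a hypothetical helper, pdf_to_jpgs; the default prefix "page" is an assumption made for illustration.

```shell
# Convert every page of a PDF to a separate, zero-padded JPG file.
# Usage: pdf_to_jpgs input.pdf [prefix]   -> prefix-000.jpg, prefix-001.jpg, ...
pdf_to_jpgs() {
    pdf=$1
    prefix=${2:-page}
    # %03d in the output name is replaced by ImageMagick with the page index.
    convert -density 300 -quality 100 "$pdf" "${prefix}-%03d.jpg"
}
```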

If you encounter errors, the sections on the “cache resources exhausted” and “not allowed by the security policy” errors further down this page may help you.

pdftoppm (from the poppler package)

On Debian, Linux Mint, Ubuntu, Kali Linux and their derivatives, you can install this package with this command:

sudo apt install poppler-utils

On Arch Linux, Manjaro and their derivatives, to install, run the command:

sudo pacman -S poppler

The command format is the following:

pdftoppm -jpeg -r 300 input.pdf output

In this command:

  • -jpeg sets the output image format to JPG,
  • -r 300 sets output image resolution to 300 dpi,
  • output is the prefix for the image files, which are numbered by page and written to the current directory.

However, in my opinion, it is better to first create an “images” directory with “mkdir -p images” and then set the output to images/pg, so that all output images get the pg prefix in front of their page numbers and are placed neatly in that directory.

So here are my favorite commands:

This creates files of about 1 MB per page, in .jpg format at 300 dpi:

mkdir -p images
pdftoppm -jpeg -r 300 mypdf.pdf images/pg

This creates files of about 2 MB per page, in .jpg format at maximum quality (least compression) and still at 300 dpi:

mkdir -p images
pdftoppm -jpeg -jpegopt quality=100 -r 300 mypdf.pdf images/pg

If you need more resolution, you can try 600 DPI:

mkdir -p images
pdftoppm -jpeg -r 600 mypdf.pdf images/pg

… or 1200 dpi:

mkdir -p images
pdftoppm -jpeg -r 1200 mypdf.pdf images/pg

To convert only the first page into a single file (without a page number added to the name), run:

pdftoppm -singlefile -jpeg -r 300 input.pdf output
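pdftoppm can also be limited to a range of pages with the -f (first page) and -l (last page) options. A minimal sketch, assuming the same 300 dpi setting and images/pg layout as above; pdf_range_to_jpgs is a hypothetical helper name.

```shell
# Convert a range of pages of a PDF to JPG files in a separate directory.
# Usage: pdf_range_to_jpgs input.pdf first last [outdir]
pdf_range_to_jpgs() {
    pdf=$1
    first=$2
    last=$3
    outdir=${4:-images}
    mkdir -p "$outdir"
    # -f and -l are the first and last page to convert (numbered from 1).
    pdftoppm -jpeg -r 300 -f "$first" -l "$last" "$pdf" "$outdir/pg"
}
```

For example, "pdf_range_to_jpgs mypdf.pdf 2 4" would convert pages 2 to 4 into the images directory.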

vips (from libvips package)

On Debian, Linux Mint, Ubuntu, Kali Linux and their derivatives, you can install this package with this command:

sudo apt install libvips-tools

On Arch Linux, Manjaro and their derivatives, to install, run the command:

sudo pacman -S libvips

libvips can quickly convert PDF → JPEG. This program is present in the standard repositories of most Linux distributions; on macOS you can use Homebrew, and the Windows binary can be downloaded from the libvips site.

This command will convert PDF to JPG with the default resolution (72 dpi):

vips copy somefile.pdf somefile.jpg

You can use the dpi option to set a different rendering resolution, like so:

vips copy somefile.pdf[dpi=600] somefile.jpg

You can select a specific page (libvips numbers pages from zero):

vips copy somefile.pdf[dpi=600,page=12] somefile.jpg

Or render five pages, starting at page index 3 (the fourth page, since numbering starts at zero), like this:

vips copy somefile.pdf[dpi=600,page=3,n=5] somefile.jpg
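As far as I know, the n option renders the selected pages into one tall image rather than into separate files. If you want one JPG per page, a simple loop over the page indexes is one way to do it. The sketch below is a hypothetical helper (pdf_pages_with_vips) that takes the page count as an argument instead of detecting it. It also quotes the file name, because square brackets are special characters in most shells.

```shell
# Render the first COUNT pages of a PDF to separate JPG files with vips.
# Usage: pdf_pages_with_vips input.pdf count [dpi]
pdf_pages_with_vips() {
    pdf=$1
    count=$2
    dpi=${3:-300}
    base=${pdf%.pdf}
    i=0   # libvips numbers pages from zero
    while [ "$i" -lt "$count" ]; do
        # Quote the argument so the shell does not treat [...] as a glob.
        vips copy "${pdf}[dpi=${dpi},page=${i}]" "${base}-${i}.jpg"
        i=$((i + 1))
    done
}
```

For example, "pdf_pages_with_vips somefile.pdf 5 600" would write somefile-0.jpg to somefile-4.jpg.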

The documentation for pdfload has all the options.

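Comparison of program speeds, measured with GNU time (/usr/bin/time, not the shell's built-in time). Each result shows peak memory in kilobytes, then elapsed time in seconds:

/usr/bin/time -f %M:%e convert -density 300 r8.pdf[3] x.jpg
276220:2.17

/usr/bin/time -f %M:%e pdftoppm -jpeg -r 300 -f 3 -l 3 r8.pdf x.jpg
91160:1.24

/usr/bin/time -f %M:%e vips copy r8.pdf[page=3,dpi=300] x.jpg
149572:0.53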

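So libvips is about 4 times faster than convert and needs roughly half the memory, at least in this test. pdftoppm uses the least memory of the three, but takes more than twice as long as libvips.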
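The ratios can be checked directly from the numbers above. This is just the arithmetic, done with awk; ratio is a hypothetical helper name.

```shell
# Print a / b rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

ratio 2.17 0.53       # elapsed time: convert vs. vips
ratio 276220 149572   # peak memory:  convert vs. vips
ratio 1.24 0.53       # elapsed time: pdftoppm vs. vips
```

The three calls print 4.1, 1.8 and 2.3 respectively.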

Online service to convert PDF to JPG

If you are a Windows user, or you do not want to install new utilities and deal with the command line to convert PDF to JPG, then you can split PDF files into separate images on the page of the Online service for converting PDF to JPG: https://suip.biz/?act=convert-pdf-to-jpg

This online service supports both single-page and multi-page PDF files. When converting a multi-page PDF document, the page images are placed in an archive for convenience, so they can be downloaded in one go regardless of the number of JPG files.

See also: How to convert JPG to PDF

Error “convert: cache resources exhausted” (SOLVED)

When using the convert utility to convert images, you may encounter an error stating that the cache resources have been exhausted.

Command example:

convert -density 300 -quality 100 input.pdf output.png

An example of the error it causes:

convert-im6.q16: cache resources exhausted `/tmp/magick-q7O_IcbbGpFULs5R34rLlwAyeW1slGHi19' @ error/cache.c/OpenPixelCache/4095.

This error occurs when two conditions are combined:

  • processing a large file (for example, when converting PDF to JPG)
  • a computer with little RAM

As a quick fix, you can try reducing the image quality:

convert -density 150 -quality 70 input.pdf output.png

The -density option specifies the horizontal and vertical resolution used to render the image, in dots per inch. A value of 300 is typical for scanned photographs and is enough for good image quality both on a monitor screen and in print. You can experiment by choosing a lower value.

The -quality option specifies the compression level for JPEG/MIFF/PNG. For JPEG, a value of 100 means the highest image quality with the least compression, and a value of 70 gives a smaller file at the cost of some quality. PNG is lossless, so for PNG output the value affects the compression settings rather than the visual quality.

If you do not want to reduce the quality, then you can try changing the settings. To do this, open the policy.xml file. Depending on your Linux distribution and version of ImageMagick, the path to the file may vary, for example:

  • /etc/ImageMagick-6/policy.xml
  • /etc/ImageMagick-7/policy.xml

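Find the line that sets the memory limit and increase its value to suit the amount of RAM available (for example, from 256MiB to 1GiB):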

<policy domain="resource" name="memory" value="256MiB"/>
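If you prefer not to edit the file by hand, the change can be scripted with sed. This is a sketch under assumptions: the line has exactly the form shown above, 1GiB is only an example value, and set_memory_limit is a hypothetical helper name. It works as a filter, so the original file is never modified in place.

```shell
# Filter: replace the value of the "memory" resource policy with the given limit.
# Usage: set_memory_limit 1GiB < policy.xml > policy.xml.new
set_memory_limit() {
    # Only the line with the memory resource policy is touched.
    sed '/<policy domain="resource" name="memory"/ s|value="[^"]*"|value="'"$1"'"|'
}

# Example (review policy.xml.new before copying it over the original as root):
#   set_memory_limit 1GiB < /etc/ImageMagick-6/policy.xml > policy.xml.new
```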

Error “attempt to perform an operation not allowed by the security policy `PDF’” (SOLVED)

On Debian, Ubuntu, Linux Mint, Arch Linux, and derivative systems, the following error may occur when converting PDF documents to images (PNG, JPG, and so on):

attempt to perform an operation not allowed by the security policy `PDF'

An example of a command that causes this error:

convert -density 300 -quality 100 PL48536179.pdf out.jpg
convert-im6.q16: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/421.
convert-im6.q16: no images defined `out.jpg' @ error/convert.c/ConvertImageCommand/3229.

Apparently ImageMagick's security policy does not allow conversion from PDF. Converting other formats works, but not PDF. This happens with the default ImageMagick settings.

Two options for solving the problem:

1. In the file /etc/ImageMagick-6/policy.xml, before the line

</policymap>

insert the line:

  <policy domain="coder" rights="read | write" pattern="PDF" />

and everything will work.

Note: The path to the policy.xml file may differ depending on the version of ImageMagick, for example, the path may be: /etc/ImageMagick-7/policy.xml

2. The second option is similar: you also need to open the file /etc/ImageMagick-6/policy.xml

Find these uncommented lines there:

  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PS2" />
  <policy domain="coder" rights="none" pattern="PS3" />
  <policy domain="coder" rights="none" pattern="EPS" />
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="none" pattern="XPS" />

Comment them out, that is, put <!-- at the start of each line and --> at its end.

This should work for Debian, Ubuntu, Linux Mint, and derivative systems.

In Arch Linux open the file /etc/ImageMagick-7/policy.xml

Find the uncommented line there

<policy domain="coder" rights="none" pattern="{PS,PS2,PS3,EPS,PDF,XPS}" />

Comment it out, that is, put <!-- in front of it and --> at the end.

After that pdf conversion should work again.
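The commenting-out step can also be scripted. The sketch below is a sed filter that wraps any active coder policy line mentioning PDF in an XML comment. It assumes the lines have exactly the form shown above; comment_out_pdf_policy is a hypothetical helper name. On Debian-style files this touches only the PDF line, while on Arch Linux it comments out the combined line. Review the result before replacing the real file.

```shell
# Filter: wrap any active coder policy line that mentions PDF in an XML comment.
# Usage: comment_out_pdf_policy < policy.xml > policy.xml.new
comment_out_pdf_policy() {
    sed 's|^\([[:space:]]*\)\(<policy domain="coder" rights="none" pattern="[^"]*PDF[^"]*" */>\)|\1<!-- \2 -->|'
}

# Example (check policy.xml.new, then copy it over the original as root):
#   comment_out_pdf_policy < /etc/ImageMagick-6/policy.xml > policy.xml.new
```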

There is no need to apply both options; choose one of them. The first option allows conversion from PDF only, while the second allows conversion from all of the PS, PS2, PS3, EPS, PDF, and XPS formats.

This PDF policy was added because of a bug in Ghostscript that has since been fixed. If you are using a current version of Ghostscript, the policy is no longer needed.

So don't forget to update Ghostscript to the latest version!

What program to open .docbook files (DocBook)

In fact, .docbook files are not meant to be read - they are meant to be converted to HTML, man, PDF, DOCX, ODT, EPUB, FictionBook2, and many others.

What is DocBook?

DocBook is an XML-based standard used in many modern documentation tasks. To create a DocBook document, you write XML files that describe the document's structure, paragraphs, and other attributes. The structure of an XML file may look familiar if you have seen HTML before. XML is stricter and more general than HTML and can be used to mark up web pages as well as many other kinds of documents.

What are the benefits of DocBook?

DocBook is an OASIS standard and the format in which many open source projects store their documentation. DocBook is developed as an open source project. The project is hosted on SourceForge and is available under the GPL license. DocBook is available as a Document Type Definition (DTD) and as an XML Schema (XSD). The project has a large developer and support community, spanning both open source and commercial groups.

The most important reasons why DocBook is used in projects:

  • DocBook is a standard
  • DocBook is open source
  • DocBook is used in many large projects
  • DocBook has a large developer and support community

DocBook is also an XML application, and XML technologies solve a number of publishing problems for documentation teams, including:

  • Single source
  • Joint development
  • Cross-platform editing
  • Multichannel publishing
  • Improving the quality and consistency of information
  • Expanding the functionality of electronic output
  • Avoiding vendor lock-in

If you are already familiar with XML, then you can start learning DocBook. If you don't understand XML, the good news is that learning DocBook will help you learn XML. Below are two must-read books for anyone new to DocBook.

DocBook - The Definitive Guide http://www.docbook.org/tdg/en/html/docbook.html

DocBook XSL - The Complete Guide http://www.sagehill.net/docbookxsl/index.html

But I think you came to this article first of all to open a DocBook file, or at least to convert one.

DocBook in Writer (LibreOffice)

If you just need to quickly open the .docbook file, then you don't even need to install additional programs if you already have LibreOffice. Writer can open .docbook files. But I strongly discourage this option, because for some reason Writer skips completely random pieces of text without indicating it in any way. In the opened file, about 10% may be missing! Use this only as a last resort.

How to convert DocBook to PDF

Next, we will consider a universal program that can convert to many formats, including PDF. But if you just need PDF, then I highly recommend dblatex (DocBook to LaTeX Publishing), which can convert DocBook (XML and SGML) to DVI, PDF, and PostScript using LaTeX. The thing is, dblatex makes the most beautiful PDFs! These PDFs include a table of contents and other elements that are simply missing in the universal program.

Install the dblatex package.

Then just run a command like:

dblatex FILE.docbook

FILE.pdf will be created in the same folder.
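If you have several DocBook files, a small loop saves some typing. This is a minimal sketch; docbooks_to_pdf is a hypothetical helper name, and it uses the -o option of dblatex to name each output file explicitly.

```shell
# Convert each given DocBook file to a PDF with the same base name.
# Usage: docbooks_to_pdf *.docbook
docbooks_to_pdf() {
    for src in "$@"; do
        # -o sets the output file; ${src%.*} strips the last extension.
        dblatex -o "${src%.*}.pdf" "$src"
    done
}
```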

Pandoc - universal document converter

If you need to convert files from one markup format to another, pandoc is your Swiss army knife. Pandoc can convert between numerous formats; a full list of them can be found on the program's website: https://pandoc.org/

Install the pandoc package.

Then run one of the following commands:

pandoc --from docbook --to docx --output myDocbook.docx myDocbook.xml

pandoc --from docbook --to odt --output myDocbook.odt myDocbook.xml

pandoc --from docbook --to latex --output myDocbook.pdf myDocbook.xml

pandoc --from docbook --to epub3 --output myDocbook.epub myDocbook.xml

pandoc --from docbook --to markdown --output myDocbook.md myDocbook.xml

pandoc --from docbook --to html --output myDocbook.html myDocbook.xml

As you might have already figured out:

  • --from is the format of the original document
  • --to is the format to convert to
  • --output is the name of the file to create
  • myDocbook.xml is the original DocBook document

Everything is very simple. But for PDF I recommend dblatex!
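To produce several formats from the same source in one go, the commands above can be wrapped in a loop. This sketch relies on pandoc guessing the output format from the file extension when --to is omitted; docbook_to_formats is a hypothetical helper name.

```shell
# Convert one DocBook file to several formats in one go.
# Usage: docbook_to_formats myDocbook.xml docx odt epub
docbook_to_formats() {
    src=$1
    shift
    base=${src%.*}
    for ext in "$@"; do
        # Without --to, pandoc guesses the output format from the extension.
        pandoc --from docbook --output "${base}.${ext}" "$src"
    done
}
```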

Other programs for converting DocBook

There are a few more applications that you may find useful if you want to convert a DocBook file to a different format.

  • kdoctools - generates documentation from DocBook.
  • poxml - translates DocBook XML files using gettext po files.
  • docbook2x - a software package that converts DocBook documents to the traditional Unix manual page format and the GNU Texinfo format.
  • docbook-to-man - a batch converter from DocBook SGML to nroff/troff man macros.