Which File Format Should I Use for Scanned Documents?

So you’ve decided to take your office paperless and are going to scan your documents or outsource the project. That’s great! There are lots of things to consider, but one item on the checklist should be which file format you’re going to use to store your scanned documents.

Let’s cover the most common types of file formats used in document scanning and which one may be best for your company.

As a disclaimer: we know that different file formats are good for different uses. This blog post and our opinions reflect our observation and usage from over 25 years in the industry.

PDF: Portable Document Format

PDF is one of the most common file formats for scanned documents (and just about any kind of electronic document). Adobe designed the format so that a document looks the same no matter which application or device opens it. A PDF can also carry metadata, which is descriptive information about the file. Best of all, most computer users are familiar with the format and how to use it.

PDF is a great choice for your documents because a scanned PDF can carry a hidden text layer produced by optical character recognition (OCR), which makes the file text searchable. The OCR step is performed by your scanning software, not by the PDF format itself. For example, an instruction manual can be searched for specific text or phrases.

Have a relatively small volume of files? Those can be easily and conveniently stored in a PDF format. In fact, most desktop scanners come with software that will enable you to scan, name, and save a file in PDF format.

When to Use PDF:

  • If you’re going to scan your documents and just store them on your server or a hard drive
  • You only need to scan a low volume of files per day
  • If you plan to print any documents
  • Your company requires storing archived documents as PDF/A (see below)
  • You don’t use document management software
  • You don’t have a large volume of files

Pro tip: if you are scanning your documents just to have as an archive, and you don’t plan to edit them or access them often, you may want to save them in PDF/A format. PDF/A was created specifically for archiving. A PDF/A file is self-contained: it embeds the font and color information it needs, so the file can be read for years and years to come, regardless of what kind of software or internet browser is popular in the future.

If you’re an insurance company or benefit fund that needs to retain records for 60+ years, you may want to have both PDF and PDF/A copies for your archives.

PDFs can also be saved as:

  • PDF/E (for engineering and technical documentation archiving)
  • PDF/X (for graphics and printing)

You may also see the PDF options of PDF/UA and PDF/VT. These are standards created for accessibility and variable data printing, respectively.

TIFF: Tagged Image File Format (also seen as TIF)

TIFF files are the other most common file format for scanned documents. These are image files that hold a lot of detail. TIFF files can also be compressed very well and are commonly used to store scanned business records, patient files, student files, and government records.
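To see why compression matters for scanned images, here is a rough back-of-the-envelope sketch of how large one page is before any compression is applied. The page size (US Letter) and the 300 dpi resolution are assumptions chosen for illustration, and the function name is hypothetical. Real file sizes depend on the compression your scanner software applies.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row of pixels is padded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed example: US Letter (8.5 x 11 in) scanned at 300 dpi.
bilevel = uncompressed_page_bytes(8.5, 11, 300, 1)   # black and white
gray = uncompressed_page_bytes(8.5, 11, 300, 8)      # 8-bit grayscale
color = uncompressed_page_bytes(8.5, 11, 300, 24)    # 24-bit color

print(bilevel, gray, color)  # prints "1052700 8415000 25245000"
```

Under these assumptions, a black-and-white page is about 1 MB before compression and a full-color page is about 25 MB, which is why scan mode and compression add up quickly across thousands of pages.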

If you have a large quantity of files to scan, it’s better to scan in a TIFF format. TIFF files can be efficiently compressed to optimize performance and file size. In addition, single-page TIFF files can make it easier to follow health care privacy guidelines, because a user opens only the pages they need. For example, if you open a 400-page patient file that is all contained in one PDF file, you have exposed the entire record, which most likely goes beyond what privacy guidelines allow. Also, the PDF file will be large, take more time to open, and expose more information if it reaches hackers or other unwanted parties.

Because a TIFF document can be stored as a set of single-page files, it’s easy to add and remove pages. For example, if you’re working on an employee’s HR file, it’s easier to add to that file with TIFF images.
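As a minimal sketch of what adding a page to a single-page set can look like, the snippet below plans the renames needed to open up a slot in the middle of a document. The naming scheme (`HR-1042_0003.tif`) and both function names are hypothetical. A document management system would normally handle this bookkeeping for you.

```python
def page_name(doc_id, page_number):
    """Build a hypothetical file name such as 'HR-1042_0003.tif'."""
    return f"{doc_id}_{page_number:04d}.tif"


def plan_insert(doc_id, page_count, insert_at):
    """Return the (old_name, new_name) renames needed to free slot insert_at.

    Pages are numbered from 1. Renames are listed from the last page
    backwards so that no rename overwrites a file that has not moved yet.
    """
    if not 1 <= insert_at <= page_count + 1:
        raise ValueError("insert_at must be between 1 and page_count + 1")
    return [
        (page_name(doc_id, n), page_name(doc_id, n + 1))
        for n in range(page_count, insert_at - 1, -1)
    ]


# Insert a new page 3 into a 4-page file: pages 4 and 3 each shift up by one.
renames = plan_insert("HR-1042", page_count=4, insert_at=3)
for old, new in renames:
    print(old, "->", new)
```

Appending a page to the end needs no renames at all, which is part of why frequently growing files are convenient to keep as single-page images.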

When it comes to scanning, one of the reasons that TIFF files are favored over PDFs is that a TIFF file is a true picture of the original document. That true picture preserves the integrity of the document, whereas a PDF can be manipulated.

One of the drawbacks of TIFF files is that if you want them to be text searchable, you’ll also have to store a separate text file. However, if you’re working on these files within a document management system, this entire process is pretty seamless.
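To illustrate the separate text file idea, here is a minimal sketch that searches OCR text stored next to each page image. It assumes each `page.tif` has a matching `page.txt` in the same folder. That layout, the file names, and the function name are assumptions for illustration, not how any particular document management system works.

```python
import tempfile
from pathlib import Path


def find_pages(folder, phrase):
    """Return the TIFF file names whose sidecar .txt file contains phrase."""
    needle = phrase.lower()
    matches = []
    for text_file in sorted(Path(folder).glob("*.txt")):
        text = text_file.read_text(encoding="utf-8")
        if needle in text.lower():
            # The page image shares the text file's name, with a .tif suffix.
            matches.append(text_file.with_suffix(".tif").name)
    return matches


with tempfile.TemporaryDirectory() as folder:
    # Stand-ins for OCR output; a real folder would also hold the .tif files.
    Path(folder, "claim_0001.txt").write_text("Policy number 12345", encoding="utf-8")
    Path(folder, "claim_0002.txt").write_text("Signature page", encoding="utf-8")
    hits = find_pages(folder, "policy number")

print(hits)  # prints "['claim_0001.tif']"
```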

When to Use TIFF:

  • When you’re working with files that need to be added to frequently
  • If you primarily work with 200+ page documents
  • If you frequently use or plan to implement a document management system
  • When you have a large number of files to convert to digital
  • If you plan to annotate or edit your documents after scanning them
  • When you’re working with files that must meet certain compliance standards (e.g. HIPAA)

JPEG: Joint Photographic Experts Group (also seen as JPG)

JPEG is one of the most common image file formats for digital photography and web images, but it’s not usually the best choice for your scanned documents. JPEG uses lossy compression, so quality and data can be lost each time an image is resized or modified and saved again.

However, if you are scanning something like an archive of old posters or something graphics-heavy, JPEGs might be a great format.

Also, if you are using a content management system to hold all of your files and data, JPGs are perfect for storing logos, marketing assets, backups of website images, and printable collateral.

PNG: Portable Network Graphics

PNGs, like JPEGs, are great for web graphics. As with JPEG, you may want to use this format for marketing materials and website image backups.

However, a PNG file holds a single image and has no support for multi-page documents the way TIFFs and PDFs do, so we also don’t recommend the format for scanned documents.

GIF, BMP, DOCX, and Other File Formats

We love a good GIF as much as the next person, but with a palette limited to 256 colors and lower image quality as a result, GIFs aren’t a good file type for your documents.

BMP is an older file format that’s mostly been replaced by TIFF. You should only consider BMP if your ERP is an old, legacy system. And even then, it’d be better to initially scan as TIFF, make a backup, and then convert a duplicate set of TIFF files to BMP.

DOCX, PPTX, and other Microsoft Office files are good for living documents that require collaboration or updates. They’re not recommended for a historical or archived document that you don’t want changed, or files with personal information.

Again, if you’re going to be using an enterprise content management system for your files, you can store any kind of file in there without concern. Older scanned documents, new digital-born documents, pictures, audio files, and so much more can all be stored in one location.
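If you end up with a mixed folder of files, the format can usually be recognized from the first few bytes of each file, whatever the file extension says. The sketch below checks a header against the well-known signatures of the formats covered in this post. The function name is hypothetical, and the check is deliberately simple: it identifies the container, not whether the file is valid.

```python
# Leading bytes ("magic numbers") for the formats discussed in this post.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),            # little-endian byte order
    (b"MM\x00*", "TIFF"),            # big-endian byte order
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"PK\x03\x04", "ZIP-based (DOCX, PPTX, ...)"),
]


def sniff_format(header):
    """Guess a file format from the first bytes of the file."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


print(sniff_format(b"%PDF-1.7\n"))                   # prints "PDF"
print(sniff_format(b"II*\x00\x08\x00\x00\x00"))      # prints "TIFF"
print(sniff_format(b"plain text"))                   # prints "unknown"
```

In practice you would read the header with something like `open(path, "rb").read(8)` and pass it to the function.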

Should I Use PDF or TIFF for My Scanned Documents?

If you’re still on the fence, take a look at our comparison chart:

Ultimately, the file format you use for document imaging isn’t going to make or break your digital transformation. Consider how your digital files will be stored and accessed, along with any kind of business standards your organization has. And remember: TIFF files can be converted into PDF files and vice versa.

What’s most important is being able to retrieve an image as quickly as possible when you need it. And that usually comes down to the kind of system you’re using.

Storing My Files in a Content Management System

As we mentioned earlier, if you’re using a true content management system for your files, then you can store everything you want in whatever format you want. There’s no need to convert any files unless your business has certain standards for storing documents.

Now here’s a fun fact: many content and document management systems store images in their native format but render (convert) these images to a different format as they are retrieved and viewed. Generally, the user doesn’t even know that this is happening.

There are multiple reasons for this:

  • By being stored natively, the original file/image is more secure.
  • The viewed/rendered image is now a working copy of the original image. It’s used for a very specialized purpose and safe from hackers and other unwanted sources.
  • Displaying an image in a common format enables the viewer to have more functionality. The viewer window can now easily email, annotate, reorder images, add or delete pages from a file, redact, lock, create a new version, print, integrate to an application, sign, download, perform routing, and carry out workflows.

In conclusion: the format of your files should be considered, but the right choice ultimately depends on your company, your software, your compliance and regulatory requirements, and the types and volume of documents to be scanned.

Essentially, don’t worry! A successful digitization project does not come down to whether you scan to PDF or TIFF files.

 

If you have any questions or comments on this article, reach out to us on LinkedIn or Twitter. If you’d like an expert opinion for your scanning project, don’t hesitate to contact us or give us a call at (630) 321-0601.

 

This post is part of the Datamation Guide to Document Scanning.