
 

OCR (Optical Character Recognition) allows you to scan a document and make the text editable. For instance, you can scan a page of text and convert it into a Microsoft Word document or make a PDF searchable. You can also run OCR on an existing image file or PDF for the same results.

 

 

Basic Workflow for OCR in OmniPage

Most OCR scanning jobs follow three main processing steps:

  1. Import image(s) into OmniPage by scanning them or by loading the file into the program.
  2. Perform OCR to make the text editable. (Use the spell check after this step to correct errors.)
  3. Export the document in a file type you wish and to a location you choose.

You can go through this common workflow automatically, manually, or in a combined automatic and manual fashion.

 


 

Importing Images for Processing in OmniPage

Before you get started, first open OmniPage Professional 16 by going to Start > Programs > ScanSoft OmniPage 16 > OmniPage Professional 16.

  1. There are three ways to import images into OmniPage for processing. Choose the method that best fits your project, depending on the format of the original document or image:
    • The first way is to click the Get Pages drop-down list (the arrow next to button #1) and choose Load Files. You can load image files or already existing documents (such as a PDF or JPG) to convert.
      • NOTE: An image must be a minimum of 16 by 16 pixels for processing.
    • The second way is to connect the digital camera that holds your photos to the computer. Click Load Digital Camera Files in the Get Pages drop-down list (the arrow next to button #1).
    • The third and most common way is to use the scanner. Click the Get Pages drop-down list (the arrow next to button #1) and select Scanner.
      • If you are scanning using the computer named Gloria, the Scan Using HP Scanjet 4070 dialog box will pop up. Choose whether to scan in Black and White, Grayscale, or Color. Select other appropriate settings, then click Scan. The images will be brought back into OmniPage when the scanner finishes scanning.
      • If you are scanning using the computer named Ezekiel, Scanwise will open up. Select the appropriate settings, then click Scan. The images will be brought back into OmniPage when the scanner finishes scanning.
      • NOTE: Good brightness and contrast help OCR accuracy. Some images may need to be pre-processed before undergoing OCR. To improve your OCR results you can edit your images within OmniPage. Go to SET > Enhance Image in the Image Toolbar or click Tools > SET > Enhance Image. Use the image enhancement tools to edit your images to prepare them for OCR.
  2. Click the Get Page(s) button (button #1). The images are imported into OmniPage by the method you chose above.
  3. Click Stop Loading Pages once you have imported all your images/pages for OCR.
  4. Once you have all your images imported, you are ready to move on to the next step--performing OCR.
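
The 16 by 16 pixel minimum noted above can be checked before a file is loaded. The following is a minimal Python sketch, not part of OmniPage: it assumes the image is a PNG and reads the width and height from the file header. The names `png_size` and `large_enough` are made up for this illustration.

```python
import struct
from typing import Tuple

MIN_SIDE = 16  # minimum width and height in pixels, per the note above
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, the 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def large_enough(width: int, height: int) -> bool:
    """True if both sides meet the 16-pixel minimum."""
    return width >= MIN_SIDE and height >= MIN_SIDE
```

To check a file, read its bytes and pass the result through both functions, for example `large_enough(*png_size(data))`. Other formats such as JPG store their dimensions differently and are not handled here.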

 


 

Performing OCR on Images in OmniPage

Once you have imported your images with text by way of one of the three main methods above, you need to perform Optical Character Recognition (OCR) on them to make the text editable.

  1. Set the Perform OCR button (button #2) to one of the following options, depending on the layout of your original document. Selecting the type of layout now helps with zoning. Zoning tells OmniPage which parts of the page to perform OCR on and which parts to ignore, depending on the layout:
    • Automatic (OmniPage does the zoning for you. This usually works well, but if you know the type of document, choose one of the more specific layout options below.)
    • Single Column, No Table
    • Multiple Columns, No Tables
    • Single Column with Table
    • Spreadsheet
    • Form
    • Legal Pleading
    • Custom (User Defined)
  2. If your original document contains languages other than English, or more than one language, you can set OCR to recognize those languages and to use special dictionaries. Click the Options button or go to Tools > Options… and, under the OCR tab, put a checkmark by every language used in the document and by each special dictionary that may help in the proof-checking process.
    • NOTE: Here, you can also specify the original font used in the document for even greater OCR accuracy, customize the layout description for OCR zoning purposes, or specify characters you want OCR to include or reject.
  3. Click the Perform OCR button (button #2). OmniPage scans the text and converts it to an editable format.
  4. Use the OCR Proofreader to change and edit the text where necessary, especially in areas where the OCR failed to recognize the text correctly.
  5. Once you have performed OCR on all your images, you are ready to move on to the final step--exporting the text into a usable format.
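
The choice among the column and table layouts in step 1 can be written down as a small decision rule. The sketch below is only an illustration in Python, not an OmniPage feature. The function name `suggest_layout` is hypothetical, the option names are copied from the list above, and falling back to Automatic when no listed option matches is an assumption. Spreadsheets, forms, legal pleadings, and custom layouts are not covered.

```python
def suggest_layout(columns: int, has_table: bool) -> str:
    """Suggest a Perform OCR layout option for a page of text and tables."""
    if columns == 1 and not has_table:
        return "Single Column, No Table"
    if columns == 1 and has_table:
        return "Single Column with Table"
    if columns > 1 and not has_table:
        return "Multiple Columns, No Tables"
    # No listed option matches (for example, several columns plus a table),
    # so let OmniPage zone the page itself.
    return "Automatic"
```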

 


 

Exporting Recognized Text from OmniPage

After performing OCR on the text, you can export it to a usable format, such as a PDF or Word document.

Exporting as a Word Document

  1. Set the Export Results button (button #3) to Save to Files.
  2. Click the Export Results button (button #3). The Save to File dialog box pops up.
  3. Select the Save as: Text radio button.
  4. Under Files of Type, choose a Word document format (.DOC for earlier versions of Word or .DOCX for Word 2007 and later).
  5. Choose a location to save your Word Document.
  6. Click Save. Now you can open your text in Microsoft Word.
    • NOTE: Although OmniPage generally does a good job of formatting the text close to the original, you may need to do some additional formatting in Microsoft Word to make the document look the way you want.

Exporting as a PDF (Portable Document Format)

  1. Set the Export Results button (button #3) to Save to Files.
  2. Click the Export Results button (button #3). The Save to File dialog box pops up.
  3. Select the Save as: Text radio button.
  4. Under Files of Type, choose Portable Document Format (.PDF).
  5. Choose a location to save your PDF.
  6. Click Save. Now you can open your text in any PDF viewer, such as Adobe Acrobat or Apple Preview.
    • NOTE: Sometimes it is best to save as a Microsoft Word document first and then save it as a PDF from Word, especially if you want to check and correct the formatting before creating the PDF.


Exporting as Other Formats

  1. Set the Export Results button (button #3) to Save to Files.
  2. Click the Export Results button (button #3). The Save to File dialog box pops up.
  3. Select the Save as: Text radio button.
  4. Under Files of Type, choose any other text format, including Rich Text Format (.RTF), Plain Text (.TXT), and HTML (.HTML).
  5. Choose a location to save your text file.
  6. Click Save.
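
The file types named in the export steps above can be collected in one lookup. The following Python sketch is only a naming aid and does not drive OmniPage. The format keys (`"word"`, `"word-legacy"`, and so on) and the function `export_path` are hypothetical, and lowercase extensions are assumed.

```python
from pathlib import Path

# Extensions mentioned in the export steps above, keyed by made-up format names.
EXTENSIONS = {
    "word": ".docx",         # Word 2007 and later
    "word-legacy": ".doc",   # earlier versions of Word
    "pdf": ".pdf",
    "rtf": ".rtf",
    "text": ".txt",
    "html": ".html",
}


def export_path(folder: str, name: str, fmt: str) -> Path:
    """Build the output path for a document name and a format key."""
    try:
        ext = EXTENSIONS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt}") from None
    return Path(folder) / f"{name}{ext}"
```

For example, `export_path("scans", "letter", "rtf")` gives the path of `letter.rtf` inside a folder named `scans`.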

 


 

Run OCR with Adobe Acrobat Professional

Adobe Acrobat Professional offers a lightweight option to run OCR. This works best with clean documents that are already in PDF format. Use OmniPage Pro 16 if you are starting OCR from scratch.

  1. Open Adobe Acrobat Professional (available on several Digital Studio computers).
  2. Go to File > Open… and browse for the PDF file on which you would like to perform Optical Character Recognition (OCR). Click Open.
  3. To perform OCR, go to Document > OCR Text Recognition > Recognize Text Using OCR…
  4. Select All Pages, Current Page, or a set of pages on which to run OCR.
  5. Click Edit… under Settings, then choose the Primary OCR Language and the PDF Output Style and set the DPI for downsampling. Click OK.
  6. Once you are satisfied with the settings, click OK. Wait for OCR to finish processing the document.
  7. Now your PDF should be searchable.

 


 

Straightening Crooked Text in a Scanned PDF with Acrobat Professional

If you have a PDF that was created from scanned documents and the scanned text is crooked, you can have Adobe Acrobat 8.0 Professional deskew the page. Acrobat analyzes the document and automatically rotates the page so that the text looks straight. Here's how:

  1. Open Acrobat. Go to Document > Optimize Scanned PDF.
  2. Make sure that the Deskew function is on Automatic (notice that there are other settings, which you can also adjust).
  3. Press OK. Depending on the size of the PDF document, optimizing may take some time. Acrobat automatically straightens the crooked text.
    • NOTE: This function works best for documents that were only slightly crooked when they were scanned. You may not be able to fix severely crooked text, which may require some image editing in Photoshop or a similar program first.
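
The idea behind deskewing can be shown with a little geometry. The sketch below is a generic Python illustration and is not Acrobat's actual method, which this guide does not describe. It assumes two points on a line of text are already known, measures the angle of that line, and rotates points by the opposite angle so the line becomes horizontal. The function names are hypothetical.

```python
import math


def skew_angle(x1, y1, x2, y2):
    """Angle in degrees of a text baseline that passes through two points."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(x, y, degrees, cx=0.0, cy=0.0):
    """Rotate the point (x, y) about the center (cx, cy) by the given angle."""
    r = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(r) - dy * math.sin(r),
            cy + dx * math.sin(r) + dy * math.cos(r))


def deskew_points(points, angle):
    """Rotate every point by the opposite of the measured skew angle."""
    return [rotate_point(x, y, -angle) for x, y in points]
```

A baseline running from (0, 0) to (100, 5) is tilted by a little under 3 degrees. Rotating its end point by the opposite angle brings it back onto the horizontal axis.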

 


 

Scanned Text to Microsoft Word Document VIDEO

 

 
