OCR (Optical Character Recognition) lets you scan a document and make its text editable. For instance, you can scan a page of text and convert it into a Microsoft Word document, or make a PDF searchable. You can also run OCR on an existing image file or PDF for the same results.
Basic Workflow for OCR in OmniPage
Most OCR jobs follow three main processing steps:
- Import image(s) into OmniPage by scanning them or by loading the file into the program.
- Perform OCR to make the text editable. (Use the spell check after this step to correct errors.)
- Export the document in the file type and to the location you choose.
You can go through this common workflow automatically, manually, or in a combined automatic and manual fashion.
Importing Images for Processing in OmniPage
Before you get started, open OmniPage Professional 16 by going to Start > Programs > ScanSoft OmniPage 16 > OmniPage Professional 16.
- There are three ways to import images into OmniPage for processing. Choose the method that best suits the format of your original document or image:
  - The first way is to click the Get Pages drop-down list (the arrow next to button #1) and choose Load Files. You can load existing image files or documents (such as a JPG or PDF) to convert.
    - NOTE: An image must be at least 16 by 16 pixels to be processed.
  - The second way is to connect a digital camera containing your photos to the computer, then click Load Digital Camera Files in the Get Pages drop-down list (the arrow next to button #1).
  - The third and most common way is to use the scanner. Click the Get Pages drop-down list (the arrow next to button #1) and select Scanner.
    - If you are scanning on the computer named Gloria, the Scan Using HP Scanjet 4070 dialog box will appear. Choose whether to scan in Black and White, Grayscale, or Color. Select any other appropriate settings, then click Scan. The images are brought into OmniPage when the scanner finishes.
    - If you are scanning on the computer named Ezekiel, Scanwise will open. Select the appropriate settings, then click Scan. The images are brought into OmniPage when the scanner finishes.
  - NOTE: Good brightness and contrast improve OCR accuracy, and some images may need pre-processing before OCR. You can edit your images within OmniPage: go to SET > Enhance Image in the Image Toolbar, or click Tools > SET > Enhance Image, and use the image enhancement tools to prepare your images for OCR.
- Click the Get Page(s) button (button #1). The images are imported into OmniPage by the method you chose above.
- Click Stop Loading Pages once you have imported all your images/pages for OCR.
- Once all your images are imported, you are ready to move on to the next step: performing OCR.
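If you are loading many files, it may help to check the 16 by 16 pixel minimum before importing. The sketch below is one possible pre-check, not part of OmniPage: it assumes the images are PNG files and reads the width and height straight from the PNG header. The function names `png_size` and `large_enough` are made up for this example, and other formats such as JPG or TIFF would need a different reader.

```python
import struct

# Minimum width and height in pixels, taken from the NOTE in the steps above.
MIN_SIDE = 16

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of PNG file bytes."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then the width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def large_enough(width: int, height: int) -> bool:
    """True if both sides meet the 16-pixel minimum (hypothetical helper)."""
    return width >= MIN_SIDE and height >= MIN_SIDE
```

To use it, read the first 24 bytes of a file opened in binary mode, pass them to `png_size`, and pass the result to `large_enough`.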
Exporting Recognized Text from OmniPage
After performing OCR on the text, you can export it in a usable format, such as a PDF or Word document.
Exporting as a Word Document
- Set the Export Results button (button #3) to Save to Files.
- Click the Export Results button (button #3). The Save to File dialog box pops up.
- Select the Save as: Text radio button.
- Under Files of Type, choose a Word document format (.DOC for earlier versions of Word, or .DOCX for Word 2007 and later).
- Choose a location to save your Word Document.
- Click Save. Now you can open your text in Microsoft Word.
- NOTE: Although OmniPage generally does a good job of keeping the formatting close to the original, you may need to do some additional formatting in Microsoft Word to make the document look the way you want.
Exporting as a PDF (Portable Document Format)
- Set the Export Results button (button #3) to Save to Files.
- Click the Export Results button (button #3). The Save to File dialog box pops up.
- Select the Save as: Text radio button.
- Under Files of Type, choose Portable Document Format (.PDF).
- Choose a location to save your PDF.
- Click Save. Now you can open your text in any PDF viewer, such as Adobe Acrobat or Apple Preview.
- NOTE: Sometimes it is better to save as a Microsoft Word document first and then save it as a PDF from Word, especially if you want to check that the document is formatted correctly first.
Exporting as Other Formats
- Set the Export Results button (button #3) to Save to Files.
- Click the Export Results button (button #3). The Save to File dialog box pops up.
- Select the Save as: Text radio button.
- Under Files of Type, choose any other text format, including Rich Text Format (.RTF), Plain Text (.TXT), and HTML (.HTML).
- Choose a location to save your text file.
- Click Save.
Run OCR with Adobe Acrobat Professional
Adobe Acrobat Professional offers a lightweight way to run OCR. It works best with clean documents that are already in PDF format. Use OmniPage Professional 16 if you are starting OCR from scratch.
- Open Adobe Acrobat Professional (available on several Digital Studio computers).
- Go to File > Open… and browse for the PDF file on which you would like to perform Optical Character Recognition (OCR). Click Open.
- To perform OCR, go to Document > OCR Text Recognition > Recognize Text Using OCR…
- Select All Pages, Current Page, or a set of pages on which to run OCR.
- Click Edit… under Settings, choose the Primary OCR Language and PDF Output Style, and set the DPI for downsampling. Click OK.
- Once you are satisfied with the settings, click OK. Wait for Acrobat to finish recognizing the text.
- Now your PDF should be searchable.
Straightening Crooked Text in a Scanned PDF with Acrobat Professional
If you have a PDF created from scanned documents and the scanned text is crooked, you can have Adobe Acrobat 8.0 Professional deskew the page: Acrobat analyzes the document and automatically adjusts and aligns the text so that it looks straight. Here's how:
- Open Acrobat. Go to Document > Optimize Scanned PDF.
- Make sure that the Deskew function is set to Automatic (there are other settings here that you can also adjust).
- Press OK. Depending on the size of the PDF, optimization may take some time. When it finishes, Acrobat will have straightened the crooked text.
- NOTE: This function works best for documents that were only slightly crooked when they were scanned. You may not be able to fix severely crooked text this way; it may require some image editing in Photoshop or a similar program first.
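To make the idea of deskewing concrete, here is a small geometric sketch. It does not show how Acrobat detects or corrects skew, which this guide does not describe. It only illustrates the principle: measure the angle of a text baseline against the horizontal, then rotate by the opposite angle. The function names and the sample coordinates are assumptions made for this example.

```python
import math


def skew_angle_degrees(x1: float, y1: float, x2: float, y2: float) -> float:
    """Angle, in degrees, of the line from (x1, y1) to (x2, y2) against the horizontal."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(x: float, y: float, degrees: float,
                 cx: float = 0.0, cy: float = 0.0) -> tuple[float, float]:
    """Rotate the point (x, y) around (cx, cy) by the given angle in degrees."""
    theta = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))


# Hypothetical baseline of one text line: it drifts 5 pixels over a run of 100.
angle = skew_angle_degrees(0, 0, 100, 5)

# Rotating by the opposite angle brings the far end back onto the horizontal.
straight_x, straight_y = rotate_point(100, 5, -angle)
```

A small drift like this one corresponds to an angle of under three degrees, which is the kind of slight skew the NOTE above says automatic deskewing handles best.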