Digital Business Automation Ideas


Status Under review
Workspace Datacap
Created by Guest
Created on Aug 15, 2024

Request for PDF to Word Document Conversion Feature in Datacap

We would like to request the addition of an action in Datacap that enables the conversion of input PDF files to Word Documents.

Idea priority High
  • Guest
    Sep 10, 2024

    Dear All,

    I followed the procedure below for this task, but it was a lengthy process and it does not work for images inside the PDF file; they simply are not converted.

    Steps to Follow:

    To convert a PDF to text in IBM Datacap, you would typically use a combination of rules and actions in Datacap Studio. Below is a basic example of how you can achieve this using Datacap with an OCR engine like ABBYY or Nuance.

    Step-by-Step Code for Converting PDF to Text:

    1. Create a New Rule Set in Datacap Studio:

      • Open Datacap Studio and create a new Rule Set, which will contain the steps for PDF to text conversion.

    2. Add Actions to the Rule Set:

      • Use the following actions in sequence to perform OCR on a PDF and extract the text:

    Example Rule Set Actions

    <RuleSet name="PDF_To_Text">
      <Rule name="Load_PDF">
        <!-- Load the PDF file into Datacap -->
        <Action name="LoadPDFFile">
          <Param name="FilePath">D:\Input\sample.pdf</Param> <!-- Path to your PDF file -->
        </Action>
      </Rule>

      <Rule name="Convert_PDF_To_Image">
        <!-- Convert the PDF pages to images (for OCR processing) -->
        <Action name="ConvertPDFToImage">
          <Param name="Resolution">300</Param> <!-- DPI setting -->
        </Action>
      </Rule>

      <Rule name="OCR_Process">
        <!-- Perform OCR on the images -->
        <Action name="RecognizePage">
          <Param name="OCR_Engine">ABBYY</Param> <!-- Specify OCR engine -->
          <Param name="PageRange">1-</Param> <!-- Process all pages -->
        </Action>
      </Rule>

      <Rule name="Extract_Text">
        <!-- Extract the recognized text -->
        <Action name="ExtractText">
          <Param name="OutputFormat">PlainText</Param> <!-- Output format -->
          <Param name="OutputFilePath">D:\Output\output.txt</Param> <!-- Path to save the text file -->
        </Action>
      </Rule>
    </RuleSet>
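
As a quick sanity check before adapting the example above, the following Python sketch parses a rule set written in this XML layout and lists each rule, its action, and its parameters in document order. It assumes the illustrative `RuleSet`/`Rule`/`Action`/`Param` layout from this comment, which may not match how Datacap Studio actually stores rule sets. The helper name `summarize_ruleset` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the illustrative rule set from this comment
# (an assumption for demonstration, not an official Datacap format).
RULESET_XML = r"""
<RuleSet name="PDF_To_Text">
  <Rule name="Load_PDF">
    <Action name="LoadPDFFile">
      <Param name="FilePath">D:\Input\sample.pdf</Param>
    </Action>
  </Rule>
  <Rule name="OCR_Process">
    <Action name="RecognizePage">
      <Param name="OCR_Engine">ABBYY</Param>
      <Param name="PageRange">1-</Param>
    </Action>
  </Rule>
</RuleSet>
"""


def summarize_ruleset(xml_text):
    """Return [(rule, action, {param: value}), ...] in document order.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_text.strip())
    summary = []
    for rule in root.findall("Rule"):
        for action in rule.findall("Action"):
            params = {
                p.get("name"): (p.text or "").strip()
                for p in action.findall("Param")
            }
            summary.append((rule.get("name"), action.get("name"), params))
    return summary


if __name__ == "__main__":
    for rule, action, params in summarize_ruleset(RULESET_XML):
        print(f"{rule}: {action} {params}")
```

A malformed edit, such as a missing closing tag, surfaces as a `ParseError` here rather than later in the workflow.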

    Key Components:

    1. LoadPDFFile Action: This action loads the PDF into Datacap for processing.

    2. ConvertPDFToImage Action: Converts each page of the PDF into an image because OCR engines typically work on image files.

    3. RecognizePage Action: Uses the specified OCR engine (like ABBYY) to recognize text in the images.

    4. ExtractText Action: Extracts the recognized text from the OCR process and saves it to a text file.

    Steps to Implement:

    1. Create a new rule set in Datacap Studio and add these actions to the rule set.

    2. Modify the parameters (like file paths and OCR engine) as per your environment.

    3. Test the rule set by running it against a sample PDF.

    4. Deploy and integrate the rule set into your Datacap workflow.

    Additional Notes:

    • The ConvertPDFToImage and RecognizePage actions may vary depending on the OCR engine and Datacap version you are using.

    • You might need to customize error handling and logging based on your requirements.
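
One way to approach the error handling and logging mentioned above is to run each stage through a small wrapper that logs its start and stops at the first failure. The sketch below is plain Python and does not call Datacap; the stage names mirror the example rules, and the stage functions are hypothetical placeholders.

```python
import logging

logger = logging.getLogger("pdf_to_text")


def run_stages(stages, document):
    """Run (name, func) stages in order; return (ok, failed_stage, result).

    Each func takes the previous stage's result and returns the next one.
    Execution stops at the first stage that raises an exception.
    """
    result = document
    for name, func in stages:
        logger.info("starting stage %s", name)
        try:
            result = func(result)
        except Exception:
            # Log the traceback and report which stage failed.
            logger.exception("stage %s failed", name)
            return False, name, result
    return True, None, result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def ocr_stub(pages):
        # Placeholder that simulates an OCR failure on image-only input.
        raise RuntimeError("no text recognized")

    stages = [
        ("Load_PDF", lambda path: [path + "#page1"]),
        ("OCR_Process", ocr_stub),
        ("Extract_Text", lambda text: text.upper()),
    ]
    print(run_stages(stages, "sample.pdf"))
```

Returning the name of the failed stage makes it easier to tell whether a document failed at loading, OCR, or text extraction.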

    This code snippet provides a basic framework. Depending on your setup, you might need to adjust it or add additional actions to handle specific cases like multi-page PDFs or different OCR engines.
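
For the multi-page case, the `PageRange` value `1-` in the example is described as "process all pages". A minimal Python sketch of how such a value could be expanded into page numbers follows. The syntax rules (comma-separated items, `a-b` ranges, open-ended `a-`) are an assumption for illustration, not documented Datacap behavior, and `expand_page_range` is a hypothetical helper.

```python
def expand_page_range(spec, total_pages):
    """Expand a 1-based spec such as "1-", "2-4" or "1,3-5" into page numbers.

    An open-ended range like "3-" runs to total_pages. Pages beyond
    total_pages are dropped; malformed input raises ValueError.
    """
    pages = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            start_text, end_text = item.split("-", 1)
            start = int(start_text)
            # Open-ended range: run to the last page of the document.
            end = int(end_text) if end_text.strip() else max(start, total_pages)
        else:
            start = end = int(item)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {item!r}")
        pages.extend(range(start, min(end, total_pages) + 1))
    return pages


if __name__ == "__main__":
    print(expand_page_range("1-", 3))
    print(expand_page_range("1,3-5", 4))
```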


    Note: In any case, IBM should provide a feature or action for this request, because I see many companies asking for this kind of capability.

    Thanks,

    Shyam B.