adobe.pdfservices.operation.pdfops package

Subpackages

adobe.pdfservices.operation.pdfops.options package
- Subpackages
  - adobe.pdfservices.operation.pdfops.options.extractpdf package
  - adobe.pdfservices.operation.pdfops.options.autotagpdf package
- Module contents

Submodules

adobe.pdfservices.operation.pdfops.extract_pdf_operation module

class adobe.pdfservices.operation.pdfops.extract_pdf_operation.ExtractPDFOperation(create_key)

Bases: Operation

An Operation that extracts PDF elements such as text and tables from a PDF in a structured format, along with renditions of tables and figures.

Sample usage.

try:
    base_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    credentials = Credentials.service_principal_credentials_builder(). \
        with_client_id(os.getenv('PDF_SERVICES_CLIENT_ID')). \
        with_client_secret(os.getenv('PDF_SERVICES_CLIENT_SECRET')). \
        build()

    execution_context = ExecutionContext.create(credentials)
    extract_pdf_operation = ExtractPDFOperation.create_new()

    source = FileRef.create_from_local_file(base_path + "/resources/extractPdfInput.pdf")
    extract_pdf_operation.set_input(source)

    extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
        .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES]) \
        .with_elements_to_extract_renditions([ExtractRenditionsElementType.TABLES, ExtractRenditionsElementType.FIGURES]) \
        .with_get_char_info(True) \
        .with_include_styling_info(True) \
        .build()
    extract_pdf_operation.set_options(extract_pdf_options)

    result: FileRef = extract_pdf_operation.execute(execution_context)

    result.save_as(base_path + "/output/ExtractTextTableWithFigureTableRendition.zip")
except (ServiceApiException, ServiceUsageException, SdkException):
    logging.exception("Exception encountered while executing operation")
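
The ZIP archive saved by the sample can be inspected with the standard library alone. The sketch below assumes the archive holds a top-level structuredData.json whose "elements" array contains objects with "Path" and optional "Text" keys. That layout is an assumption about the Extract output format and is not stated on this page, so check it against your own results. The helper name iter_text_elements is hypothetical.

```python
import io
import json
import zipfile
from typing import Iterator, Tuple


def iter_text_elements(zip_source, json_name: str = "structuredData.json") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) pairs for every element that carries text.

    zip_source may be a file path or a binary file-like object.
    The member name and the "elements"/"Path"/"Text" keys are assumptions
    about the Extract output layout.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(json_name) as handle:
            data = json.load(handle)
    for element in data.get("elements", []):
        text = element.get("Text")
        if text:
            yield element.get("Path", ""), text


if __name__ == "__main__":
    # Build a tiny stand-in archive so the sketch runs without calling the service.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("structuredData.json", json.dumps({"elements": [
            {"Path": "//Document/H1", "Text": "Title"},
            {"Path": "//Document/Figure"},
        ]}))
    buffer.seek(0)
    for path, text in iter_text_elements(buffer):
        print(path, text)
```

With a real result, pass the path given to save_as instead of the in-memory buffer.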

SUPPORTED_SOURCE_MEDIA_TYPES = {adobe.pdfservices.operation.internal.extension_media_type_mapping.ExtensionMediaTypeMapping.PDF.mime_type}: The only supported source file format for ExtractPDFOperation is .pdf.
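
Since .pdf is the only accepted source format, a quick local check before calling set_input can catch obvious mistakes early. This is a minimal sketch, not part of the SDK: the helper name looks_like_pdf is hypothetical, and the check only covers the file extension and the leading "%PDF-" marker, so it does not replace validation by the service.

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file has a .pdf extension and starts with the PDF header."""
    candidate = Path(path)
    if candidate.suffix.lower() != ".pdf" or not candidate.is_file():
        return False
    with candidate.open("rb") as handle:
        # PDF files begin with the marker "%PDF-" followed by a version number.
        return handle.read(5) == b"%PDF-"
```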

classmethod create_new()

Creates a new instance of ExtractPDFOperation.

Returns: A new instance of ExtractPDFOperation
Return type: ExtractPDFOperation

execute(execution_context: ExecutionContext)

Executes this operation synchronously using the supplied context and returns a new FileRef instance for the resulting ZIP file. The resulting file may be stored in the system temporary directory. See adobe.pdfservices.operation.io.file_ref.FileRef for how temporary resources are cleaned up.

Parameters: execution_context (ExecutionContext) – The context in which the operation will be executed.
Returns: The FileRef to the result.
Return type: FileRef
Raises: ServiceApiException – if an API call results in an error response.
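
Because the result may initially live in a temporary directory, callers typically copy it to a durable location with save_as, as the sample does. The sketch below picks a destination that does not collide with an earlier run. The helper name unique_output_path is hypothetical, and this page does not say how save_as behaves when the target already exists, so treat the helper purely as a precaution.

```python
from datetime import datetime
from pathlib import Path


def unique_output_path(directory, stem: str, suffix: str = ".zip") -> Path:
    """Create the directory if needed and return a path that does not exist yet."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    candidate = out_dir / f"{stem}-{stamp}{suffix}"
    counter = 1
    # Fall back to a numeric suffix if two calls land in the same second.
    while candidate.exists():
        candidate = out_dir / f"{stem}-{stamp}-{counter}{suffix}"
        counter += 1
    return candidate
```

In the sample, the returned path could be passed on as result.save_as(str(unique_output_path(base_path + "/output", "ExtractTextTableWithFigureTableRendition"))).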

get_options()

Gets the ExtractPDFOptions.

Returns: The options parameter of the operation
Return type: ExtractPDFOptions

set_input(source_file_ref: FileRef)

Sets an input file.

Parameters: source_file_ref (FileRef) – An input file.
Returns: This instance, so that further parameters can be set by chaining calls.
Return type: ExtractPDFOperation

set_options(extract_pdf_options: ExtractPDFOptions)

Sets the ExtractPDFOptions.

Parameters: extract_pdf_options (ExtractPDFOptions) – ExtractPDFOptions to set.
Returns: This instance, so that further parameters can be set by chaining calls.
Return type: ExtractPDFOperation

adobe.pdfservices.operation.pdfops.autotag_pdf_operation module

class adobe.pdfservices.operation.pdfops.autotag_pdf_operation.AutotagPDFOperation(create_key)

Bases: Operation

An operation that enables clients to improve the accessibility of a PDF document. It generates a tagged PDF, along with an optional XLSX report that details the added tags. The operation replaces any existing tags within the input document, so it provides the most benefit for PDFs that have no tags or low-quality tags.

Sample usage.

try:
    base_path = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

    credentials = Credentials.service_principal_credentials_builder(). \
        with_client_id(os.getenv('PDF_SERVICES_CLIENT_ID')). \
        with_client_secret(os.getenv('PDF_SERVICES_CLIENT_SECRET')). \
        build()

    execution_context = ExecutionContext.create(credentials)
    autotag_pdf_operation = AutotagPDFOperation.create_new()

    input_file_path = 'autotagPdfInput.pdf'
    source = FileRef.create_from_local_file(base_path + "/resources/" + input_file_path)
    autotag_pdf_operation.set_input(source)

    autotag_pdf_options: AutotagPDFOptions = AutotagPDFOptions.builder() \
        .with_shift_headings() \
        .with_generate_report() \
        .build()
    autotag_pdf_operation.set_options(autotag_pdf_options)

    autotag_pdf_output: AutotagPDFOutput = autotag_pdf_operation.execute(execution_context)

    input_file_name = Path(input_file_path).stem
    base_output_path = base_path + "/output/AutotagPDFWithOptions/"

    Path(base_output_path).mkdir(parents=True, exist_ok=True)
    tagged_pdf_path = f'{base_output_path}{input_file_name}-tagged.pdf'
    report_path = f'{base_output_path}{input_file_name}-report.xlsx'

    autotag_pdf_output.get_tagged_pdf().save_as(tagged_pdf_path)
    autotag_pdf_output.get_report().save_as(report_path)

except (ServiceApiException, ServiceUsageException, SdkException) as e:
    logging.exception(f'Exception encountered while executing operation: {e}')
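
The XLSX report is optional, so code that does not always call with_generate_report() may want to guard the second save_as call. The sketch below assumes get_report() returns None when no report was requested. That behavior is not stated on this page, and the helper name save_if_present is hypothetical.

```python
from pathlib import Path
from typing import Optional


def save_if_present(file_ref, destination) -> Optional[Path]:
    """Call file_ref.save_as(destination) unless file_ref is None.

    file_ref is expected to expose save_as(path), as FileRef does in the sample.
    Returns the destination path when something was saved, otherwise None.
    """
    if file_ref is None:
        return None
    target = Path(destination)
    # Make sure the output directory exists before saving.
    target.parent.mkdir(parents=True, exist_ok=True)
    file_ref.save_as(str(target))
    return target
```

In the sample, this would replace the direct call with save_if_present(autotag_pdf_output.get_report(), report_path).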

SUPPORTED_SOURCE_MEDIA_TYPES = {adobe.pdfservices.operation.internal.extension_media_type_mapping.ExtensionMediaTypeMapping.PDF.mime_type}: The only supported source file format for AutotagPDFOperation is .pdf.

classmethod create_new()

Creates a new instance of AutotagPDFOperation.

Returns: A new instance of AutotagPDFOperation
Return type: AutotagPDFOperation

execute(execution_context: ExecutionContext)

Executes this operation synchronously using the supplied context and returns a new AutotagPDFOutput instance for the generated tagged PDF file and XLSX report file. The resulting files may be stored in the system temporary directory. See adobe.pdfservices.operation.io.file_ref.FileRef for how temporary resources are cleaned up.

Parameters: execution_context (ExecutionContext) – The context in which the operation will be executed.
Returns: The instance of AutotagPDFOutput.
Return type: AutotagPDFOutput
Raises: ServiceApiException – if an API call results in an error response.

get_options()

Gets the AutotagPDFOptions.

Returns: The options parameter of the operation
Return type: AutotagPDFOptions