API Reference

ClientConfig

class adobe.pdfservices.operation.client_config.ClientConfig

Bases: object

Encapsulates the API request configurations, such as the connect and read timeouts.

class Builder

Bases: object

Builds a ClientConfig instance.

build()

Returns a new ClientConfig instance built from the current state of this builder.

Returns

A ClientConfig instance.

Return type

ClientConfig

from_file(client_config_file_path: str)

Sets the connect timeout and read timeout using the JSON client config file path. All the keys in the JSON structure are optional.

Parameters

client_config_file_path (str) – JSON client config file path

Returns

This Builder instance to add any additional parameters.

Return type

ClientConfig.Builder

JSON structure:

{
    "connectTimeout": "4000",
    "readTimeout": "20000"
}
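
The client config file can also be generated from code. The following minimal sketch uses only the Python standard library. The helper name write_client_config is hypothetical, and its positive-value check mirrors the "greater than zero" requirement stated for with_connect_timeout() and with_read_timeout(). This reference does not say whether from_file() also accepts numeric (non-string) values, so the sketch keeps the string form of the documented structure.

```python
import json
from pathlib import Path


def write_client_config(path: str, connect_timeout_ms: int = 4000,
                        read_timeout_ms: int = 10000) -> None:
    """Write a JSON file that ClientConfig.Builder.from_file() can read.

    Hypothetical helper; the defaults mirror the documented SDK defaults
    (4000 ms connect timeout, 10000 ms read timeout).
    """
    for name, value in (("connect_timeout_ms", connect_timeout_ms),
                        ("read_timeout_ms", read_timeout_ms)):
        # Both timeouts must be greater than zero (bool is rejected explicitly).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    # Mirror the documented structure, which stores the values as strings.
    config = {
        "connectTimeout": str(connect_timeout_ms),
        "readTimeout": str(read_timeout_ms),
    }
    Path(path).write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
```

For example, write_client_config("client-config.json", read_timeout_ms=20000) writes the same keys and values as the JSON structure shown above, and the file can then be loaded with ClientConfig.builder().from_file("client-config.json").build().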

with_connect_timeout(connect_timeout: int)

Sets the connect timeout. It should be greater than zero.

Parameters

connect_timeout (int) – The timeout, in milliseconds, for establishing a connection in API calls. Default value is 4000 milliseconds.

Returns

This Builder instance to add any additional parameters.

Return type

ClientConfig.Builder

with_read_timeout(read_timeout: int)

Sets the read timeout. It should be greater than zero.

Parameters

read_timeout (int) – The read timeout in milliseconds: how long the client waits for the server to send a response after the connection is established. Default value is 10000 milliseconds.

Returns

This Builder instance to add any additional parameters.

Return type

ClientConfig.Builder

static builder()

Creates a new ClientConfig builder.

Returns

A ClientConfig.Builder instance.

Return type

ClientConfig.Builder

ClientConfigBuilder

class adobe.pdfservices.operation.client_config.ClientConfig.Builder

Bases: object

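Builds a ClientConfig instance. This is the nested ClientConfig.Builder class documented under ClientConfig, and it provides the same methods: build(), from_file(), with_connect_timeout() and with_read_timeout().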

Credentials

class adobe.pdfservices.operation.auth.credentials.Credentials

Bases: abc.ABC

Marker base class for the different types of credentials. Currently, only ServiceAccountCredentials is supported. The factory methods in this class can be used to create instances of the credentials classes.

static service_account_credentials_builder()

Creates a new ServiceAccountCredentials builder.

Returns

A ServiceAccountCredentials.Builder instance.

Return type

ServiceAccountCredentials.Builder

ExecutionContext

class adobe.pdfservices.operation.execution_context.ExecutionContext

Bases: object

Represents the execution context of an Operation. An execution context typically consists of the desired authentication credentials and client configurations such as timeouts.

For each set of credentials, an ExecutionContext instance can be reused across operations.

Sample Usage:

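import logging
import os.path

from adobe.pdfservices.operation.auth.credentials import Credentials
from adobe.pdfservices.operation.exception.exceptions import SdkException, ServiceApiException, ServiceUsageException
from adobe.pdfservices.operation.execution_context import ExecutionContext
from adobe.pdfservices.operation.io.file_ref import FileRef
from adobe.pdfservices.operation.pdfops.extract_pdf_operation import ExtractPDFOperation
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type import PDFElementType

try:
    # Resolve paths relative to the parent of this script's directory.
    base_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    # Load the service account credentials from a JSON file.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(base_path + "/pdftools-api-credentials.json") \
        .build()

    # One execution context can be reused for several operations.
    execution_context = ExecutionContext.create(credentials)
    extract_pdf_operation = ExtractPDFOperation.create_new()

    # Set the input PDF.
    source = FileRef.create_from_local_file(base_path + "/resources/extractPdfInput.pdf")
    extract_pdf_operation.set_input(source)

    # Extract text and tables, renditions of tables and figures,
    # and character-level bounding boxes.
    extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
        .with_elements_to_extract([PDFElementType.TEXT, PDFElementType.TABLES]) \
        .with_elements_to_extract_renditions([PDFElementType.TABLES, PDFElementType.FIGURES]) \
        .with_get_char_info(True) \
        .build()
    extract_pdf_operation.set_options(extract_pdf_options)

    # Execute the operation and save the resulting ZIP file.
    result: FileRef = extract_pdf_operation.execute(execution_context)
    result.save_as(base_path + "/output/ExtractTextTableWithFigureTableRendition.zip")
except (ServiceApiException, ServiceUsageException, SdkException):
    logging.exception("Exception encountered while executing operation")

static create(credentials: adobe.pdfservices.operation.auth.credentials.Credentials, client_config: Optional[adobe.pdfservices.operation.client_config.ClientConfig] = None)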

Creates a context instance using the provided Credentials and ClientConfig.

Parameters
  • credentials (Credentials) – A Credentials instance

  • client_config (ClientConfig, optional) – A ClientConfig instance for providing custom HTTP timeouts, defaults to None

Returns

A new ExecutionContext instance

Return type

ExecutionContext

ExtractPDFOperation

class adobe.pdfservices.operation.pdfops.extract_pdf_operation.ExtractPDFOperation(create_key)

Bases: adobe.pdfservices.operation.operation.Operation

An Operation that extracts PDF elements, such as text, images and tables, from a PDF in a structured format.

Sample Usage:

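import logging
import os.path

from adobe.pdfservices.operation.auth.credentials import Credentials
from adobe.pdfservices.operation.exception.exceptions import SdkException, ServiceApiException, ServiceUsageException
from adobe.pdfservices.operation.execution_context import ExecutionContext
from adobe.pdfservices.operation.io.file_ref import FileRef
from adobe.pdfservices.operation.pdfops.extract_pdf_operation import ExtractPDFOperation
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type import PDFElementType

try:
    # Resolve paths relative to the parent of this script's directory.
    base_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

    # Load the service account credentials from a JSON file.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(base_path + "/pdftools-api-credentials.json") \
        .build()

    # One execution context can be reused for several operations.
    execution_context = ExecutionContext.create(credentials)
    extract_pdf_operation = ExtractPDFOperation.create_new()

    # Set the input PDF.
    source = FileRef.create_from_local_file(base_path + "/resources/extractPdfInput.pdf")
    extract_pdf_operation.set_input(source)

    # Extract text and tables, renditions of tables and figures,
    # and character-level bounding boxes.
    extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
        .with_elements_to_extract([PDFElementType.TEXT, PDFElementType.TABLES]) \
        .with_elements_to_extract_renditions([PDFElementType.TABLES, PDFElementType.FIGURES]) \
        .with_get_char_info(True) \
        .build()
    extract_pdf_operation.set_options(extract_pdf_options)

    # Execute the operation and save the resulting ZIP file.
    result: FileRef = extract_pdf_operation.execute(execution_context)
    result.save_as(base_path + "/output/ExtractTextTableWithFigureTableRendition.zip")
except (ServiceApiException, ServiceUsageException, SdkException):
    logging.exception("Exception encountered while executing operation")

SUPPORTED_SOURCE_MEDIA_TYPES = {'application/pdf'}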

The only supported source file format for ExtractPDFOperation is PDF (.pdf).
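
A caller may want to reject unsupported files before creating the operation. The sketch below is a local pre-check, not part of the SDK: it guesses the media type from the file extension with Python's standard mimetypes module and compares the result with the documented set. create_from_local_file() is documented to infer the media type from the extension, but its exact mechanism is not specified, so this check is only an approximation. The function names are hypothetical.

```python
import mimetypes
from typing import Optional

# Value of ExtractPDFOperation.SUPPORTED_SOURCE_MEDIA_TYPES, as documented.
SUPPORTED_SOURCE_MEDIA_TYPES = {"application/pdf"}


def guess_media_type(path: str) -> Optional[str]:
    """Guess a media type from the file extension, or None if unknown."""
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type


def is_supported_extract_input(path: str) -> bool:
    """Return True if the extension maps to a supported source media type."""
    return guess_media_type(path) in SUPPORTED_SOURCE_MEDIA_TYPES
```

For example, is_supported_extract_input("resources/extractPdfInput.pdf") returns True, while a .txt file or a file without an extension returns False.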

classmethod create_new()

Creates a new instance of ExtractPDFOperation.

Returns

A new instance of ExtractPDFOperation

Return type

ExtractPDFOperation

execute(execution_context: adobe.pdfservices.operation.execution_context.ExecutionContext)

Executes this operation synchronously using the supplied context and returns a new FileRef instance for the resulting Zip file. The resulting file may be stored in the system temporary directory. See adobe.pdfservices.operation.io.file_ref.FileRef for how temporary resources are cleaned up.

Parameters

execution_context (ExecutionContext) – The context in which the operation will be executed.

Returns

The FileRef to the result.

Return type

FileRef

Raises

ServiceApiException – if an API call results in an error response.
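
execute() returns a FileRef to a ZIP file that can be saved with save_as(). To see what an extraction produced, the saved archive can be inspected with Python's standard zipfile module. The helper below is hypothetical, and the archive's internal layout (which files and folders hold the JSON output, table CSVs or renditions) depends on the options used and is not specified in this reference, so the sketch only groups member names by top-level folder.

```python
import zipfile
from collections import defaultdict
from typing import Dict, List


def list_result_entries(zip_path: str) -> Dict[str, List[str]]:
    """Group the files in a saved result ZIP by their top-level folder.

    Files at the root of the archive are grouped under the key ".".
    """
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a ZIP archive")

    groups: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip explicit directory entries
            folder, sep, _rest = info.filename.partition("/")
            groups[folder if sep else "."].append(info.filename)
    return {key: sorted(names) for key, names in sorted(groups.items())}
```

A typical call passes the same path that was given to result.save_as().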

set_input(source_file_ref: adobe.pdfservices.operation.io.file_ref.FileRef)

Sets an input file.

Parameters

source_file_ref (FileRef) – An input file.

Returns

This instance to add any additional parameters.

Return type

ExtractPDFOperation

set_options(extract_pdf_options: adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options.ExtractPDFOptions)

Sets the ExtractPDFOptions.

Parameters

extract_pdf_options (ExtractPDFOptions) – ExtractPDFOptions to set.

Returns

This instance to add any additional parameters.

Return type

ExtractPDFOperation

ExtractPDFOptions

class adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options.ExtractPDFOptions(elements_to_extract, elements_to_extract_renditions, get_char_info, table_output_format)

Bases: object

An options class that defines the options for ExtractPDFOperation.

Sample Usage:

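from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type import PDFElementType
from adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type import TableStructureType

# Extract text and tables (tables also as CSV), char bounding boxes, and figure/table renditions.
extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
    .with_elements_to_extract([PDFElementType.TEXT, PDFElementType.TABLES]) \
    .with_get_char_info(True) \
    .with_table_structure_format(TableStructureType.CSV) \
    .with_elements_to_extract_renditions([PDFElementType.FIGURES, PDFElementType.TABLES]) \
    .build()

class Builder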

Bases: object

The builder for ExtractPDFOptions.

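build()

Returns a new ExtractPDFOptions instance built from the current state of this builder.

Returns

An ExtractPDFOptions instance.

Return type

ExtractPDFOptions

with_element_to_extract(element_to_extract: adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type.PDFElementType)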

Adds a PDF element type to extract as structured information.

Parameters

element_to_extract (PDFElementType) – PDFElementType to be extracted

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

Raises

ValueError – if element_to_extract is None.

with_element_to_extract_renditions(element_to_extract_renditions: adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type.PDFElementType)

Adds a PDF element type whose renditions are to be extracted.

Parameters

element_to_extract_renditions (PDFElementType) – PDFElementType whose renditions have to be extracted

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

Raises

ValueError – if element_to_extract_renditions is None.

with_elements_to_extract(elements_to_extract: List[adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type.PDFElementType])

Adds a list of PDF element types to extract as structured information.

Parameters

elements_to_extract (List[PDFElementType]) – List of PDFElementType to be extracted

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

Raises

ValueError – if elements_to_extract is None or an empty list.

with_elements_to_extract_renditions(elements_to_extract_renditions: List[adobe.pdfservices.operation.pdfops.options.extractpdf.pdf_element_type.PDFElementType])

Adds a list of PDF element types whose renditions are to be extracted.

Parameters

elements_to_extract_renditions (List[PDFElementType]) – List of PDFElementType whose renditions have to be extracted

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

Raises

ValueError – if elements_to_extract_renditions is None or an empty list.

with_get_char_info(get_char_info: bool)

Sets a Boolean specifying whether to add character-level bounding boxes to the output JSON.

Parameters

get_char_info (bool) – Set to True to include character-level bounding box information.

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

with_table_structure_format(table_structure: adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type.TableStructureType)

Sets the table structure format (currently CSV only) used when extracting structured information.

Parameters

table_structure (TableStructureType) – Format in which extracted tables are exported.

Returns

This Builder instance to add any additional parameters.

Return type

ExtractPDFOptions.Builder

Raises

ValueError – if table_structure is None.

static builder()

Returns a Builder for ExtractPDFOptions.

Returns

A new ExtractPDFOptions.Builder instance.

Return type

ExtractPDFOptions.Builder

property elements_to_extract

List of PDF element types to be extracted in a structured format from the input file.

property elements_to_extract_renditions

List of PDF element types whose renditions need to be extracted from the input file.

property get_char_info

Boolean specifying whether to add character-level bounding boxes to the output JSON.

property table_output_format

Format in which tables are exported; currently only CSV is supported.
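
When tables are exported as CSV, the CSV files end up inside the result ZIP returned by execute(). The sketch below reads them with the standard zipfile and csv modules. The helper name is hypothetical, and it assumes (not stated in this reference) that each table is a separate UTF-8 encoded .csv member of the archive; it does not rely on any particular folder name.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_csv_tables(zip_path: str) -> Dict[str, List[List[str]]]:
    """Map each .csv member of a result ZIP to its rows.

    Hypothetical helper: assumes each extracted table is stored as its own
    UTF-8 encoded .csv file somewhere inside the archive.
    """
    tables: Dict[str, List[List[str]]] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue
            # "utf-8-sig" also drops a leading byte-order mark, if present.
            text = archive.read(name).decode("utf-8-sig")
            # newline="" lets the csv module handle \r\n line endings itself.
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables
```

Each value is a list of rows, so the header row of a table, if it has one, is the first element.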

FileRef

class adobe.pdfservices.operation.io.file_ref.FileRef

Bases: abc.ABC

This class represents a local file. It is typically used by an SDK Operation which accepts or returns files.

When this SDK creates a FileRef instance that refers to a temporary file location, calling any of the methods that save the FileRef (for example, save_as()) deletes the temporary file.

static create_from_local_file(local_source: str, media_type: Optional[str] = None)

Creates a FileRef instance from a local file path. If no media type is provided, it will be inferred from the file extension.

Parameters
  • local_source (str) – Local file path, either absolute path or relative to the working directory.

  • media_type (str, optional) – Media type to identify the local file format, defaults to None

Returns

A FileRef instance.

Return type

FileRef

static create_from_stream(input_stream: _io.BufferedReader, media_type: str)

Creates a FileRef instance from a readable stream using the specified media type. The stream is not read by this method but by consumers of the file content, i.e. the execute() method of an operation such as ExtractPDFOperation.

Parameters
  • input_stream (BufferedReader) – Readable stream representing the file.

  • media_type (str) – Media type to identify the file format.

Returns

A FileRef instance.

Return type

FileRef
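
create_from_stream() expects a readable BufferedReader and leaves reading to the operation's execute() method, so the stream presumably has to stay open until execute() returns (an inference from the description above, not a documented guarantee). The hypothetical helper below opens a file in binary mode, which in Python yields an io.BufferedReader, and checks for the PDF header with peek() so that no bytes are consumed before the SDK reads the stream.

```python
import io

# PDF files normally begin with this header.
PDF_MAGIC = b"%PDF-"


def open_pdf_stream(path: str) -> io.BufferedReader:
    """Open a file in binary mode for FileRef.create_from_stream().

    The header is inspected with peek(), which does not advance the stream
    position, so the file can still be read from the beginning later.
    """
    stream = open(path, "rb")  # binary mode with default buffering -> BufferedReader
    if stream.peek(len(PDF_MAGIC))[:len(PDF_MAGIC)] != PDF_MAGIC:
        stream.close()
        raise ValueError(f"{path} does not start with a PDF header")
    return stream
```

A typical use wraps both FileRef.create_from_stream(stream, "application/pdf") and the later execute() call in a "with open_pdf_stream("input.pdf") as stream:" block, so the stream is closed only after the operation has read it.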

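abstract save_as(local_file_path: str)

Saves this file to the location specified by local_file_path.

abstract write_to_stream(writer_stream)

Writes the contents of this file to writer_stream.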

Exceptions

exception adobe.pdfservices.operation.exception.exceptions.SdkException(message, request_tracking_id=None)

Bases: Exception

SdkException is typically thrown for client-side or network errors.

property request_tracking_id

The request tracking id of the exception.

exception adobe.pdfservices.operation.exception.exceptions.ServiceApiException(message, request_tracking_id, status_code=0, error_code='UNKNOWN')

Bases: Exception

ServiceApiException is thrown when an underlying service API call results in an error.

DEFAULT_ERROR_CODE = 'UNKNOWN'

The default value of error code if there is no error code for this service exception.

DEFAULT_STATUS_CODE = 0

The default value of status code if there is no status code for this service exception.

property error_code

Returns the error code of this error, or DEFAULT_ERROR_CODE if none is available.

property request_tracking_id

The request tracking id of the exception.

property status_code

Returns the HTTP Status code or DEFAULT_STATUS_CODE if the status code doesn’t adequately represent the error.

exception adobe.pdfservices.operation.exception.exceptions.ServiceUsageException(message, request_tracking_id, status_code=429, error_code='UNKNOWN')

Bases: Exception

ServiceUsageException is thrown when either the service usage limit has been reached or the credentials quota has been exhausted.

DEFAULT_ERROR_CODE = 'UNKNOWN'

The default value of error code if there is no error code for this service failure.

DEFAULT_STATUS_CODE = 429

The default value of status code if there is no status code for this service failure.

property error_code

Returns the error code of this error, or DEFAULT_ERROR_CODE if none is available.

property request_tracking_id

The request tracking id of the exception.

property status_code

Returns the HTTP Status code or DEFAULT_STATUS_CODE if the status code doesn’t adequately represent the error.
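
When reporting a failure, it is useful to log the request tracking ID together with the status and error codes. The sketch below is a hypothetical helper, not part of the SDK. It reads the documented properties with getattr(), so the same function works for all three exception types. The text produced by str(exc) depends on the SDK's exception implementation.

```python
def describe_pdf_services_error(exc: BaseException) -> str:
    """Summarize an SDK exception in one line for logs or support requests.

    Hypothetical helper. getattr() is used because SdkException exposes only
    request_tracking_id, while ServiceApiException and ServiceUsageException
    also expose status_code and error_code.
    """
    parts = [f"{type(exc).__name__}: {exc}"]
    for attr in ("status_code", "error_code", "request_tracking_id"):
        value = getattr(exc, attr, None)
        if value is not None:  # skip attributes that are missing or unset
            parts.append(f"{attr}={value}")
    return ", ".join(parts)
```

In the sample usage, the except clause can bind the exception (except (ServiceApiException, ServiceUsageException, SdkException) as exc:) and log describe_pdf_services_error(exc).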

ServiceAccountCredentials

class adobe.pdfservices.operation.auth.service_account_credentials.ServiceAccountCredentials(client_id, client_secret, private_key, organization_id, account_id, ims_base_uri='https://ims-na1.adobelogin.com', claim=None)

Bases: adobe.pdfservices.operation.auth.credentials.Credentials, abc.ABC

Service Account credentials allow your application to call the PDF Tools Extract API on behalf of the application itself or on behalf of an enterprise organization.

class Builder

Bases: object

Builds a ServiceAccountCredentials instance.

build()

Returns a new ServiceAccountCredentials instance built from the current state of this builder.

Returns

A ServiceAccountCredentials instance.

Return type

ServiceAccountCredentials

from_file(credentials_file_path: str)

Sets Service Account Credentials using the JSON credentials file path. All the keys in the JSON structure are optional.

JSON structure:

{
  "client_credentials": {
    "client_id": "CLIENT_ID",
    "client_secret": "CLIENT_SECRET"
  },
  "service_account_credentials": {
    "organization_id": "org_ident@AdobeOrg",
    "account_id": "id@techacct.adobe.com",
    "private_key_file": "private.key"
  }
}

private_key_file is the path of the private key file. It will be looked up in the classpath and in the directory of the JSON credentials file.

Parameters

credentials_file_path (str) – JSON credentials file path

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder
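
Because all keys in the credentials file are optional (the remaining values can be supplied with the with_* methods), from_file() alone does not guarantee complete credentials. When a credentials file is meant to be self-contained, a pre-flight check such as the hypothetical helper below can report missing keys before build() is called. It uses only the standard library, checks only the documented keys, and looks for the private key file only in the JSON file's directory, which is one of the two documented lookup locations.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# Keys from the documented credentials JSON structure, grouped by section.
EXPECTED_KEYS: Dict[str, Tuple[str, ...]] = {
    "client_credentials": ("client_id", "client_secret"),
    "service_account_credentials": ("organization_id", "account_id", "private_key_file"),
}


def check_credentials_file(credentials_path: str) -> List[str]:
    """Return a list of problems found in a credentials JSON file.

    An empty list means every documented key has a non-empty value and the
    private key file exists in the same directory as the JSON file.
    """
    path = Path(credentials_path)
    data = json.loads(path.read_text(encoding="utf-8"))
    problems: List[str] = []

    for section, keys in EXPECTED_KEYS.items():
        values = data.get(section) or {}
        for key in keys:
            if not values.get(key):
                problems.append(f"missing {section}.{key}")

    # Only the JSON file's directory is checked; the SDK may also look elsewhere.
    key_file = (data.get("service_account_credentials") or {}).get("private_key_file")
    if key_file and not (path.parent / key_file).is_file():
        problems.append(f"private key file not found next to {path.name}: {key_file}")
    return problems
```

An empty result means the file can be passed to from_file() without relying on additional with_* calls.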

with_account_id(account_id: str)

Sets the account ID (format: id@techacct.adobe.com).

Parameters

account_id (str) – Account ID (format: id@techacct.adobe.com)

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder

with_client_id(client_id: str)

Sets the client ID (API key).

Parameters

client_id (str) – Client ID (API key)

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder

with_client_secret(client_secret: str)

Sets the client secret.

Parameters

client_secret (str) – Client Secret

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder

with_organization_id(organization_id: str)

Sets the organization ID (format: org_ident@AdobeOrg) that has been configured for access to the PDF Tools API.

Parameters

organization_id (str) – Organization ID (format: org_ident@AdobeOrg)

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder

with_private_key(private_key: str)

Sets the private key.

Parameters

private_key (str) – Content of the Private Key (PEM format)

Returns

This Builder instance to add any additional parameters.

Return type

ServiceAccountCredentials.Builder
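
Unlike private_key_file in the from_file() JSON, which is a path, with_private_key() takes the content of the key. The hypothetical helper below reads a PEM file with the standard library and performs a light sanity check on its first line before the text is handed to the builder. It does not parse or validate the key itself.

```python
from pathlib import Path


def read_private_key_pem(key_path: str) -> str:
    """Return the text of a PEM-encoded private key file.

    Hypothetical helper: with_private_key() takes the key content, so the
    file is read here rather than passing its path.
    """
    text = Path(key_path).read_text(encoding="ascii")  # PEM is plain ASCII
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Typical headers: "-----BEGIN PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----".
    if not (first_line.startswith("-----BEGIN ") and first_line.endswith("PRIVATE KEY-----")):
        raise ValueError(f"{key_path} does not look like a PEM private key")
    return text
```

The returned string can then be passed to the builder, for example Credentials.service_account_credentials_builder().with_client_id(...).with_client_secret(...).with_organization_id(...).with_account_id(...).with_private_key(read_private_key_pem("private.key")).build().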

property account_id

Account ID (format: id@techacct.adobe.com)

property claim

Identifies the service for which the authorization (access) token will be issued.

property client_id

Client ID (API key)

property client_secret

Client Secret

property organization_id

Identifies the organization (format: org_ident@AdobeOrg) that has been configured for access to the PDF Tools API.

property private_key

Content of the Private Key (PEM format)

ServiceAccountCredentialsBuilder

class adobe.pdfservices.operation.auth.service_account_credentials.ServiceAccountCredentials.Builder

Bases: object

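Builds a ServiceAccountCredentials instance. This is the nested ServiceAccountCredentials.Builder class documented under ServiceAccountCredentials, and it provides the same methods: build(), from_file(), with_account_id(), with_client_id(), with_client_secret(), with_organization_id() and with_private_key().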
