adobe.pdfservices.operation.pdfjobs.params.extract_pdf package

Submodules

adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_element_type module

class adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_element_type.ExtractElementType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Supported values for the elements to extract with ExtractPDFJob.

TABLES = 'tables'

Tabular Data

TEXT = 'text'

Textual Data
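
Because ExtractElementType derives from Enum, its members can be looked up by name or by string value in the usual Python way. The sketch below uses a local stand-in enum that copies only the two members documented above, so it runs without the SDK installed. It is an illustration, not the SDK's own class.

```python
from enum import Enum


class ExtractElementType(Enum):
    """Local stand-in mirroring the two documented members (not the SDK class)."""
    TABLES = "tables"
    TEXT = "text"


# Look a member up by its string value, e.g. when reading a config file.
by_value = ExtractElementType("text")

# Look a member up by its attribute name.
by_name = ExtractElementType["TABLES"]

# Collect the plain string values of every member, in definition order.
all_values = [member.value for member in ExtractElementType]
```

Passing a string that matches no member, such as "images", raises ValueError, which is standard Enum behaviour.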

adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_pdf_params module

class adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_pdf_params.ExtractPDFParams(*, table_structure_type: TableStructureType = TableStructureType.XLSX, add_char_info: bool = False, styling_info: bool = False, elements_to_extract: List | None = None, elements_to_extract_renditions: List | None = None)

Bases: PDFServicesJobParams

Parameters for extracting content from a PDF using ExtractPDFJob.

Constructs a new ExtractPDFParams instance.

Parameters:
  • table_structure_type (TableStructureType) – Output format for the extracted table structure. (Optional; pass as a keyword argument)

  • add_char_info (bool) – Whether to add character-level bounding boxes to the output JSON. (Optional; pass as a keyword argument)

  • styling_info (bool) – Whether to add styling information to the output JSON. (Optional; pass as a keyword argument)

  • elements_to_extract (List) – List of ExtractElementType values to extract. (Optional; pass as a keyword argument)

  • elements_to_extract_renditions (List) – List of ExtractRenditionsElementType values for which renditions are extracted. (Optional; pass as a keyword argument)

get_add_char_info()
Returns:

Whether character-level information was requested for the operation.

Return type:

bool

get_elements_to_extract()
Returns:

The list of elements (text and/or tables) requested for the operation.

Return type:

list

get_elements_to_extract_renditions()
Returns:

The list of ExtractRenditionsElementType values requested for the job.

Return type:

list

get_styling_info()
Returns:

Whether styling information was requested for the operation.

Return type:

bool

get_table_structure_type()
Returns:

The TableStructureType of the resulting table rendition.

Return type:

TableStructureType
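
The constructor signature above begins with a bare `*`, so every argument must be passed by keyword. The sketch below illustrates that behaviour together with the documented defaults and getters. `ExtractPDFParamsSketch` and the two enums are hypothetical local stand-ins that copy only the signature and names shown in this reference. They are not the SDK's implementation, and how the real class stores an omitted list is an assumption here.

```python
from enum import Enum
from typing import List, Optional


class TableStructureType(Enum):
    """Stand-in for the documented enum (not the SDK class)."""
    CSV = "csv"
    XLSX = "xlsx"


class ExtractElementType(Enum):
    """Stand-in for the documented enum (not the SDK class)."""
    TABLES = "tables"
    TEXT = "text"


class ExtractPDFParamsSketch:
    """Hypothetical stand-in copying the documented signature and getters."""

    def __init__(self, *,
                 table_structure_type: TableStructureType = TableStructureType.XLSX,
                 add_char_info: bool = False,
                 styling_info: bool = False,
                 elements_to_extract: Optional[List] = None,
                 elements_to_extract_renditions: Optional[List] = None):
        # Assumption: values are stored unchanged, and omitted lists stay None.
        self._table_structure_type = table_structure_type
        self._add_char_info = add_char_info
        self._styling_info = styling_info
        self._elements_to_extract = elements_to_extract
        self._elements_to_extract_renditions = elements_to_extract_renditions

    def get_table_structure_type(self):
        return self._table_structure_type

    def get_add_char_info(self):
        return self._add_char_info

    def get_styling_info(self):
        return self._styling_info

    def get_elements_to_extract(self):
        return self._elements_to_extract

    def get_elements_to_extract_renditions(self):
        return self._elements_to_extract_renditions


# Every argument is passed by keyword; anything omitted keeps its default.
params = ExtractPDFParamsSketch(
    elements_to_extract=[ExtractElementType.TEXT, ExtractElementType.TABLES],
    table_structure_type=TableStructureType.CSV,
)

# A positional argument is rejected because of the bare `*` in the signature.
try:
    ExtractPDFParamsSketch(TableStructureType.CSV)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

With the real SDK, the same keyword-only call shape applies to ExtractPDFParams itself, per the signature documented above.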

adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_renditions_element_type module

class adobe.pdfservices.operation.pdfjobs.params.extract_pdf.extract_renditions_element_type.ExtractRenditionsElementType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Supported values for the renditions to extract with ExtractPDFJob.

FIGURES = 'figures'

Image Data

TABLES = 'tables'

Tabular Data

adobe.pdfservices.operation.pdfjobs.params.extract_pdf.table_structure_type module

class adobe.pdfservices.operation.pdfjobs.params.extract_pdf.table_structure_type.TableStructureType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Supported formats for exporting table data with ExtractPDFJob.

CSV = 'csv'

CSV Format

XLSX = 'xlsx'

XLSX Format

Module contents