adobe.pdfservices.operation.pdfops.options.extractpdf package

Submodules

adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options module

class adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options.ExtractPDFOptions(elements_to_extract, elements_to_extract_renditions, get_char_info, table_output_format, include_styling_info=None)

Bases: object

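An options class that defines the options for ExtractPDFOperation.

Sample usage:

from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type import ExtractElementType
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options import ExtractPDFOptions
from adobe.pdfservices.operation.pdfops.options.extractpdf.extract_renditions_element_type import ExtractRenditionsElementType
from adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type import TableStructureType

# Text and tables as JSON, char-level bounding boxes, CSV tables, figure/table renditions, styling info.
extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
    .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES]) \
    .with_get_char_info(True) \
    .with_table_structure_format(TableStructureType.CSV) \
    .with_elements_to_extract_renditions([ExtractRenditionsElementType.FIGURES, ExtractRenditionsElementType.TABLES]) \
    .with_include_styling_info(True) \
    .build()

class Builder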

Bases: object

The builder for ExtractPDFOptions.

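build()

Builds an ExtractPDFOptions instance from the values set on this builder.

Returns:

The configured options.

Return type:

ExtractPDFOptions

with_element_to_extract(element_to_extract: ExtractElementType)

Adds a PDF element type to extract as structured information.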

Parameters:

element_to_extract (ExtractElementType) – ExtractElementType to be extracted

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

Raises:

ValueError – if element_to_extract is None.
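The documented contract of with_element_to_extract (add one element type per call, return the builder, reject None with ValueError) can be illustrated with a small stand-in. This is a hypothetical sketch, not the SDK's implementation: `ElementTypeStandIn` and `SketchBuilder` are invented names, and whether the SDK merges these single-item calls with a list passed to with_elements_to_extract is not stated in this reference.

```python
from enum import Enum
from typing import List


class ElementTypeStandIn(str, Enum):
    """Local stand-in with the same members and values as ExtractElementType."""
    TABLES = "tables"
    TEXT = "text"


class SketchBuilder:
    """Hypothetical stand-in for ExtractPDFOptions.Builder; not the SDK's code."""

    def __init__(self) -> None:
        self._elements: List[ElementTypeStandIn] = []

    @property
    def elements(self) -> List[ElementTypeStandIn]:
        # Return a copy so callers cannot mutate the builder's state directly.
        return list(self._elements)

    def with_element_to_extract(self, element_to_extract: ElementTypeStandIn) -> "SketchBuilder":
        # Mirror the documented contract: None is rejected with ValueError.
        if element_to_extract is None:
            raise ValueError("element_to_extract must not be None")
        # Each call adds one element type to the builder's list.
        self._elements.append(element_to_extract)
        # Returning self is what allows calls to be chained.
        return self


builder = (
    SketchBuilder()
    .with_element_to_extract(ElementTypeStandIn.TEXT)
    .with_element_to_extract(ElementTypeStandIn.TABLES)
)
print([e.value for e in builder.elements])  # ['text', 'tables']

try:
    SketchBuilder().with_element_to_extract(None)
except ValueError as err:
    print("rejected:", err)
```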

with_element_to_extract_renditions(element_to_extract_renditions: ExtractRenditionsElementType)

Adds a PDF element type whose renditions are to be extracted.

Parameters:

element_to_extract_renditions (ExtractRenditionsElementType) – ExtractRenditionsElementType whose renditions are to be extracted

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

Raises:

ValueError – if element_to_extract_renditions is None.

with_elements_to_extract(elements_to_extract: List[ExtractElementType])

Adds a list of PDF element types to extract as structured information.

Parameters:

elements_to_extract (List[ExtractElementType]) – List of ExtractElementType to be extracted

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

Raises:

ValueError – if elements_to_extract is None or an empty list.

with_elements_to_extract_renditions(elements_to_extract_renditions: List[ExtractRenditionsElementType])

Adds a list of PDF element types whose renditions are to be extracted.

Parameters:

elements_to_extract_renditions (List[ExtractRenditionsElementType]) – List of ExtractRenditionsElementType whose renditions are to be extracted

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

Raises:

ValueError – if elements_to_extract_renditions is None or an empty list.
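Both list-based methods reject None and an empty list. The hypothetical helper below (`require_non_empty` and `RenditionTypeStandIn` are invented for illustration, not SDK names) isolates that documented check. None is tested separately from the empty case so each failure gets its own message.

```python
from enum import Enum
from typing import List, Optional, Sequence


class RenditionTypeStandIn(str, Enum):
    """Local stand-in with the same members and values as ExtractRenditionsElementType."""
    FIGURES = "figures"
    TABLES = "tables"


def require_non_empty(name: str, values: Optional[Sequence]) -> List:
    """Hypothetical check mirroring the documented ValueError contract."""
    if values is None:
        raise ValueError(f"{name} must not be None")
    if len(values) == 0:
        raise ValueError(f"{name} must not be an empty list")
    # This sketch copies the list so later changes to the caller's list do not leak in.
    return list(values)


ok = require_non_empty(
    "elements_to_extract_renditions",
    [RenditionTypeStandIn.FIGURES, RenditionTypeStandIn.TABLES],
)
print([r.value for r in ok])  # ['figures', 'tables']

for bad in (None, []):
    try:
        require_non_empty("elements_to_extract_renditions", bad)
    except ValueError as err:
        print(err)
```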

with_get_char_info(get_char_info: bool)

Sets a Boolean specifying whether to add character-level bounding boxes to the output JSON.

Parameters:

get_char_info (bool) – Set to True to extract character-level bounding box information.

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

with_include_styling_info(include_styling_info: bool)

Sets a Boolean specifying whether to add styling information for PDF elements to the output JSON.

Parameters:

include_styling_info (bool) – Set to True to extract styling information for PDF elements.

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

with_table_structure_format(table_structure: TableStructureType)

Sets the format in which extracted tables are exported (currently only CSV is supported).

Parameters:

table_structure (TableStructureType) – TableStructureType specifying the table export format

Returns:

This Builder instance to add any additional parameters.

Return type:

ExtractPDFOptions.Builder

Raises:

ValueError – if table_structure is None.

static builder()

Returns a Builder for ExtractPDFOptions.

Returns:

A Builder instance for ExtractPDFOptions.

Return type:

ExtractPDFOptions.Builder

property elements_to_extract

List of PDF element types to be extracted in a structured format from the input file.

property elements_to_extract_renditions

List of PDF element types whose renditions are to be extracted from the input file.

property get_char_info

Boolean specifying whether to add character-level bounding boxes to the output JSON.

property include_styling_info

Boolean specifying whether to add styling information for PDF elements to the output JSON.

property table_output_format

Format in which tables are exported; currently only CSV is supported.
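The five properties correspond to the five constructor arguments in the class signature. The hypothetical sketch below (`OptionsSketch` and its nested builder are invented, not the SDK's classes) shows how a builder can collect values and pass them to an object that exposes them as read-only properties. Whether the SDK's properties also lack setters is not stated in this reference, and only three of the five properties are shown to keep the sketch short.

```python
from typing import List, Optional


class OptionsSketch:
    """Hypothetical stand-in for ExtractPDFOptions; fields mirror the documented properties."""

    def __init__(self, elements_to_extract: List[str], elements_to_extract_renditions: List[str],
                 get_char_info: Optional[bool], table_output_format: Optional[str],
                 include_styling_info: Optional[bool] = None) -> None:
        self._elements_to_extract = elements_to_extract
        self._elements_to_extract_renditions = elements_to_extract_renditions
        self._get_char_info = get_char_info
        self._table_output_format = table_output_format
        self._include_styling_info = include_styling_info

    # Properties without setters: assigning to them raises AttributeError.
    @property
    def elements_to_extract(self) -> List[str]:
        return self._elements_to_extract

    @property
    def get_char_info(self) -> Optional[bool]:
        return self._get_char_info

    @property
    def table_output_format(self) -> Optional[str]:
        return self._table_output_format

    class Builder:
        def __init__(self) -> None:
            # Values never set on the builder keep these sketch-only defaults.
            self._elements: List[str] = []
            self._renditions: List[str] = []
            self._char_info: Optional[bool] = None
            self._table_format: Optional[str] = None
            self._styling: Optional[bool] = None

        def with_elements_to_extract(self, elements: List[str]) -> "OptionsSketch.Builder":
            self._elements = list(elements)
            return self

        def with_get_char_info(self, flag: bool) -> "OptionsSketch.Builder":
            self._char_info = flag
            return self

        def with_table_structure_format(self, fmt: str) -> "OptionsSketch.Builder":
            self._table_format = fmt
            return self

        def build(self) -> "OptionsSketch":
            # build() hands the collected values to the options constructor.
            return OptionsSketch(self._elements, self._renditions, self._char_info,
                                 self._table_format, self._styling)

    @staticmethod
    def builder() -> "OptionsSketch.Builder":
        return OptionsSketch.Builder()


opts = (
    OptionsSketch.builder()
    .with_elements_to_extract(["text", "tables"])
    .with_get_char_info(True)
    .with_table_structure_format("csv")
    .build()
)
print(opts.elements_to_extract, opts.get_char_info, opts.table_output_format)  # ['text', 'tables'] True csv
```

Keeping the options object read-only means it can be shared after build() without callers accidentally changing a setting; this is a design choice of the sketch.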

adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type module

class adobe.pdfservices.operation.pdfops.options.extractpdf.extract_element_type.ExtractElementType(value)

Bases: str, Enum

Enum of element types in a PDF that can be extracted as JSON.

TABLES = 'tables'
TEXT = 'text'
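Because ExtractElementType derives from both str and Enum, each member is also a plain string equal to its value. The sketch below uses a local stand-in enum with the same two documented members (the SDK class is not imported, so `ElementTypeStandIn` is an assumed name) to show what that means in practice.

```python
import json
from enum import Enum


class ElementTypeStandIn(str, Enum):
    """Local stand-in with the same members and values as ExtractElementType."""
    TABLES = "tables"
    TEXT = "text"


# A str-based enum member compares equal to its underlying string value.
print(ElementTypeStandIn.TEXT == "text")  # True

# Looking a member up by value returns the singleton member.
print(ElementTypeStandIn("tables") is ElementTypeStandIn.TABLES)  # True

# json.dumps treats the members as ordinary strings.
print(json.dumps([ElementTypeStandIn.TEXT, ElementTypeStandIn.TABLES]))  # ["text", "tables"]

# Use .value when you need the raw string: str() (and, on newer Python versions,
# f-strings) may render a (str, Enum) member as "ClassName.MEMBER" instead.
print(ElementTypeStandIn.TEXT.value)  # text
```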

adobe.pdfservices.operation.pdfops.options.extractpdf.extract_renditions_element_type module

class adobe.pdfservices.operation.pdfops.options.extractpdf.extract_renditions_element_type.ExtractRenditionsElementType(value)

Bases: str, Enum

Enum of element types in a PDF whose renditions can be extracted.

FIGURES = 'figures'
TABLES = 'tables'
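ExtractRenditionsElementType and ExtractElementType both define a TABLES member with the value 'tables'. Because both enums are str-based, == compares the underlying strings and cannot tell the two members apart, while isinstance can. The enums below are local stand-ins reusing the documented values, and `is_rendition_type` is a hypothetical guard; which check, if any, the SDK applies to its arguments is not documented here.

```python
from enum import Enum


class ElementTypeStandIn(str, Enum):
    TABLES = "tables"
    TEXT = "text"


class RenditionTypeStandIn(str, Enum):
    FIGURES = "figures"
    TABLES = "tables"


# Equality falls through to str.__eq__, so members of different enums compare equal.
print(ElementTypeStandIn.TABLES == RenditionTypeStandIn.TABLES)  # True

# Identity and isinstance still distinguish them.
print(ElementTypeStandIn.TABLES is RenditionTypeStandIn.TABLES)  # False
print(isinstance(RenditionTypeStandIn.TABLES, ElementTypeStandIn))  # False


def is_rendition_type(value: object) -> bool:
    """Hypothetical guard: accept only members of the renditions enum."""
    return isinstance(value, RenditionTypeStandIn)


print(is_rendition_type(RenditionTypeStandIn.FIGURES), is_rendition_type(ElementTypeStandIn.TABLES))  # True False
```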

adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type module

class adobe.pdfservices.operation.pdfops.options.extractpdf.table_structure_type.TableStructureType(value)

Bases: str, Enum

Enum of formats in which tables extracted from a PDF can be exported.

CSV = 'csv'
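When the table format arrives as a plain string (for example from a config file), calling the enum with that value converts it and rejects anything unsupported with ValueError. The stand-in below has the single documented member, CSV; `TableFormatStandIn` and `parse_table_format` are hypothetical names, not SDK APIs.

```python
from enum import Enum


class TableFormatStandIn(str, Enum):
    """Local stand-in with the same single member as TableStructureType."""
    CSV = "csv"


def parse_table_format(raw: str) -> TableFormatStandIn:
    """Hypothetical helper: normalise whitespace and case, then look the value up."""
    try:
        return TableFormatStandIn(raw.strip().lower())
    except ValueError:
        supported = ", ".join(m.value for m in TableFormatStandIn)
        raise ValueError(f"unsupported table format {raw!r}; supported: {supported}") from None


print(parse_table_format(" CSV ") is TableFormatStandIn.CSV)  # True

try:
    parse_table_format("xlsx")
except ValueError as err:
    print(err)  # unsupported table format 'xlsx'; supported: csv
```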
