adobe.pdfservices.operation.pdfops.options.extractpdf package
Submodules
adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options module
- class adobe.pdfservices.operation.pdfops.options.extractpdf.extract_pdf_options.ExtractPDFOptions(elements_to_extract, elements_to_extract_renditions, get_char_info, table_output_format, include_styling_info=None)
Bases: object
An Options Class that defines the options for ExtractPDFOperation.
extract_pdf_options: ExtractPDFOptions = ExtractPDFOptions.builder() \
    .with_elements_to_extract([ExtractElementType.TEXT, ExtractElementType.TABLES]) \
    .with_get_char_info(True) \
    .with_table_structure_format(TableStructureType.CSV) \
    .with_elements_to_extract_renditions([ExtractRenditionsElementType.FIGURES, ExtractRenditionsElementType.TABLES]) \
    .with_include_styling_info(True) \
    .build()
- class Builder
Bases: object
The builder for ExtractPDFOptions.
- build()
- with_element_to_extract(element_to_extract: ExtractElementType)
Adds a PDF element type for which structured information is extracted.
- Parameters:
element_to_extract (ExtractElementType) – ExtractElementType to be extracted
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- Raises:
ValueError – if element_to_extract is None.
- with_element_to_extract_renditions(element_to_extract_renditions: ExtractRenditionsElementType)
Adds a PDF element type whose renditions are extracted.
- Parameters:
element_to_extract_renditions (ExtractRenditionsElementType) – ExtractRenditionsElementType whose renditions have to be extracted
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- Raises:
ValueError – if element_to_extract_renditions is None.
- with_elements_to_extract(elements_to_extract: List[ExtractElementType])
Adds a list of PDF element types for which structured information is extracted.
- Parameters:
elements_to_extract (List[ExtractElementType]) – List of ExtractElementType to be extracted
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- Raises:
ValueError – if elements_to_extract is None or empty list.
- with_elements_to_extract_renditions(elements_to_extract_renditions: List[ExtractRenditionsElementType])
Adds a list of PDF element types whose renditions are extracted.
- Parameters:
elements_to_extract_renditions (List[ExtractRenditionsElementType]) – List of ExtractRenditionsElementType whose renditions have to be extracted
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- Raises:
ValueError – if elements_to_extract_renditions is None or an empty list.
- with_get_char_info(get_char_info: bool)
Sets the Boolean specifying whether to add character-level bounding boxes to the output JSON.
- Parameters:
get_char_info (bool) – Set to True to extract character-level bounding box information
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- with_include_styling_info(include_styling_info: bool)
Sets the Boolean specifying whether to add PDF element styling info to the output JSON.
- Parameters:
include_styling_info (bool) – Set to True to extract PDF element styling info
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- with_table_structure_format(table_structure: TableStructureType)
Sets the table structure format (currently CSV only) used when extracting structured information.
- Parameters:
table_structure (TableStructureType) – TableStructureType to be extracted
- Returns:
This Builder instance to add any additional parameters.
- Return type: ExtractPDFOptions.Builder
- Raises:
ValueError – if table_structure is None.
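The chaining and validation behaviour described for the `with_*` methods above can be illustrated with a minimal sketch. It does not use or reproduce the SDK: `ElementType` and `SketchBuilder` are hypothetical stand-ins, and the assumption that the singular method appends while the plural method extends the same internal list is inferred from the word "adds" in the descriptions, not confirmed by the source.

```python
from enum import Enum
from typing import List


class ElementType(Enum):
    """Hypothetical stand-in for ExtractElementType."""
    TEXT = "text"
    TABLES = "tables"


class SketchBuilder:
    """Hypothetical stand-in for ExtractPDFOptions.Builder (not SDK code)."""

    def __init__(self) -> None:
        self._elements: List[ElementType] = []

    def with_element_to_extract(self, element: ElementType) -> "SketchBuilder":
        # Mirrors the documented contract: None is rejected with ValueError.
        if element is None:
            raise ValueError("element_to_extract must not be None")
        self._elements.append(element)  # assumption: elements accumulate
        return self  # returning self is what makes chaining possible

    def with_elements_to_extract(self, elements: List[ElementType]) -> "SketchBuilder":
        # "not elements" is true for both None and an empty list.
        if not elements:
            raise ValueError("elements_to_extract must not be None or empty")
        self._elements.extend(elements)
        return self


builder = SketchBuilder().with_element_to_extract(ElementType.TEXT)
builder.with_elements_to_extract([ElementType.TABLES])
collected = list(builder._elements)
```

Because every `with_*` call returns the builder itself, the calls can be chained in any order before `build()` is invoked.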
- static builder()
Returns a Builder for ExtractPDFOptions.
- Returns:
The builder class for ExtractPDFOptions
- Return type: ExtractPDFOptions.Builder
- property elements_to_extract
List of PDF element types to be extracted in a structured format from the input file
- property elements_to_extract_renditions
List of PDF element types whose renditions need to be extracted from the input file
- property get_char_info
Boolean specifying whether to add character-level bounding boxes to the output JSON
- property include_styling_info
Boolean specifying whether to add PDF element styling info to the output JSON
- property table_output_format
Format in which tables are exported; currently CSV is supported
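The relationship between the static `builder()` method, `build()`, and the read-only properties can be sketched as follows. This is an illustration of the pattern only: `SketchOptions` and its nested `Builder` are hypothetical stand-ins covering two of the five options, and the default of `None` for values that were never set is an assumption.

```python
class SketchOptions:
    """Hypothetical stand-in for ExtractPDFOptions (not SDK code)."""

    def __init__(self, get_char_info, include_styling_info=None):
        self._get_char_info = get_char_info
        self._include_styling_info = include_styling_info

    # Properties without setters expose the values as read-only.
    @property
    def get_char_info(self):
        return self._get_char_info

    @property
    def include_styling_info(self):
        return self._include_styling_info

    @staticmethod
    def builder():
        # Entry point for the fluent API, as in ExtractPDFOptions.builder().
        return SketchOptions.Builder()

    class Builder:
        def __init__(self):
            self._get_char_info = None          # assumption: unset means None
            self._include_styling_info = None

        def with_get_char_info(self, get_char_info: bool):
            self._get_char_info = get_char_info
            return self

        def with_include_styling_info(self, include_styling_info: bool):
            self._include_styling_info = include_styling_info
            return self

        def build(self):
            # Hand the collected values over to the options object.
            return SketchOptions(self._get_char_info, self._include_styling_info)


options = SketchOptions.builder().with_get_char_info(True).build()
```

In this sketch the built object cannot be modified through its properties, so any change requires building a new options object.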