ExtractPdfOperation

An Operation that extracts PDF elements such as text, images, and tables from a PDF in a structured format.

Sample Usage:


	// Assumes the SDK module has already been loaded as PDFToolsSdk.
	try {
		// Load service account credentials from a local JSON file.
		const credentials = PDFToolsSdk.Credentials
			.serviceAccountCredentialsBuilder()
			.fromFile('pdftools-api-credentials.json')
			.build();

		// Create the execution context and a new operation instance.
		const clientContext = PDFToolsSdk.ExecutionContext.create(credentials);
		const extractPDFOperation = PDFToolsSdk.ExtractPDF.Operation.createNew();

		// Set the input PDF.
		const input = PDFToolsSdk.FileRef.createFromLocalFile(
			'test/resources/extractPdfInput.pdf',
			PDFToolsSdk.ExtractPDF.SupportedSourceFormat.pdf
		);
		extractPDFOperation.setInput(input);

		// Extract text and tables as structured data.
		extractPDFOperation.addElementToExtract(PDFToolsSdk.PDFElementType.TEXT);
		extractPDFOperation.addElementToExtract(PDFToolsSdk.PDFElementType.TABLES);

		// Extract renditions of figures and tables.
		extractPDFOperation.addElementToExtractRenditions(PDFToolsSdk.PDFElementType.FIGURES);
		extractPDFOperation.addElementToExtractRenditions(PDFToolsSdk.PDFElementType.TABLES);

		// Execute the operation and save the result; asynchronous failures are logged.
		extractPDFOperation.execute(clientContext)
			.then(result => result.saveAsFile('output/extractPdf.zip'))
			.catch(err => console.log(err));
	} catch (err) {
		// Errors thrown synchronously while setting up the operation are rethrown.
		throw err;
	}

Members

(static, constant) SupportedSourceFormat

Properties:
Name  Type    Description
pdf   string  Represents "application/pdf" media type

The only supported source file format for ExtractPdfOperation is .pdf.

elementsToExtract

List of PDF element types to be extracted from the input file in a structured format

elementsToExtractRenditions

List of PDF element types whose renditions need to be extracted from the input file

tableOutFormat

Format in which tables are exported. Currently only CSV is supported.

getCharInfo

Boolean specifying whether to add character-level bounding boxes to the output JSON

Methods

(static) createNew() → {ExtractPdfOperation}

Constructs an ExtractPdfOperation instance.

Returns:

A new ExtractPdfOperation instance.

Type
ExtractPdfOperation

setInput(sourceFileRef)

Sets an input file.

Parameters:
Name           Type     Description
sourceFileRef  FileRef  An input file. Must be non-null.

addElementToExtract(element)

Adds a PDF element type for extracting structured information.

Parameters:
Name     Type            Description
element  PDFElementType  PDFElementType to be extracted

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance
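
Because addElementToExtract returns the current instance, calls can be chained. The following is a minimal sketch, not the SDK's own sample: the helper name buildTextAndTableExtraction is hypothetical, and the SDK module (the object called PDFToolsSdk in the sample usage above) is passed in as the sdk parameter so the sketch has no hard dependency.

```javascript
// Hypothetical helper, not part of the SDK. `sdk` is the loaded SDK module
// (the object called PDFToolsSdk in the sample usage above).
function buildTextAndTableExtraction(sdk, input) {
  const operation = sdk.ExtractPDF.Operation.createNew();
  operation.setInput(input);

  // Each addElementToExtract call returns the same operation instance,
  // so the calls can be chained.
  return operation
    .addElementToExtract(sdk.PDFElementType.TEXT)
    .addElementToExtract(sdk.PDFElementType.TABLES);
}
```

setInput is called on its own line because this reference does not document a return value for it.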

addElementToExtractRenditions(element)

Adds a PDF element type whose renditions are to be extracted.

Parameters:
Name     Type            Description
element  PDFElementType  PDFElementType whose renditions have to be extracted

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance

addTableStructureFormat(element)

Adds the table structure format (currently CSV only) for extracting structured information.

Parameters:
Name     Type                Description
element  TableStructureType  TableStructureType to be extracted

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance
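
As a rough illustration, a table extraction with CSV export might be configured as below. This is a sketch under assumptions: the helper extractTablesAsCsv is hypothetical, and the location of the CSV constant (written here as sdk.TableStructureType.CSV) is not stated on this page, so check the TableStructureType reference for the actual export path.

```javascript
// Hypothetical helper, not part of the SDK.
// Assumption: the SDK module exposes the CSV constant as
// sdk.TableStructureType.CSV; this page does not confirm that path.
function extractTablesAsCsv(sdk, operation) {
  // Request tables, their renditions, and CSV as the table export format.
  return operation
    .addElementToExtract(sdk.PDFElementType.TABLES)
    .addElementToExtractRenditions(sdk.PDFElementType.TABLES)
    .addTableStructureFormat(sdk.TableStructureType.CSV);
}
```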

addCharInfo(element)

Sets whether to add character-level bounding boxes to the output JSON.

Parameters:
Name     Type     Description
element  Boolean  Set to true to extract character-level bounding box information

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance
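
A short sketch of enabling character-level bounding boxes alongside text extraction. The helper name extractTextWithCharBounds is hypothetical, and sdk stands for the loaded SDK module.

```javascript
// Hypothetical helper, not part of the SDK.
function extractTextWithCharBounds(sdk, operation) {
  return operation
    .addElementToExtract(sdk.PDFElementType.TEXT)
    // Ask for character-level bounding boxes in the output JSON.
    .addCharInfo(true);
}
```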

addElementsToExtract(…elements)

Adds one or more PDF element types for extracting structured information.

Parameters:
Name      Type            Attributes    Description
elements  PDFElementType  <repeatable>  List of PDFElementType to be extracted

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance
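
Since elements is a repeatable parameter, the types are passed as separate arguments. When they are held in an array, the array can be spread into the call. A minimal sketch, with the helper name addAllElements being hypothetical:

```javascript
// Hypothetical helper, not part of the SDK.
function addAllElements(operation, elementTypes) {
  // Spread the array into individual arguments rather than
  // passing the array itself as a single value.
  return operation.addElementsToExtract(...elementTypes);
}
```

For example, addAllElements(extractPDFOperation, [PDFToolsSdk.PDFElementType.TEXT, PDFToolsSdk.PDFElementType.TABLES]) would make the same request as the two addElementToExtract calls in the sample usage.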

addElementsToExtractRenditions(…elements)

Adds one or more PDF element types whose renditions are to be extracted.

Parameters:
Name      Type            Attributes    Description
elements  PDFElementType  <repeatable>  List of PDFElementType whose renditions have to be extracted

Returns:

ExtractPdfOperation - current ExtractPdfOperation instance

execute(context) → {Promise.<T>}

Executes this operation using the supplied context and returns a Promise which resolves to the operation result.

The resulting file may be stored in the system temporary directory (as returned by os.tmpdir(), with symlinks resolved to the actual path). See FileRef for how temporary resources are cleaned up.

Parameters:
Name     Type              Description
context  ExecutionContext  The context in which the operation will be executed. Must be non-null.

Throws:
  • if an API call results in an error response.

    Type
    ServiceApiError
  • if service usage limits have been reached or credentials quota has been exhausted.

    Type
    ServiceUsageError
Returns:

A promise which resolves to the operation result.

Type
Promise.<T>
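
One way to tell the two documented error types apart is sketched below. It rests on assumptions: the helper runAndSave and its return shape are hypothetical, and the error classes are assumed to be exposed as sdk.Error.ServiceApiError and sdk.Error.ServiceUsageError, which this page does not confirm.

```javascript
// Hypothetical helper, not part of the SDK.
// Assumption: the error classes are exposed as sdk.Error.ServiceApiError and
// sdk.Error.ServiceUsageError; adjust if your SDK version exports them elsewhere.
async function runAndSave(sdk, operation, context, outputPath) {
  try {
    // execute() resolves to the operation result, which is then saved to disk.
    const result = await operation.execute(context);
    await result.saveAsFile(outputPath);
    return { ok: true };
  } catch (err) {
    if (err instanceof sdk.Error.ServiceUsageError) {
      // Usage limits reached or credentials quota exhausted.
      return { ok: false, reason: 'usage-limit', error: err };
    }
    if (err instanceof sdk.Error.ServiceApiError) {
      // The API call returned an error response.
      return { ok: false, reason: 'api-error', error: err };
    }
    // Anything else is unexpected, so let it propagate.
    throw err;
  }
}
```

Returning a small status object keeps the caller free to decide whether a quota error should be retried later while an API error is reported immediately.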