ExtractPdfOperation

An operation that extracts PDF elements, such as text, images, and tables, from a PDF in a structured format.

Sample Usage:


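const PDFToolsSdk = require('@adobe/documentservices-pdftools-node-sdk');

try {
	// Create credentials from the service account credentials file.
	const credentials = PDFToolsSdk.Credentials
		.serviceAccountCredentialsBuilder()
		.fromFile("pdftools-api-credentials.json")
		.build();

	// Create an execution context and a new ExtractPdfOperation instance.
	const clientContext = PDFToolsSdk.ExecutionContext.create(credentials),
		extractPDFOperation = PDFToolsSdk.ExtractPDF.Operation.createNew();

	// Set the input PDF file.
	const input = PDFToolsSdk.FileRef.createFromLocalFile(
		'test/resources/extractPdfInput.pdf',
		PDFToolsSdk.ExtractPDF.SupportedSourceFormat.pdf
	);
	extractPDFOperation.setInput(input);

	// Extract text and tables as structured data.
	extractPDFOperation.addElementToExtract(PDFToolsSdk.PDFElementType.TEXT);
	extractPDFOperation.addElementToExtract(PDFToolsSdk.PDFElementType.TABLES);

	// Also extract renditions of figures and tables.
	extractPDFOperation.addElementToExtractRenditions(PDFToolsSdk.PDFElementType.FIGURES);
	extractPDFOperation.addElementToExtractRenditions(PDFToolsSdk.PDFElementType.TABLES);

	// Execute the operation and save the result as a ZIP archive.
	extractPDFOperation.execute(clientContext)
		.then(result => result.saveAsFile('output/extractPdf.zip'))
		.catch(err => console.log(err));
} catch (err) {
	// Only synchronous setup errors reach this block; a failure inside
	// execute() rejects the promise and is handled by .catch() above.
	throw err;
}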
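In the sample, the outer try/catch cannot see a rejection from execute(); that path is covered only by .catch(). A minimal sketch that routes both kinds of failure through one try/catch with async/await follows. runExtraction is a hypothetical helper, not part of the SDK, and it assumes only the shapes the sample already uses: an operation whose execute(context) returns a promise of a result exposing saveAsFile(path).

```javascript
// Hypothetical helper (not part of the SDK): runs any operation that follows
// the pattern in the sample, where execute(context) returns a Promise of a
// result exposing saveAsFile(path).
async function runExtraction(operation, context, outputPath) {
  try {
    // Inside an async function, both a synchronous throw from execute() and a
    // rejected promise surface here as a thrown error.
    const result = await operation.execute(context);
    // Awaiting works whether saveAsFile returns a Promise or a plain value.
    await result.saveAsFile(outputPath);
    return outputPath;
  } catch (err) {
    // Add context; the original error stays reachable via `cause` (Node.js 16.9+).
    throw new Error(`Extraction failed for ${outputPath}: ${err.message}`, { cause: err });
  }
}
```

With the objects from the sample, runExtraction(extractPDFOperation, clientContext, 'output/extractPdf.zip') can replace the execute().then().catch() chain, and a single .catch() on its returned promise handles every failure.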

Members

(static, constant) SupportedSourceFormat

Properties:

Name	Type	Description
`pdf`	string	Represents the "application/pdf" media type.

The only supported source file format for ExtractPdfOperation is .pdf.
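Because .pdf is the only supported source format, a caller can reject other files locally before building a FileRef. A minimal sketch of such a pre-flight check; resolveSourceFormat and SUPPORTED_SOURCE_FORMATS are hypothetical names, and the constant is a local copy of the documented `pdf` → "application/pdf" mapping, not the SDK object itself.

```javascript
const path = require('path');

// Local copy of the documented SupportedSourceFormat member, for illustration
// only: its single key `pdf` represents the "application/pdf" media type.
const SUPPORTED_SOURCE_FORMATS = Object.freeze({ pdf: 'application/pdf' });

// Hypothetical pre-flight check (not part of the SDK): returns the media type
// for a local input path, or throws if the extension is not .pdf.
function resolveSourceFormat(filePath) {
  const ext = path.extname(filePath).slice(1).toLowerCase();
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(SUPPORTED_SOURCE_FORMATS, ext)) {
    throw new Error(
      `Unsupported source format "${ext || '(none)'}" for ${filePath}; ExtractPdfOperation accepts .pdf only`
    );
  }
  return SUPPORTED_SOURCE_FORMATS[ext];
}
```

Running the check before FileRef.createFromLocalFile makes a wrong file type fail immediately instead of after a service call.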

elementsToExtract

List of PDF element types to be extracted in a structured format from the input file.

elementsToExtractRenditions

List of PDF element types whose renditions need to be extracted from the input file.

tableOutFormat

Format in which tables are exported. Currently only CSV is supported.

getCharInfo

Boolean specifying whether to add character-level bounding boxes to the output JSON.

Methods

(static) createNew() → {ExtractPdfOperation}

Constructs an ExtractPdfOperation instance.

Returns:

A new ExtractPdfOperation instance.

Type: ExtractPdfOperation

setInput(sourceFileRef)

Sets an input file.

Parameters:

Name	Type	Description
`sourceFileRef`	FileRef	The input file. Must be non-null.

addElementToExtract(element)

Adds a PDF element type to be extracted as structured information.

Parameters:

Name	Type	Description
`element`	PDFElementType	PDFElementType to be extracted

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

addElementToExtractRenditions(element)

Adds a PDF element type whose renditions are to be extracted.

Parameters:

Name	Type	Description
`element`	PDFElementType	PDFElementType whose renditions have to be extracted

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

addTableStructureFormat(element)

Adds the format (currently only CSV) in which table structure is extracted.

Parameters:

Name	Type	Description
`element`	TableStructureType	The TableStructureType in which tables are extracted

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

addCharInfo(element)

Specifies whether to add character-level bounding boxes to the output JSON.

Parameters:

Name	Type	Description
`element`	Boolean	Set to true to include character-level bounding box information.

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

addElementsToExtract(…elements)

Adds one or more PDF element types to be extracted as structured information.

Parameters:

Name	Type	Attributes	Description
`elements`	PDFElementType	<repeatable>	List of PDFElementType to be extracted

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

addElementsToExtractRenditions(…elements)

Adds one or more PDF element types whose renditions are to be extracted.

Parameters:

Name	Type	Attributes	Description
`elements`	PDFElementType	<repeatable>	List of PDFElementType whose renditions have to be extracted

Returns:

ExtractPdfOperation - the current ExtractPdfOperation instance

execute(context) → {Promise.<T>}

Executes this operation using the supplied context and returns a Promise which resolves to the operation result.

The resulting file may be stored in the system temporary directory (as returned by os.tmpdir(), with symlinks resolved to the actual path). See FileRef for how temporary resources are cleaned up.
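Since the result may sit in the temporary directory until it is saved elsewhere (for example with saveAsFile, as in the sample), it can be useful to check whether a path is still under that directory. A minimal sketch; isInTempDir is a hypothetical helper, it resolves the temporary directory the way the text describes (os.tmpdir() with symlinks resolved), and it assumes the path it receives is already symlink-resolved.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of the SDK): reports whether filePath lies
// inside the system temporary directory, resolved via os.tmpdir() plus
// fs.realpathSync. filePath is assumed to be already symlink-resolved.
function isInTempDir(filePath) {
  const tempRoot = fs.realpathSync(os.tmpdir());
  const relative = path.relative(tempRoot, path.resolve(filePath));
  // Inside means: not the root itself, not above it (first segment ".."),
  // and not on a different drive (path.relative returns an absolute path
  // in that case on Windows).
  return relative !== '' &&
    relative.split(path.sep)[0] !== '..' &&
    !path.isAbsolute(relative);
}
```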

Parameters:

Name	Type	Description
`context`	ExecutionContext	The context in which the operation will be executed. Must be non-null.

Throws:

If an API call results in an error response.

Type

ServiceApiError

If service usage limits have been reached or the credentials quota has been exhausted.

Type

ServiceUsageError
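Whether these errors are thrown or delivered as a rejection of the returned promise, a handler can branch on their type, for example to avoid retrying when the quota is exhausted. A minimal sketch; classifyExtractError is a hypothetical helper, and the two error classes are passed in rather than imported because where the SDK exposes them may differ between versions.

```javascript
// Hypothetical helper (not part of the SDK). The ServiceApiError and
// ServiceUsageError classes are injected so the sketch stays self-contained.
function classifyExtractError(err, { ServiceApiError, ServiceUsageError }) {
  // Checked first, in case ServiceUsageError is a subclass of ServiceApiError.
  if (err instanceof ServiceUsageError) {
    // Usage limits reached or credentials quota exhausted: an immediate retry
    // will not help.
    return 'usage-limit';
  }
  if (err instanceof ServiceApiError) {
    // The service returned an error response for the API call.
    return 'api-error';
  }
  // Anything else, such as a local file-system problem.
  return 'other';
}
```

If your SDK version exposes both classes on one object (for example PDFToolsSdk.Error, an assumption to verify against your version), that object can be passed as the second argument inside the .catch() handler.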

Returns:

A promise which resolves to the operation result.

Type: Promise.<T>
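Once the result has been saved and unzipped, the structured output can be read as plain JSON. A minimal sketch of grouping extracted text by page; textByPage is a hypothetical helper, and it assumes (not confirmed by this page) that the archive contains a structuredData.json file whose top-level `elements` array holds objects with a `Text` string and a numeric `Page` index. Unzipping is left out so the sketch needs no third-party packages.

```javascript
// Hypothetical helper (not part of the SDK). Assumed input shape:
// { elements: [{ Path, Text?, Page }, ...] } parsed from structuredData.json.
function textByPage(structuredData) {
  const pages = new Map();
  for (const element of structuredData.elements || []) {
    // Only text-bearing elements with a numeric page index are collected;
    // elements such as figures may carry no Text field.
    if (typeof element.Text !== 'string' || typeof element.Page !== 'number') continue;
    if (!pages.has(element.Page)) pages.set(element.Page, []);
    pages.get(element.Page).push(element.Text.trim());
  }
  // Return pages in ascending order, regardless of element order in the file.
  return new Map([...pages.entries()].sort((a, b) => a[0] - b[0]));
}
```

For example, after extracting the archive, pass JSON.parse(fs.readFileSync('structuredData.json', 'utf8')) to textByPage and iterate over the returned Map.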