A job that extracts elements such as text, images, and tables from a PDF and returns them in a structured format.

Example

Sample Usage:

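        const fs = require("fs");
        const {
            ServicePrincipalCredentials,
            PDFServices,
            MimeType,
            ExtractPDFParams,
            ExtractElementType,
            ExtractPDFJob,
            ExtractPDFResult
        } = require("@adobe/pdfservices-node-sdk");

        // Note: the await calls below must run inside an async function.
        const readStream = fs.createReadStream("<SOURCE_PATH>");

        // Build credentials from environment variables.
        const credentials = new ServicePrincipalCredentials({
            clientId: process.env.PDF_SERVICES_CLIENT_ID,
            clientSecret: process.env.PDF_SERVICES_CLIENT_SECRET
        });

        const pdfServices = new PDFServices({credentials});

        // Upload the source PDF and get back an asset reference.
        const inputAsset = await pdfServices.upload({
            readStream,
            mimeType: MimeType.PDF
        });

        // Choose which element types to extract (text only here).
        const params = new ExtractPDFParams({
            elementsToExtract: [ExtractElementType.TEXT]
        });

        const job = new ExtractPDFJob({inputAsset, params});

        // Submit the job, then poll until the result is available.
        const pollingURL = await pdfServices.submit({job});

        const pdfServicesResponse = await pdfServices.getJobResult({
            pollingURL,
            resultType: ExtractPDFResult
        });

        // Fetch the content of the resulting asset as a stream.
        const resultAsset = pdfServicesResponse.result.resource;
        const streamAsset = await pdfServices.getContent({asset: resultAsset});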
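
The sample above stops once `streamAsset` is obtained. As a minimal sketch of one way to persist that content, the helper below writes a readable stream to disk. It assumes `streamAsset.readStream` is a standard Node.js `Readable`; the name `saveStream` and the output path are hypothetical and not part of the SDK.

```javascript
const fs = require("fs");
const { pipeline } = require("stream/promises");

// Hypothetical helper (not part of the SDK): pipes any Readable into a file
// and resolves with the output path once the write stream has finished.
// pipeline() rejects if either stream emits an error.
async function saveStream(readStream, outputPath) {
    await pipeline(readStream, fs.createWriteStream(outputPath));
    return outputPath;
}
```

Inside the same async function as the sample, this could be called as `await saveStream(streamAsset.readStream, "<OUTPUT_PATH>");`. Using `pipeline` rather than a bare `pipe` call means a failed read or write surfaces as a rejected promise instead of being silently dropped.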

Hierarchy

Constructors

Properties

_extractPDFParams?: ExtractPDFParams
_inputAsset: Asset
_outputAsset?: Asset

Methods

  • Parameters

    Returns PDFServicesApiRequest

  • Parameters

    • executionContext: ExecutionContext

    Returns void
