Class ExtractPDFJob


  • public class ExtractPDFJob
    extends PDFServicesJob
    A job that extracts PDF elements such as text, images, and tables from a PDF in a structured format. Sample Usage:
    
                 // Open the source PDF; replace SOURCE_PATH with the path to the input file.
                 InputStream inputStream = new FileInputStream(new File("SOURCE_PATH"));
    
                 // Read the client ID and secret from environment variables.
                 Credentials credentials = new ServicePrincipalCredentials(
                         System.getenv("PDF_SERVICES_CLIENT_ID"),
                         System.getenv("PDF_SERVICES_CLIENT_SECRET"));
    
                 PDFServices pdfServices = new PDFServices(credentials);
    
                 // Upload the input stream to obtain an Asset for the job.
                 Asset asset = pdfServices.upload(inputStream, PDFServicesMediaType.PDF.getMediaType());
    
                 // Request extraction of text elements only.
                 ExtractPDFParams extractPDFParams = ExtractPDFParams.extractPDFParamsBuilder()
                         .addElementsToExtract(Arrays.asList(ExtractElementType.TEXT))
                         .build();
    
                 ExtractPDFJob extractPDFJob = new ExtractPDFJob(asset).setParams(extractPDFParams);
    
                 // Submit the job, then fetch its result from the returned location.
                 String location = pdfServices.submit(extractPDFJob);
                 PDFServicesResponse<ExtractPDFResult> pdfServicesResponse = pdfServices.getJobResult(location, ExtractPDFResult.class);
    
                 // Download the content of the resulting asset.
                 Asset resultAsset = pdfServicesResponse.getResult().getResource();
                 StreamAsset streamAsset = pdfServices.getContent(resultAsset);
     
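The sample usage ends once a StreamAsset has been obtained. As a minimal sketch of what might follow, the helper below writes a result stream to disk using only the JDK. It assumes the StreamAsset exposes its content as a java.io.InputStream (for example through an accessor such as getInputStream(), which is an assumption and is not shown in this page). ResultSaver and saveTo are hypothetical names, not part of the SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the SDK: persists a result stream to a file.
public class ResultSaver {

    // Copies every byte of the stream to target, replacing any existing file,
    // closes the stream, and returns the number of bytes written.
    public static long saveTo(InputStream content, Path target) throws IOException {
        try (InputStream in = content) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream taken from the StreamAsset (assumption).
        InputStream content = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path target = Files.createTempFile("extract-result", ".bin");
        long written = saveTo(content, target);
        System.out.println("Wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

In real use, the ByteArrayInputStream stand-in would be replaced by the stream taken from the StreamAsset, and the temporary file by the desired output path.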
    • Constructor Detail

      • ExtractPDFJob

        public ExtractPDFJob(Asset asset)
        Constructs a new ExtractPDFJob instance.
        Parameters:
        asset - Asset object containing the input file; cannot be null.
    • Method Detail

      • setParams

        public ExtractPDFJob setParams(ExtractPDFParams extractPDFParams)
        Sets the parameters for the job.
        Parameters:
        extractPDFParams - ExtractPDFParams object containing the extract PDF parameters; cannot be null.
        Returns:
        ExtractPDFJob instance
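
Because setParams returns an ExtractPDFJob, the sample usage can chain it directly onto the constructor. One common way to support such chaining is for the setter to return the same instance. The sketch below illustrates that pattern with stand-in classes: DemoJob and DemoParams are hypothetical and are not SDK types, and using Objects.requireNonNull to enforce the "cannot be null" contract is an assumption about how such a check might be written, not a description of the SDK's behavior.

```java
import java.util.Objects;

public class FluentSetterSketch {

    // Hypothetical stand-in for a parameters object.
    public static final class DemoParams {
        public final String elementType;

        public DemoParams(String elementType) {
            this.elementType = elementType;
        }
    }

    // Hypothetical stand-in for a job with a required input and optional params.
    public static final class DemoJob {
        private final String asset;
        private DemoParams params;

        public DemoJob(String asset) {
            // Reject a null input up front (assumed enforcement style).
            this.asset = Objects.requireNonNull(asset, "asset cannot be null");
        }

        // Stores the params and returns this job so calls can be chained.
        public DemoJob setParams(DemoParams params) {
            this.params = Objects.requireNonNull(params, "params cannot be null");
            return this;
        }

        public String getAsset() {
            return asset;
        }

        public DemoParams getParams() {
            return params;
        }
    }

    public static void main(String[] args) {
        // Construction and configuration in a single expression.
        DemoJob job = new DemoJob("input.pdf").setParams(new DemoParams("TEXT"));
        System.out.println(job.getAsset() + " -> " + job.getParams().elementType);
    }
}
```

Returning the instance keeps construction and configuration in one expression, which is how the sample usage builds its job.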