Creating and Modifying PDF Documents

This chapter describes how to use JavaScript to create PDF files dynamically, modify them, and convert them to XML format.

Creating and modifying PDF files

The Acrobat extensions to JavaScript provide support for dynamic PDF file creation and content generation. This means that it is possible to dynamically create a new PDF file and modify its contents in an automated fashion. This can help make a document responsive to user input and can enhance the workflow.

To create a new PDF file, invoke the newDoc method of the app object, as shown in the example below:

var myDoc = app.newDoc();

This statement creates a blank PDF document and is used primarily for testing purposes.
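The newDoc method also accepts an optional page width and height, both measured in points (72 points per inch). The following is a minimal sketch, not part of the API: inchesToPoints and newDocInInches are hypothetical helper names, and the app object is passed in as a parameter (appObj) only so that the function can be exercised outside Acrobat.

```javascript
// Hypothetical helper: convert inches to PDF points (72 points per inch).
function inchesToPoints(inches) {
    return inches * 72;
}

// Hypothetical helper: create a blank document whose page size is given in inches.
// appObj is the Acrobat app object.
function newDocInInches(appObj, widthIn, heightIn) {
    return appObj.newDoc(inchesToPoints(widthIn), inchesToPoints(heightIn));
}

// In the Acrobat console, for a US Letter page:
// var myDoc = newDocInInches(app, 8.5, 11);
```

Working in inches and converting at the last step keeps the page size readable in the script while still passing points to newDoc.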

Once this statement has been executed from the console, you can manipulate the page by invoking methods contained within the Doc object, as indicated in the following table. Details of these methods are found in the Acrobat JavaScript API Reference.

JavaScript for manipulating a PDF document

Content                      Object              Methods

page                         Doc                 newPage, insertPages, replacePages
page                         Template            spawn
annot                        Doc                 addAnnot
field                        Doc                 addField
icon                         Doc                 addIcon
link                         Doc                 addLink
document-level JavaScript    Doc                 addScript
thumbnails                   Doc                 addThumbnails
bookmark                     Doc.bookmarkRoot    createChild, insertChild
web link                     Doc                 addWebLinks
template                     Doc                 createTemplate

The app.newDoc method creates only a blank document; it cannot write text content to it. To do that, you need to use the Report object.

Creating a document with content

The following example creates a PDF document, sets the font size, sets the color to blue, and writes a standard string to the document using the writeText method of the Report object. Finally, it opens the document in the viewer. See the JavaScript for Acrobat API Reference for details of this object, its properties and methods, and for additional examples.

var rep = new Report();
rep.size = 1.2;
rep.color = color.blue;
rep.writeText("Hello World!");
rep.open("My Report");

The Report object has many useful applications. With it, for example, you can create a document that reports back a list of all form fields in the document, along with their types and values; another application is to summarize all comments in a document. The Acrobat JavaScript API Reference has an example of the latter application in the Report object section.
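The first application mentioned above, a list of form fields, might be sketched as follows. The function name writeFieldReport is hypothetical, and the Doc and Report objects are passed in as parameters; the sketch assumes that button fields carry no value worth reporting and leaves their value blank.

```javascript
// Sketch: write the name, type, and value of every form field to a Report.
// doc is a Doc object; rep is a Report object created with new Report().
function writeFieldReport(doc, rep) {
    rep.writeText("Form fields in " + doc.documentFileName);
    rep.divide();
    for (var i = 0; i < doc.numFields; i++) {
        var name = doc.getNthFieldName(i);
        var f = doc.getField(name);
        // Assumption: buttons have no meaningful value, so report an empty string.
        var value = (f.type === "button") ? "" : f.value;
        rep.writeText(name + " (" + f.type + "): " + value);
    }
    return doc.numFields;
}

// In Acrobat:
// var rep = new Report();
// writeFieldReport(this, rep);
// rep.open("Field report");
```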

Combining PDF documents

You can customize and automate the process of combining PDF documents.

If you would like to combine multiple PDF files into a single PDF document, you can do so through a series of calls to the Doc object’s insertPages method.

Creating a new document from two other documents

// Create a new PDF document:
var newDoc = app.newDoc();

// Insert doc1.pdf:
newDoc.insertPages({
    nPage: -1,
    cPath: "/c/temp/doc1.pdf",
});

// Insert doc2.pdf:
newDoc.insertPages({
    nPage: newDoc.numPages-1,
    cPath: "/c/temp/doc2.pdf",
});

// Save the new document:
newDoc.saveAs({
    cPath: "/c/temp/myNewDoc.pdf"
});

// Close the new document without notifying the user:
newDoc.closeDoc(true);

Combining files

It is possible to combine several PDF files using the Doc.insertPages() method.

Combining several PDF files

In this example, a document is opened with an absolute path reference, then other PDF files in the same folder are appended to the end of the document. For convenience, the files that are appended are placed in an array for easy execution and generalization.

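var doc = app.openDoc({
    cPath: "/C/temp/doc1.pdf"
});
// Files in the same folder as doc1.pdf, given as relative paths
var aFiles = ["doc2.pdf", "doc3.pdf"];
for (var i = 0; i < aFiles.length; i++) {
    // Append each file after the current last page
    doc.insertPages({
        nPage: doc.numPages - 1,
        cPath: aFiles[i],
        nStart: 0
    });
}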
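When the list of files is long, it can help to keep going if one file cannot be inserted and to report the failures at the end. The sketch below wraps the loop above in a hypothetical function, appendFiles; it assumes that insertPages raises an exception when a file cannot be read, which you should confirm in your environment.

```javascript
// Sketch: append each file in aPaths to the end of doc.
// Returns the list of paths that could not be inserted.
function appendFiles(doc, aPaths) {
    var failed = [];
    for (var i = 0; i < aPaths.length; i++) {
        try {
            doc.insertPages({
                nPage: doc.numPages - 1,   // insert after the current last page
                cPath: aPaths[i]
            });
        } catch (e) {
            // Assumption: a missing or unreadable file raises an exception.
            failed.push(aPaths[i]);
        }
    }
    return failed;
}

// In Acrobat:
// var failed = appendFiles(doc, ["doc2.pdf", "doc3.pdf"]);
// if (failed.length > 0) console.println("Not inserted: " + failed.join(", "));
```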

A related problem is combining several files of possibly different file types. Recent versions of Acrobat introduced the notion of a binder, along with a user interface for combining files of different formats. How do you do it programmatically?

Combining files of different formats

In this example, an initial PDF file is opened, and all other files are appended to it.

var doc = app.openDoc({ cPath: "/C/temp/doc1.pdf" });
// List of files with different extensions
var aFiles = ["doc2.eps", "doc3.jpg", "doc4.pdf"];

for (var i = 0; i < aFiles.length; i++) {
    // Open and convert the document
    var newDoc = app.openDoc({
        oDoc: doc,
        cPath: aFiles[i],
        bUseConv: true
    });
    // Save the new PDF file to a temp folder
    newDoc.saveAs({ cPath: "/c/temp/tmpDoc.pdf" });
    // Close it without notice
    newDoc.closeDoc(true);
    // Now insert the PDF file just saved at the end of the first document
    doc.insertPages({
        nPage: doc.numPages - 1,
        cPath: "/c/temp/tmpDoc.pdf",
        nStart: 0
    });
}

Extracting files

You can also programmatically extract pages and save them to a folder.

Suppose the current document consists of a sequence of invoices, each of which occupies one page. The following code creates separate PDF files, one for each invoice:

var filename = "invoice";
for (var i = 0; i < this.numPages; i++)
    this.extractPages({
        nStart: i,
        cPath : filename + i + ".pdf"
    });
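With the naming scheme above, invoice10.pdf sorts before invoice2.pdf in most file browsers. A possible refinement, sketched here with the hypothetical helpers pageFileName and extractEachPage, is to number the files from 1 and pad the number with zeros to a fixed width.

```javascript
// Hypothetical helper: build a zero-padded file name such as "invoice007.pdf".
// pageIndex is zero-based; pageCount determines how many digits are needed.
function pageFileName(prefix, pageIndex, pageCount) {
    var width = String(pageCount).length;
    var n = String(pageIndex + 1);
    while (n.length < width) {
        n = "0" + n;
    }
    return prefix + n + ".pdf";
}

// Sketch: extract every page of doc into its own file.
function extractEachPage(doc, prefix) {
    for (var i = 0; i < doc.numPages; i++) {
        doc.extractPages({
            nStart: i,
            nEnd: i,
            cPath: pageFileName(prefix, i, doc.numPages)
        });
    }
}

// In Acrobat:
// extractEachPage(this, "invoice");
```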

Creating file attachments

Another way you can “combine files” is by attaching one or more files to your PDF document. This is useful for packaging a collection of documents and sending them together by emailing the PDF file. This section describes the basic objects, properties and methods for attaching and manipulating attachments.

These are the objects, properties and methods relevant to file attachments.

Name                             Description

Doc.createDataObject()           Creates a file attachment.
Doc.dataObjects                  Returns an array of Data objects representing all files attached to the document.
Doc.exportDataObject()           Saves the file attachment to the local file system.
Doc.getDataObject()              Acquires the Data object of a particular attachment.
Doc.importDataObject()           Attaches a file to the document.
Doc.removeDataObject()           Removes a file attachment from the document.
Doc.openDataObject()             Returns the Doc object for an attached PDF file.
Doc.getDataObjectContents()      Allows access to the contents of the file attachment associated with a Data object.
Doc.setDataObjectContents()      Writes to the file attachment.
util.streamFromString()          Converts a string to a stream.
util.stringFromStream()          Converts a stream to a string.
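As a quick illustration of the dataObjects property, the sketch below collects the name and size of each attachment. The function name listAttachments is hypothetical; the code guards against dataObjects being empty or unset as a precaution, and relies on the name and size properties of the Data object.

```javascript
// Sketch: return one descriptive line per file attached to doc.
function listAttachments(doc) {
    // Precaution: treat a missing dataObjects value as "no attachments".
    var aData = doc.dataObjects || [];
    var lines = [];
    for (var i = 0; i < aData.length; i++) {
        lines.push(aData[i].name + " (" + aData[i].size + " bytes)");
    }
    return lines;
}

// In the Acrobat console:
// console.println(listAttachments(this).join("\n"));
```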

Saving form data to and reading form data from an attachment

This example takes the response given in a text field of this document and appends it to an attached document. (Perhaps this document is circulating by email, and the user can add in their comments through a multiline text field.) This example uses four of the methods listed above.

var v = this.getField("myTextField").value;
// Get the contents of the file attachment with the name "MyNotes.txt"
var oFile = this.getDataObjectContents("MyNotes.txt");
// Convert the returned stream to a string
var cFile = util.stringFromStream(oFile, "utf-8");
// Append new data at the end of the string
cFile += "\r\n" + v;
// Convert back to a stream
oFile = util.streamFromString( cFile, "utf-8");
// Overwrite the old attachment
this.setDataObjectContents("MyNotes.txt", oFile);

// Read the contents of the file attachment to a multiline text field
var oFile = this.getDataObjectContents("MyNotes.txt");
var cFile = util.stringFromStream(oFile, "utf-8");
this.getField("myTextField").value = cFile;
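The script above assumes that MyNotes.txt is already attached. A minimal sketch of one way to guarantee that, using createDataObject, follows; the function name ensureTextAttachment and its cInitial parameter are hypothetical.

```javascript
// Sketch: create a text attachment named cName if doc does not already have one.
// Returns true if the attachment was created, false if it already existed.
function ensureTextAttachment(doc, cName, cInitial) {
    var aData = doc.dataObjects || [];
    for (var i = 0; i < aData.length; i++) {
        if (aData[i].name === cName) {
            return false;
        }
    }
    doc.createDataObject(cName, cInitial);
    return true;
}

// In Acrobat, before reading or appending to the attachment:
// ensureTextAttachment(this, "MyNotes.txt", "Notes");
```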

Beginning with Acrobat 8, the JavaScript interpreter includes E4X, the ECMA-357 Standard that provides native support of XML in JavaScript. See the document ECMAScript for XML (E4X) Specification for the complete specification of E4X. The next example illustrates the use of E4X and file attachments.

Accessing an XML attachment using E4X

The following script describes a simple database system. The database is an XML document attached to the PDF file. The user enters the employee ID into a text field, the JavaScript accesses the attachment, finds the employee’s record and displays the contents of the retrieved record in form fields.

We have a PDF file, employee.pdf, with three form fields, whose names are employee.id, employee.name.first and employee.name.last. Attached to the PDF file is an XML document created by the following script:

// Some E4X code to create a database of info
x = <employees/>;
function popXML(x,id,fname,lname)
{
    y = <a/>;
    y.employee.@id = id;
    y.employee.name.first = fname;
    y.employee.name.last = lname;
    x.employee += y.employee;
}
popXML(x,"334234", "John", "Public");
popXML(x,"324234", "Jane", "Doe");
popXML(x,"452342", "Davey", "Jones");
popXML(x,"634583", "Tom", "Jefferson");

Copy and paste this code into the console and execute it. You’ll see the XML document as the output of this script. The output was copied and pasted into a document named employee.xml, and saved to the same folder as employee.pdf.

You can attach employee.xml using the UI, but the script for doing so is as follows:

var thisPath = this.path.replace(/\.pdf$/, ".xml");
try { this.importDataObject("employees", thisPath); }
    catch(e) { console.println(e) };

Of the three form fields in the document employee.pdf, only employee.id has any script. The following is a custom keystroke script:

if (event.willCommit) {
    try {
        // Get the data contents of the "employees" attachment
        var oDB = this.getDataObjectContents("employees");
        // Convert to a string
        var cDB = util.stringFromStream(oDB);
        // Use the eval method to evaluate the string, you get an XML variable
        var employees = eval(cDB);
        // Retrieve record with the id input in the employee.id field
        var record = employees.employee.(@id == event.value);
        // If an ID was entered but no matching record was found...
        if ( event.value != "" && record.toString() == "" ) {
            app.alert("Record not found");
            event.rc = false;
        }
        // Populate the two other fields
        this.getField("employee.name.first").value = record.name.first;
        this.getField("employee.name.last").value = record.name.last;
    } catch(e) {
        app.alert("The DB is not attached to this document!");
        event.rc = false;
    }
}

Cropping and rotating pages

In this section we discuss the JavaScript API for cropping and rotating a page.

Cropping pages

The Doc object provides methods for setting and retrieving the page layout dimensions. These are the setPageBoxes and getPageBox methods. There are five types of boxes available:

  • Art

  • Bleed

  • Crop

  • Media

  • Trim

See the PDF Reference for a discussion of these types of boxes.

The setPageBoxes method accepts the following parameters:

  • cBox: the type of box

  • nStart: the zero-based index of the beginning page

  • nEnd: the zero-based index of the last page

  • rBox: the rectangle in rotated user space

For example, the following code crops pages 2 through 5 of the document to a 400 by 500 point area (there are 72 points to an inch):

this.setPageBoxes({
    cBox: "Crop",
    nStart: 2,
    nEnd: 5,
    rBox: [100,100,500,600]
});

The getPageBox method accepts the following parameters:

  • cBox: the type of box

  • nPage: the zero-based index of the page

For example, the following code retrieves the crop box for page 3:

var rect = this.getPageBox("Crop", 3);
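The returned rectangle can be used to compute the page size. The sketch below, with the hypothetical function name pageSizeInInches, assumes the rectangle is an array of four numbers in points in which elements 0 and 2 are the horizontal coordinates and elements 1 and 3 are the vertical ones; it takes absolute differences so that the order of each pair does not matter.

```javascript
// Sketch: return the width and height of a page's crop box in inches.
// Assumption: getPageBox returns [x1, y1, x2, y2] in points (72 per inch).
function pageSizeInInches(doc, nPage) {
    var r = doc.getPageBox("Crop", nPage);
    return {
        width: Math.abs(r[2] - r[0]) / 72,
        height: Math.abs(r[1] - r[3]) / 72
    };
}

// In the Acrobat console:
// var size = pageSizeInInches(this, 3);
// console.println(size.width + " x " + size.height + " inches");
```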

Rotating pages

You can use JavaScript to rotate pages in 90-degree increments in the clockwise direction relative to the normal position. This means that if you specify a 90-degree rotation, no matter what the current orientation is, the upper portion of the page is placed on the right side of your screen.

The Doc object’s setPageRotations and getPageRotation methods are used to set and retrieve page rotations.

The setPageRotations method accepts three parameters:

  • nStart: the zero-based index of the beginning page

  • nEnd: the zero-based index of the last page

  • nRotate: 0, 90, 180, or 270 are the possible values for the clockwise rotation

In the following example, pages 2 through 5 are rotated 90 degrees in the clockwise direction:

this.setPageRotations(2,5,90);

To retrieve the rotation for a given page, invoke the Doc object getPageRotation method, which requires only the page number as a parameter. The following code retrieves and displays the rotation in degrees for page 3 of the document:

var rotation = this.getPageRotation(3);
console.println("Page 3 is rotated " + rotation + " degrees.");
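Because setPageRotations sets an absolute rotation rather than adding to the current one, turning a page a further quarter turn takes both methods together. A minimal sketch, using the hypothetical function name rotatePageBy:

```javascript
// Sketch: rotate a single page by nDelta degrees relative to its current rotation.
// nDelta should be a multiple of 90; negative values turn counterclockwise.
function rotatePageBy(doc, nPage, nDelta) {
    var current = doc.getPageRotation(nPage);
    // Normalize into the range 0..359 so the result is 0, 90, 180, or 270.
    var target = ((current + nDelta) % 360 + 360) % 360;
    doc.setPageRotations(nPage, nPage, target);
    return target;
}

// In Acrobat, to turn page 3 a further 90 degrees clockwise:
// rotatePageBy(this, 3, 90);
```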

Extracting, moving, deleting, replacing, and copying pages

The Doc object, in combination with the app object, can be used to extract pages from one document and place them in another, and to move or copy pages within or between documents.

The app object can be used to create or open any document. To create a new document, invoke its newDoc method, and to open an existing document, invoke its openDoc method.

The Doc object offers three useful methods for handling pages:

  • insertPages: Inserts pages from the source document into the current document

  • deletePages: Deletes pages from the document

  • replacePages: Replaces pages in the current document with pages from the source document

These methods enable you to customize the page content within and between documents.

Suppose you would like to remove pages within a document. Invoke the Doc object’s deletePages method, which accepts two parameters:

  • nStart: the zero-based index of the beginning page

  • nEnd: the zero-based index of the last page

For example, the following code deletes pages 2 through 5 of the current document:

this.deletePages({nStart: 2, nEnd: 5});
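When the pages to delete are not contiguous, each deletion shifts the indices of the pages that follow it. One way around this, sketched below with the hypothetical function name deletePageList, is to delete from the highest index down. The sketch assumes the list contains no duplicate indices.

```javascript
// Sketch: delete a list of zero-based page indices from doc.
// Deleting in descending order keeps the remaining indices valid.
function deletePageList(doc, aPages) {
    var sorted = aPages.slice().sort(function (a, b) { return b - a; });
    for (var i = 0; i < sorted.length; i++) {
        doc.deletePages({ nStart: sorted[i], nEnd: sorted[i] });
    }
    return sorted;
}

// In Acrobat, to delete pages 1, 4 and 2:
// deletePageList(this, [1, 4, 2]);
```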

Suppose you would like to copy pages from one document to another. Invoke the Doc object insertPages method, which accepts four parameters:

  • nPage: the zero-based index of the page after which to insert the new pages

  • cPath: the device-independent path of the source file

  • nStart: the zero-based index of the beginning page

  • nEnd: the zero-based index of the last page

For example, the following code inserts pages 2 through 5 from mySource.pdf at the beginning of the current document:

this.insertPages({
    nPage: -1,
    cPath: "/C/temp/mySource.pdf",
    nStart: 2,
    nEnd: 5
});

You can combine these operations to extract pages from one document and move them to another (they will be deleted from the first document). The following code will extract pages 2 through 5 in mySource.pdf and move them into myTarget.pdf:

// The keyword this represents myTarget.pdf
// First copy the pages from the source to the target document
this.insertPages({
    nPage: -1,
    cPath: "/C/temp/mySource.pdf",
    nStart: 2,
    nEnd: 5
});

// Now delete the pages from the source document
var source = app.openDoc({cPath:"/C/temp/mySource.pdf"});
source.deletePages({nStart: 2, nEnd: 5});

To replace pages in one document with pages from another document, invoke the target document’s replacePages method, which accepts four parameters:

  • nPage: The zero-based index of the page at which to start replacing pages

  • cPath: The device-independent pathname of the source file

  • nStart: The zero-based index of the beginning page

  • nEnd: The zero-based index of the last page

In the following example, pages 2 through 5 from mySource.pdf replace pages 30 through 33 of myTarget.pdf:

// The keyword this represents myTarget.pdf
this.replacePages({
    nPage: 30,
    cPath: "/C/temp/mySource.pdf",
    nStart: 2,
    nEnd: 5
});

To safely move pages within the same document, it is advisable to perform the following sequence:

  1. Copy the source pages to a temporary file.

  2. Insert the pages in the temporary file at the new location in the original document.

  3. Delete the source pages from the original document.

The following example moves pages 2 through 5 to follow page 30 in the document:

// First copy pages 2 to 5 into a temporary file:
this.extractPages({
    nStart: 2,
    nEnd: 5,
    cPath: "/C/temp/temp.pdf"
});

// Insert all of the temporary file pages back into the original, after page 30:
this.insertPages({
    nPage: 30,
    cPath: "/C/temp/temp.pdf"
});

// Now delete pages 2 to 5 from their original location
this.deletePages({nStart: 2, nEnd: 5});
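The example above needs no index adjustment because the insertion point (page 30) comes after the pages being moved. If the pages are moved toward the front of the document instead, the inserted copies push the originals back by the number of pages moved, and the range passed to deletePages must be shifted accordingly. The sketch below, a hypothetical helper named planPageMove, computes the adjusted range; it only does arithmetic and calls no Acrobat methods.

```javascript
// Sketch: compute the deletePages range to use after the copies have been inserted.
// nStart..nEnd is the zero-based range being moved; nAfter is the nPage value
// passed to insertPages (-1 means "before the first page").
function planPageMove(nStart, nEnd, nAfter) {
    if (nAfter >= nStart && nAfter < nEnd) {
        throw new Error("Insertion point lies inside the range being moved");
    }
    var count = nEnd - nStart + 1;
    // Inserting in front of the originals shifts them back by count pages.
    var shift = (nAfter < nStart) ? count : 0;
    return {
        insertAfter: nAfter,
        deleteStart: nStart + shift,
        deleteEnd: nEnd + shift
    };
}

// Moving pages 10 through 12 to the front of the document:
// var plan = planPageMove(10, 12, -1);
// After inserting, delete plan.deleteStart through plan.deleteEnd (13 through 15).
```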

Adding watermarks and backgrounds

The Doc object addWatermarkFromText and addWatermarkFromFile methods create watermarks within a document, and place them in optional content groups (OCGs).

The addWatermarkFromFile method adds a page as a watermark to the specified pages in the document. The example below adds the first page of watermark.pdf as a watermark to the center of all pages within the current document:

this.addWatermarkFromFile("/C/temp/watermark.pdf");

In the next example, the addWatermarkFromFile method is used to add the second page of watermark.pdf as a watermark to the first 10 pages of the current document. It is rotated counterclockwise by 45 degrees, and positioned one inch down and two inches over from the top left corner of each page:

this.addWatermarkFromFile({
    cDIPath: "/C/temp/watermark.pdf",
    nSourcePage: 1,
    nEnd: 9,
    nHorizAlign: app.constants.align.left,
    nVertAlign: app.constants.align.top,
    nHorizValue: 144,
    nVertValue: -72,
    nRotation: 45
});

It is also possible to use the addWatermarkFromText method to create watermarks. In this next example, the word Confidential is placed in the center of all the pages of the document, set in 24-point red Helvetica so that it stands out:

this.addWatermarkFromText(
    "Confidential",
    0,
    font.Helv,
    24,
    color.red
);
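The addWatermarkFromText method can also be called with a single object of named parameters, which is easier to read once more options are involved. The following sketch, under the hypothetical function name addDiagonalWatermark, adds a rotated, semi-transparent text watermark; the particular rotation and opacity values are illustrative choices only.

```javascript
// Sketch: add a rotated, semi-transparent text watermark to every page of doc.
function addDiagonalWatermark(doc, cText) {
    doc.addWatermarkFromText({
        cText: cText,
        nFontSize: 24,
        nRotation: 45,   // degrees, counterclockwise
        nOpacity: 0.3,   // 0 is fully transparent, 1 is opaque
        bOnTop: true     // draw above the page content
    });
}

// In Acrobat:
// addDiagonalWatermark(this, "Confidential");
```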

Converting PDF documents to XML format

Since XML is often the basis for information exchange within web services and enterprise infrastructures, it may often be useful to convert your PDF documents into XML format.

This is straightforward to do using the Doc object saveAs method, which can convert a document to XML as well as to a number of other formats.

In order to convert your PDF document to a given format, you will need to determine the device-independent path to which you will save your file, and the conversion ID used to save in the desired format. A list of conversion IDs for all formats is provided in the Acrobat JavaScript API Reference. For XML, the conversion ID is com.adobe.acrobat.xml-1-00.

The following code converts the current PDF file to XML and saves it at C:\temp\test.xml:

this.saveAs("/c/temp/test.xml", "com.adobe.acrobat.xml-1-00");
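To save the XML file next to the PDF file rather than at a fixed location, the output path can be derived from the document's own path. This is a minimal sketch with hypothetical helper names (xmlPathFor and saveAsXml); it assumes the document has already been saved, so that its path ends in .pdf.

```javascript
// Hypothetical helper: replace a trailing ".pdf" (any case) with ".xml".
function xmlPathFor(cPdfPath) {
    return cPdfPath.replace(/\.pdf$/i, ".xml");
}

// Sketch: convert doc to XML, saving beside the original file.
// Returns the device-independent path that was written.
function saveAsXml(doc) {
    var cOut = xmlPathFor(doc.path);
    doc.saveAs({
        cPath: cOut,
        cConvID: "com.adobe.acrobat.xml-1-00"
    });
    return cOut;
}

// In Acrobat:
// saveAsXml(this);
```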