HTML Data Set

Spry HTML Data Set

The Spry HTML data set lets you use standard HTML tables and other structured markup as a data source. HTML data sets work the same way as XML data sets, except that they can leverage the millions of tables that already exist! Since we flatten XML into a table structure, the HTML data set was a natural extension of the Spry Framework.

Optional parameters used with XMLDataSet to specify sorting, caching, distinct and loadInterval work in a similar way with the HTMLDataSet.

We designed the Spry framework so that data acquisition is independent of the Spry region markup in the <body>. This means that the Spry region attributes don't care where the data comes from, be it XML, JSON or HTML. Because HTML tables are so widely familiar, building a Spry data source from one has a gentle learning curve.

Tables as data sources

The HTML data set allows standard HTML tables to be used as data sources. Setting up an HTML table is easy:

first name	last name	username	phone
Kirk	Davis	kdavis	(415) 333-4334
James	Miller	jmiller	(415) 333-7566
Alex	Wilson	awilson	(415) 333-9843
Albert	Moore	amoore	(415) 333-7584

The table must have an ID attribute. Spry uses this ID to identify the data source.

To use this table as a Spry data source, you must first attach the required JavaScript files. You will need:

<script src="SpryData.js" type="text/javascript"></script>
<script src="SpryHTMLDataSet.js" type="text/javascript"></script>

Note: There is no need for xpath.js (used with XML data sets).

Below the script links, add the data set constructor.

Its general form is:

var yourDataSetName = new Spry.Data.HTMLDataSet("path to file with table", "id of table");

<script>
 var ds1 = new Spry.Data.HTMLDataSet("products.html", "productsTableID");
</script>

This is the minimum information needed to create a data set. We will discuss the options below.

Keep in mind that 'Spry.Data.HTMLDataSet' is case sensitive.

The value of a piece of data from an HTML table is the contents of the <td>. This includes text AND markup.

Defining Headers

"Headers" in this sense is the row which defines the column names.

By default, Spry assumes that the first row of the table contains the column names. These column names are used as data reference names within Spry regions. If the first row of the table is actually data, set the 'firstRowAsHeaders' option to false in the constructor.

var ds2 = new Spry.Data.HTMLDataSet("products.html", "productsTable", {firstRowAsHeaders:false});

The 'firstRowAsHeaders' option can only be used with an HTML table.

When firstRowAsHeaders is set to false, the values can be accessed with data references of the form {column#}: {column0} for the first column of data in the source table, {column1} for the second, and so on.

See the Custom Columns section for more details on column names.

Row#	1st column from table	3rd column from table
{ds_RowNumberPlus1}	{column0}	{column2}
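As a rough picture of what sits behind those data references when firstRowAsHeaders is false, the sketch below turns plain arrays of cell values into records keyed by column0, column1, and so on. The toRecords helper is hypothetical and is not part of Spry; it only illustrates the naming scheme described above, using the sample table from the start of this page.

```javascript
// Hypothetical helper, not Spry's own code: builds one record per row,
// naming the cells column0, column1, ... as the text above describes.
function toRecords(rows) {
  return rows.map(function (cells, rowIndex) {
    var record = {
      ds_RowNumber: rowIndex,          // zero-based row position
      ds_RowNumberPlus1: rowIndex + 1  // one-based position, handy for display
    };
    cells.forEach(function (cell, i) {
      record["column" + i] = cell;
    });
    return record;
  });
}

var records = toRecords([
  ["Kirk", "Davis", "kdavis", "(415) 333-4334"],
  ["James", "Miller", "jmiller", "(415) 333-7566"]
]);
```

With this data, records[0].column0 holds "Kirk" and records[1].column2 holds "jmiller".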

Note: The values in the first row that are used as headers should not contain spaces or tags. If they do, the tags are removed and the spaces are replaced with underscores (_).
E.g.: The values under the header <strong>product features</strong> will be accessed inside a spry:region with this markup: {product_features}.
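The header clean-up described in the note can be pictured with a small function. This is only a sketch of the documented effect (tags dropped, each space turned into an underscore); toColumnName is a made-up name, and Spry's actual implementation may treat edge cases such as leading whitespace differently.

```javascript
// Illustrative only: mimics the documented result, not Spry's source code.
function toColumnName(headerHtml) {
  var text = headerHtml.replace(/<[^>]*>/g, ""); // remove any tags
  return text.replace(/ /g, "_");                // each space becomes "_"
}

var cleaned = toColumnName("<strong>product features</strong>");
```

Here cleaned is "product_features", which is the name you would then write as {product_features} inside a spry:region.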

Column as Header

There is also an option for using the first column (as opposed to the first row) as column names. In a situation like:

January	35	74	34
February	23	57	44
March	28	37	55

Here we want the months to be the column names. To get that, set the useColumnsAsRows option to true:

var ds2 = new Spry.Data.HTMLDataSet(null, "monthtable", {useColumnsAsRows:true});
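One way to picture the effect of this option is as a transposition of the table before the headers are read. The sketch below does that on plain arrays holding the month data from above; transpose is a hypothetical helper, and this is an approximation of the behavior rather than Spry's own code.

```javascript
// Hypothetical helper: swaps rows and columns of a rectangular array.
function transpose(rows) {
  return rows[0].map(function (_, c) {
    return rows.map(function (row) { return row[c]; });
  });
}

var months = [
  ["January", 35, 74, 34],
  ["February", 23, 57, 44],
  ["March", 28, 37, 55]
];

var flipped = transpose(months);
var columnNames = flipped[0];    // the months, now usable as headers
var dataRows = flipped.slice(1); // the remaining values, one array per row
```

After the flip, columnNames is ["January", "February", "March"] and the first data row is [35, 23, 28], so a region could refer to {January}, {February} and {March}.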

Internal Table

It is possible to use a table within the Spry page itself. In that case, set the path to the page to null:

<script>
 var ds1 = new Spry.Data.HTMLDataSet(null, "productsTable");
</script>

By default, when an internal table is used, Spry will hide the source element when the page loads. This can be changed by setting the 'hideDataSourceElement' option to false.

<script>
 var ds1 = new Spry.Data.HTMLDataSet(null, "productsTable",{hideDataSourceElement:false});
</script>

ColSpan and RowSpan

We recommend using straight tables as data sources, meaning tables with no rowspans or colspans.

Colspan and rowspan merge cells together. When they are used, the rows no longer all contain the same number of cells, so the data cannot be mapped cleanly onto columns.

For instance, if the HTML table looks like:

A	B	C	D
A1	B1	C1	D1
A2	B/C2		D2

Spry has to make some assumptions as to what the values of B2 and C2 are. Since we can't determine to what column it belongs, we can't say that one of the values is null. Therefore, we have to assume that they are the same. So Spry sees the above table as:

A	B	C	D
A1	B1	C1	D1
A2	B2	B2	D2

We assume the value is from the leftmost (or topmost) column. This is why it is B2 instead of C2. The first row of data (used as column names) cannot use colspans.
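The colspan rule can be sketched on plain data: a merged cell simply repeats its value once for every column it covers. The expandColspans function and the { value, colspan } cell shape are invented for this illustration and do not come from Spry.

```javascript
// Hypothetical helper: repeats each cell's value across the columns it spans.
function expandColspans(cells) {
  var out = [];
  cells.forEach(function (cell) {
    var span = cell.colspan || 1; // cells without a colspan cover one column
    for (var i = 0; i < span; i++) {
      out.push(cell.value);
    }
  });
  return out;
}

// Third row of the example table: the middle cell covers columns B and C.
var thirdRow = expandColspans([
  { value: "A2" },
  { value: "B2", colspan: 2 },
  { value: "D2" }
]);
```

The result is ["A2", "B2", "B2", "D2"], matching the way Spry is described as seeing the table.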

Markup Structure as Data

If you are familiar with Spry widgets, you may know that we say: "We don't care about the markup, just the structure." This remains true for HTML data sets.

A table structure is:

<table id="mydata">
- <tr>
  - <td>
  - <td>

This is a three level nesting with the third level repeating. So, as long as this structure is respected, any HTML markup can be used as a data source.

<div id="mydata">
- <ul>
  - <li>
  - <li>

<div id="mydata">
- <div>
  - <span>
  - <span>

Using CSS to collect data

The HTML data set is quite flexible in that you can use CSS selectors to grab parts of the page and use them as data. This can be as simple as "get all <tr>s with the class 'employee' attached" or "return the contents of the elements matching the '#navbar a.external' selector".

To enable this flexibility, we introduce two options:

'rowSelector', which specifies the elements that contain the data elements. This is equivalent to the <TR> tag of a table data source.
'dataSelector', which specifies the elements that contain the actual data, or the <TD> in a table data source.

These are used as options in the HTML data set constructor.

For the markup below, the UL tags will be the 'rowSelector' and the LI tags will contain the data, so they are the 'dataSelector'.

<div id="mydata">
  <ul>
    <li> 
    <li>
  </ul>
  <ul>
    <li>
    <li>
  </ul>
</div>

The constructor will be:

var ds1 = new Spry.Data.HTMLDataSet(null, "mydata", {rowSelector:"ul", dataSelector:"li"});

So what values can be used with rowSelector/dataSelector? Most basic CSS selectors are supported. We can select data via:

tag name: "DIV", "SPAN", "*"
class name: ".rowData", ".even"
id: "#product1"
tag name with class name: "DIV.rowData", "*.header"
tag name with id: "DIV#product1"
child selector (only first-level descendants): ">"
child selector combined with any of the above: "> DIV.rowData"
comma delimited list of selectors (the node will be selected if it matches at least one from the list): "#product1, #product2, #product3"

Picture a table with a class of "contractor" on some, but not all, rows.

	Name	email
<tr class="contractor">	Don	don@abc.com
	Kin	kin@abc.com
<tr class="contractor">	James	james@abc.com

We can set up a constructor that says:

var ds1 = new Spry.Data.HTMLDataSet("mypage.htm", "thetableID", {rowSelector:".contractor"});

This will only pull in the two rows with the 'contractor' class attached.
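To make the selector forms listed earlier more concrete, here is a small stand-alone matcher that works on plain objects rather than real DOM nodes. It is a sketch only: matchesSimpleSelector and the { tag, id, classes } shape are invented for this example, it covers tag, class, id, their combinations and comma lists, and it deliberately leaves out the child selector. Spry's own matching code is not shown here.

```javascript
// Hypothetical helper, not part of Spry: checks an element description
// against a comma-delimited list of simple selectors.
function matchesSimpleSelector(el, selectorList) {
  return selectorList.split(",").some(function (raw) {
    var sel = raw.trim();
    if (sel === "") return false;
    // Optional tag name (or "*"), optionally followed by .class or #id.
    var m = /^([a-zA-Z0-9*]*)(?:([.#])([\w-]+))?$/.exec(sel);
    if (!m) return false; // anything more complex is out of scope here
    var tag = m[1], kind = m[2], name = m[3];
    if (tag && tag !== "*" && tag.toLowerCase() !== el.tag.toLowerCase()) return false;
    if (kind === "." && el.classes.indexOf(name) === -1) return false;
    if (kind === "#" && el.id !== name) return false;
    return true;
  });
}

// The three data rows of the contractor table, described as plain objects.
var tableRows = [
  { tag: "tr", id: "", classes: ["contractor"], name: "Don" },
  { tag: "tr", id: "", classes: [], name: "Kin" },
  { tag: "tr", id: "", classes: ["contractor"], name: "James" }
];

var contractors = tableRows.filter(function (row) {
  return matchesSimpleSelector(row, ".contractor");
});
```

Only the rows for Don and James pass the ".contractor" test, which mirrors the two rows the data set above pulls in.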

Generic HTML structures

You can also break away from the table row/column paradigm and use the HTML data set to pull in markup from any element on the page.

You can control how the data is returned by selectively using rowSelector and dataSelector. Notice the sample above only uses the rowSelector.

No rowSelector, no dataSelector

The whole markup structure is mapped to the data set in one cell: it is a one-row, one-column data set.

Source: <div id="someData">This is the source used</div>
Instantiation: var ds1 = new Spry.Data.HTMLDataSet(null, "someData");
Spry markup: <div spry:region="ds1">Here's the extracted value: {column0}</div>

Only rowSelector specified

HTMLDataSet uses this rowSelector to extract the rows of data, but uses a single column to map the whole content of each row.

If you have:

<ul id="myList">
   <li>one</li>
   <li>two</li>
   <li>three</li>
 </ul>

The constructor is:

var ds1 = new Spry.Data.HTMLDataSet(null, "myList", { firstRowAsHeaders:false, rowSelector: "li"});

Spry markup: <div spry:region="ds1"><span spry:repeat="ds1">{column0}</span></div>

Only dataSelector specified

This tells HTMLDataSet that you're only interested in extracting some pieces of data from the whole container, and that you're not interested in using multiple rows. The result is a single row of data with several columns.

Source:

<div id="myContainer">Some data here
   <div id="first">This is the first chunk I'm interested in</div>
   Some other data <span>goes here</span>
   <div id="second">the second chunk</div>
   More uninteresting data..
</div>

The constructors can be either:

var ds1 = new Spry.Data.HTMLDataSet(null, "myContainer", {dataSelector: "div"}); 
OR
var ds1 = new Spry.Data.HTMLDataSet(null, "myContainer", {dataSelector: "#first, #second"});

Spry markup: <div spry:region="ds1">This is the first value {@first} and this is the second {@second}</div>

Notice that the data reference used is {@idelement}. This happens ONLY in the case of single-row data. Instead of {column0}, {column1}, the IDs of the extracted chunks are used as column names. You can override this behavior by passing {IDAsHeadersForSingleRow:false} and then use {column0}, {column1} in the spry:region.
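The naming rule for single-row data can be sketched as follows. The singleRow function and the { id, html } chunk shape are hypothetical, and the fallback to column# for a chunk without an ID is an assumption of this sketch rather than documented Spry behavior.

```javascript
// Hypothetical helper: builds the one-row record for a set of extracted chunks.
function singleRow(chunks, idAsHeaders) {
  var row = {};
  chunks.forEach(function (chunk, i) {
    // Assumption: a chunk with no ID falls back to the column# name.
    var name = (idAsHeaders && chunk.id) ? "@" + chunk.id : "column" + i;
    row[name] = chunk.html;
  });
  return row;
}

var chunks = [
  { id: "first", html: "This is the first chunk I'm interested in" },
  { id: "second", html: "the second chunk" }
];

var byId = singleRow(chunks, true);     // keys: "@first", "@second"
var byIndex = singleRow(chunks, false); // keys: "column0", "column1"
```

The first record corresponds to the default {@first} and {@second} references; the second corresponds to what you would reference after passing {IDAsHeadersForSingleRow:false}.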

Custom Column Names

When using a table structure, it is common to have the first row of the table be headers that name the columns below. By default Spry uses the first row as column names, and from this we derive the data reference names: {email}, {firstname}. The firstRowAsHeaders option tells Spry whether the first row is data or a header. In generic HTML structures, the first repeating element may or may not be column names.

To set custom column names use the 'columnNames' option in the constructor.

var ds1 = new Spry.Data.HTMLDataSet(null, "myContainer", {dataSelector: "#first, #second", columnNames:['firstcolumnname','secondcolumnname']});

When using the columnNames option, the number of custom names in the array must equal the number of columns in the data set.
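The requirement that the names and the columns line up one to one can be illustrated with a small check. The nameColumns function below is hypothetical and throws on a mismatch purely to make the rule visible; how Spry itself reacts to a wrong count is not covered here.

```javascript
// Hypothetical helper: pairs each cell with its custom column name and
// refuses rows whose cell count differs from the number of names.
function nameColumns(rows, columnNames) {
  return rows.map(function (cells) {
    if (cells.length !== columnNames.length) {
      throw new Error("Expected " + columnNames.length +
                      " columns but found " + cells.length);
    }
    var record = {};
    columnNames.forEach(function (name, i) {
      record[name] = cells[i];
    });
    return record;
  });
}

var named = nameColumns(
  [["This is the first chunk I'm interested in", "the second chunk"]],
  ["firstcolumnname", "secondcolumnname"]
);
```

Here named[0].secondcolumnname is "the second chunk", the value a region would show for {secondcolumnname}.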

Now, in the Spry regions, use these column names as the data references.

Note: When using custom column names with an HTML table AND 'firstRowAsHeaders' is true (which is the default), the first row will not be used in the data set. The custom column names will then be used as data references.

HTML Data Set functions

The HTML data set has a set of functions you can use to manipulate the data set.

getURL() - This returns the value of the current URL being used in the data set constructor.
- ds1.getURL();
setURL("theURL") - This sets the path to the new file to be used in the data set.
- ds1.setURL("mydwnewfile.htm");
getSourceElementID() - Returns the ID of the page element being used for the data set.
- ds1.getSourceElementID();
setSourceElementID("theSourceID") - This is used to set or change the ID of the page element used as the data source.
- ds1.setSourceElementID("thenewID");
getRowSelector() - Returns the row selector being used.
- ds1.getRowSelector();
setRowSelector("theRowSelector") - Sets a new row selector for the data set.
- ds1.setRowSelector("myID");
getDataSelector() - Returns the data selector being used.
- ds1.getDataSelector();
setDataSelector("theDataSelector") - Sets a new data selector for the data set.
- ds1.setDataSelector("myID");
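These setters are typically used together when a page needs to point an existing data set at a different source. The sketch below wraps that in a hypothetical switchSource helper. The loadData() call is an assumption: it is not described in this section, and is included on the understanding that Spry data sets need an explicit reload after their source changes. A stand-in object replaces a real data set so the snippet runs without the Spry files.

```javascript
// Hypothetical helper: repoints a data set and asks it to reload.
function switchSource(ds, url, elementId) {
  ds.setURL(url);
  ds.setSourceElementID(elementId);
  ds.loadData(); // assumption: triggers a reload; not covered in this section
}

// Stand-in for a real Spry.Data.HTMLDataSet that just records the calls made.
var fakeDataSet = {
  calls: [],
  setURL: function (url) { this.calls.push("setURL:" + url); },
  setSourceElementID: function (id) { this.calls.push("setSourceElementID:" + id); },
  loadData: function () { this.calls.push("loadData"); }
};

switchSource(fakeDataSet, "mydwnewfile.htm", "thenewID");
```

On a real page you would pass ds1 instead of fakeDataSet.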