Data Format

The software visualizes moving and transforming data read from a text file in a specific format, similar to CSV (comma-separated values). Because a single file can contain multiple tables as well as metadata, a few rules must be followed for the software to work properly.

Concept

The visualization shows interactions between nodes (also called layers) over time. Each interaction is represented as a data point, and the data points in one visualization can come from different data tables.

Accordingly, for the software to work we need:

  1. Nodes
  2. Data tables
  3. A way of relating nodes through data points at a specific point in time

With this concept in mind, we can dive into the implementation and the rules we should observe to use the software effectively.

Metadata

Prefix: ##META

It may be useful to store certain attributes within the file, such as the author, contact info, or device model. This extra information is displayed in the "info" section of the visualization, which can be accessed through the "i" icon. Attributes are written in the following format:

##META [title], [value]

Example:

##META Data collected by, John Doe
##META Device, Google Pixel 4
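
As a rough illustration of the format (not the software's actual parser), a metadata line could be read like this. The function name parse_meta is hypothetical, and splitting on the first comma only is an assumption so that the value may itself contain commas.

```python
def parse_meta(line: str) -> tuple[str, str]:
    """Split a '##META title, value' line into (title, value)."""
    prefix = "##META"
    if not line.startswith(prefix):
        raise ValueError(f"not a metadata line: {line!r}")
    # Assumption: only the first comma separates the title from the value.
    title, _, value = line[len(prefix):].partition(",")
    return title.strip(), value.strip()


print(parse_meta("##META Device, Google Pixel 4"))
# ('Device', 'Google Pixel 4')
```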

Comments

Prefix: #

Sometimes it is helpful to include comments in the file that the visualization ignores. This can be done by starting the line with a single #.

# [some comment]

Example:

# Ignore me
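
Because comments, metadata, and table definitions all begin with #, a reader of the file has to check the longer prefixes first. The sketch below shows one way to tell the line types apart. The function name classify_line and the returned labels are hypothetical, and treating any other line that starts with # as a comment is an assumption.

```python
def classify_line(line: str) -> str:
    """Return a label describing what kind of line this is."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    # Check the two-character prefixes before the single '#',
    # otherwise ##META and ##TABLE lines would be mistaken for comments.
    if stripped.startswith("##META"):
        return "meta"
    if stripped.startswith("##TABLE"):
        return "table"
    if stripped.startswith("#"):
        return "comment"
    return "data"


print(classify_line("# Ignore me"))
# comment
```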

Tables

Prefix: ##TABLE

A table is defined by giving it a name and listing its columns. Three columns are required in every table so that the visualization can show relations between layers over time: _datetime, _from, and _to.

_datetime is the moment at which the recorded event occurred. It must include the date, time, and time zone in ISO 8601 format, for example 1994-11-05T08:15:30-05:00.

_from is the name of the node/layer from which the interaction started.

_to is the name of the node/layer to which the interaction was directed.
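
A _datetime value can be checked with Python's standard library before the file is loaded. This is a minimal sketch. The function name parse_event_time is hypothetical, and it relies on datetime.fromisoformat, which accepts offsets such as -05:00 but only accepts a trailing "Z" from Python 3.11 onward.

```python
from datetime import datetime


def parse_event_time(value: str) -> datetime:
    """Parse a _datetime value and insist on an explicit time zone."""
    parsed = datetime.fromisoformat(value.strip())
    if parsed.tzinfo is None:
        # The format requires date, time, AND time zone.
        raise ValueError(f"_datetime value has no time zone: {value!r}")
    return parsed


print(parse_event_time("1994-11-05T08:15:30-05:00").isoformat())
# 1994-11-05T08:15:30-05:00
```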

##TABLE [name], [_datetime, _from, _to, column a, column b, ...]

Example:

##TABLE Location Measurement, _datetime, _from, _to, Latitude, Longitude
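
The rule about required columns can be checked when a table definition is read. The following is a sketch under the assumption that column names never contain commas. The names parse_table_definition and REQUIRED_COLUMNS are hypothetical.

```python
REQUIRED_COLUMNS = ("_datetime", "_from", "_to")


def parse_table_definition(line: str) -> tuple[str, list[str]]:
    """Split a '##TABLE name, col, col, ...' line into (name, columns)."""
    prefix = "##TABLE"
    if not line.startswith(prefix):
        raise ValueError(f"not a table definition: {line!r}")
    fields = [field.strip() for field in line[len(prefix):].split(",")]
    name, columns = fields[0], fields[1:]
    # Every table needs the three columns that relate nodes over time.
    missing = [col for col in REQUIRED_COLUMNS if col not in columns]
    if missing:
        raise ValueError(f"table {name!r} is missing required columns: {missing}")
    return name, columns


name, columns = parse_table_definition(
    "##TABLE Location Measurement, _datetime, _from, _to, Latitude, Longitude"
)
print(name)
# Location Measurement
print(columns)
# ['_datetime', '_from', '_to', 'Latitude', 'Longitude']
```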

Data rows

Data rows populate previously defined tables. No special prefix is required. The first value is always the name of the table the row belongs to. The remaining values must follow the same order as the columns in the table definition.

[table name], [value a, value b, ...]

Example:

Location Measurement, 1994-11-05T08:15:30-05:00, Location API, Geocoder API, 43.642567, -79.387054
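
To show how a row lines up with its table definition, the sketch below pairs each value with its column name. The function name parse_data_row and the tables dictionary are hypothetical. The plain split on commas is a simplification that does not handle quoted values (see "Values with commas" under Considerations).

```python
def parse_data_row(line: str, tables: dict[str, list[str]]) -> tuple[str, dict[str, str]]:
    """Map a data row onto the columns of the table it names."""
    fields = [field.strip() for field in line.split(",")]
    table_name, values = fields[0], fields[1:]
    if table_name not in tables:
        raise ValueError(f"row refers to an undefined table: {table_name!r}")
    columns = tables[table_name]
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} values for {table_name!r}, got {len(values)}"
        )
    # Values are matched to columns purely by position.
    return table_name, dict(zip(columns, values))


tables = {
    "Location Measurement": ["_datetime", "_from", "_to", "Latitude", "Longitude"],
}
table_name, row = parse_data_row(
    "Location Measurement, 1994-11-05T08:15:30-05:00, Location API, Geocoder API, 43.642567, -79.387054",
    tables,
)
print(row["_from"], "->", row["_to"])
# Location API -> Geocoder API
```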

Example

Combining the rules explained above, a data file could look like this:

##META Recorded by, John Doe
##META Device, Google Pixel 4
##TABLE Location Measurement, _datetime, _from, _to, Latitude, Longitude
##TABLE Geocoder Result, _datetime, _from, _to, Point of Interest
Location Measurement, 1994-11-05T08:15:30-05:00, Location API, Geocoder API, 43.642567, -79.387054
Location Measurement, 1994-11-05T08:15:40-05:00, Location API, Geocoder API, 43.642567, -79.387054
Location Measurement, 1994-11-05T08:15:50-05:00, Location API, Geocoder API, 43.642567, -79.387054
Geocoder Result, 1994-11-05T08:15:55-05:00, Geocoder API, Application Display, CN Tower
Location Measurement, 1994-11-05T08:16:00-05:00, Location API, Geocoder API, 43.642567, -79.387054
Location Measurement, 1994-11-05T08:16:10-05:00, Location API, Geocoder API, 43.642567, -79.387054
Geocoder Result, 1994-11-05T08:16:15-05:00, Geocoder API, Application Display, CN Tower
Geocoder Result, 1994-11-05T08:16:20-05:00, Geocoder API, Application Display, CN Tower
Geocoder Result, 1994-11-05T08:16:25-05:00, Geocoder API, Application Display, CN Tower
Geocoder Result, 1994-11-05T08:16:30-05:00, Geocoder API, Application Display, CN Tower
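
For readers who want to inspect such a file outside the visualization, the rules above can be combined into one small reader. This is a sketch under stated assumptions, not the software's own loader: the function name parse_data_file, the returned dictionary layout, and the extra "_table" key are hypothetical, and values are split on plain commas without quote handling.

```python
def parse_data_file(text: str) -> dict:
    """Collect metadata, table definitions, rows, and node names from a file."""
    meta: dict[str, str] = {}
    tables: dict[str, list[str]] = {}
    rows: list[dict[str, str]] = []
    nodes: set[str] = set()

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("##META"):
            title, _, value = line[len("##META"):].partition(",")
            meta[title.strip()] = value.strip()
        elif line.startswith("##TABLE"):
            fields = [f.strip() for f in line[len("##TABLE"):].split(",")]
            tables[fields[0]] = fields[1:]
        elif line.startswith("#"):
            continue  # comment, ignored
        else:
            fields = [f.strip() for f in line.split(",")]
            name, values = fields[0], fields[1:]
            if name not in tables:
                raise ValueError(f"row refers to an undefined table: {name!r}")
            row = dict(zip(tables[name], values))
            row["_table"] = name  # hypothetical key: remember the source table
            rows.append(row)
            # Nodes are whatever names appear in _from and _to.
            nodes.update((row["_from"], row["_to"]))

    return {"meta": meta, "tables": tables, "rows": rows, "nodes": nodes}


SAMPLE = """\
##META Recorded by, John Doe
##TABLE Location Measurement, _datetime, _from, _to, Latitude, Longitude
##TABLE Geocoder Result, _datetime, _from, _to, Point of Interest
Location Measurement, 1994-11-05T08:15:30-05:00, Location API, Geocoder API, 43.642567, -79.387054
Geocoder Result, 1994-11-05T08:15:55-05:00, Geocoder API, Application Display, CN Tower
"""

parsed = parse_data_file(SAMPLE)
print(sorted(parsed["nodes"]))
# ['Application Display', 'Geocoder API', 'Location API']
```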

Considerations

Values with commas

Depending on the data being visualized, values may contain commas. Because commas separate values in a CSV file, an unquoted comma inside a value splits that value in two, as in the following example:

##TABLE Text Message, _datetime, _from, _to, Message
Text Message, 1994-11-05T08:16:30-05:00, John, Mary, Hello, how are you doing?

Here, Hello, how are you doing? is meant to be a single value, but it is read as two. This can be solved by surrounding the value with double quotes:

Text Message, 1994-11-05T08:16:30-05:00, John, Mary, "Hello, how are you doing?"

Now, Hello, how are you doing? will be interpreted as a single value.
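
The effect of the quotes can be reproduced with Python's standard csv module. This is only an illustration of CSV quoting, and the software's own parser may differ in detail. The function name split_values is hypothetical. The skipinitialspace option matters here because the opening quote follows a space after the comma.

```python
import csv


def split_values(line: str) -> list[str]:
    """Split one line into values, keeping quoted commas inside a value."""
    # skipinitialspace=True lets the quote be recognized after ', '.
    reader = csv.reader([line], skipinitialspace=True)
    return [value.strip() for value in next(reader)]


values = split_values(
    'Text Message, 1994-11-05T08:16:30-05:00, John, Mary, "Hello, how are you doing?"'
)
print(len(values))
# 5
print(values[-1])
# Hello, how are you doing?
```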