Plant Bioinformatics Group

Data format/Loading data

Strudel can in principle visualize data from any number of genomes, but in practice the number is limited by the available screen space. Including a large number of genomes may also degrade performance.

Currently, users can provide their own data either as tab-delimited text files in the proprietary Strudel format or as Multiple Alignment Format (MAF) files (see the section on MAF below).

Strudel data format:

A simple example is available here.

A feature entry contains the following columns:

"feature" label -- genome name -- chromosome name -- feature name -- feature type (e.g. SNP) -- feature start position -- feature stop position -- annotation

The feature stop and annotation fields can be blank.
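
As a rough illustration of the column layout above, the following Python sketch assembles one tab-delimited feature line. The function name `feature_line` and the sample values are hypothetical, and the sketch assumes that a blank field is written as an empty string between two tabs; the example file is the authority on the exact convention.

```python
def feature_line(genome, chromosome, name, ftype, start, stop=None, annotation=""):
    """Build one tab-delimited "feature" row in the column order listed above."""
    fields = [
        "feature",                          # literal row label
        genome,
        chromosome,
        name,
        ftype,                              # e.g. "SNP"
        str(start),
        "" if stop is None else str(stop),  # stop position may be blank (assumed: empty string)
        annotation,                         # annotation may be blank (assumed: empty string)
    ]
    return "\t".join(fields)


# Hypothetical sample values, for illustration only.
print(feature_line("Rice", "chr1", "marker_001", "SNP", 1200))
```

Keeping all eight columns in every row, even when the last two are empty, keeps the column positions fixed for whatever reads the file.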

A homolog entry contains the following columns:

"homolog" label -- genome name for feat.1 -- feat.1 name -- genome name for feat.2 -- feat.2 name -- BLAST eValue -- annotation -- colour

The annotation and colour fields can be blank. A homolog's colour must be defined using hexadecimal format, e.g. #0000FF for blue.
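
A homolog row can be sketched in the same way. The helper below is hypothetical and not part of Strudel; it checks that a colour, if given, is a six-digit hexadecimal value such as #0000FF before building the row. Whether Strudel accepts other hexadecimal forms (for example three-digit shorthand) is not stated here, so the check is deliberately strict.

```python
import re

# Six hexadecimal digits after "#", e.g. #0000FF (assumed to be the only accepted form).
HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")


def homolog_line(genome1, feat1, genome2, feat2, evalue, annotation="", colour=""):
    """Build one tab-delimited "homolog" row in the column order listed above."""
    if colour and not HEX_COLOUR.fullmatch(colour):
        raise ValueError(f"colour must look like #0000FF, got {colour!r}")
    fields = ["homolog", genome1, feat1, genome2, feat2, str(evalue), annotation, colour]
    return "\t".join(fields)


# Hypothetical sample values, for illustration only.
print(homolog_line("Rice", "marker_001", "Barley", "marker_042", "1e-30", colour="#0000FF"))
```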

Optionally, users can also specify a reference URL for each of the genomes loaded. This URL must be in a format that allows the application to append a feature's name, e.g. "http://myannotationsite.org/search?featurename=". Users can then click on a feature name in the results table and this will start up the default web browser and open a page with annotation for the feature in question.

A URL entry contains the following fields:

"URL" label -- genome name -- the URL itself
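
The effect of a URL entry can be pictured with the short Python sketch below. It splits a URL row into its three fields and then appends a feature name to the base URL, as described above. The function names and the feature name are hypothetical, and plain concatenation without any escaping is an assumption of this sketch rather than a statement about Strudel's behaviour.

```python
def parse_url_entry(line):
    """Split a tab-delimited "URL" row into (genome name, base URL)."""
    label, genome, url = line.rstrip("\n").split("\t")
    if label != "URL":
        raise ValueError(f"not a URL entry: {label!r}")
    return genome, url


def annotation_link(base_url, feature_name):
    """Append the feature name to the base URL (assumed: no escaping)."""
    return base_url + feature_name


genome, base = parse_url_entry("URL\tRice\thttp://myannotationsite.org/search?featurename=")
print(annotation_link(base, "marker_001"))
```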

Optionally, users can also specify annotation colours for individual chromosomes in the data file. A chromosome entry contains the following columns:

"chromosome" label -- genome name -- chromosome name -- colour in hexadecimal format, e.g. #0000FF for blue.

This does not change the colour of the chromosome itself, because that colour is used for other purposes, e.g. highlighting inversions. Instead, a small coloured rectangle is displayed next to the chromosome.

MAF data format:

Multiple Alignment Format (MAF) is normally used to align sequences between taxa, but in this context we use it to display synteny between features. MAF supports multiple alignment blocks in the same file, and each of these blocks represents a feature for the purposes of Strudel. The taxa contributing to the alignments in the file become the set of genomes displayed. Where multiple taxa are involved in an alignment, the start position of the alignment is added as a feature to each genome, and homologies are then established between the features involved.
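
The mapping from MAF blocks to features and homologies can be sketched as follows. This is an illustration of the idea, not Strudel's parser. It relies on the MAF convention that an alignment block opens with an "a" line and that each sequence line starts with "s" followed by the source name, start, size, strand, source size and aligned text. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def maf_blocks(text):
    """Yield one list of (source name, start) tuples per alignment block."""
    block = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "a":
            # A blank line or a new "a" line closes the current block.
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            block.append((fields[1], int(fields[2])))
    if block:
        yield block


def homolog_pairs(block):
    """Pair up every two entries of a block, as candidate homologies."""
    return list(combinations(block, 2))


# Hypothetical sample data, for illustration only.
SAMPLE = """##maf version=1
a score=10
s rice.chr1 100 5 + 1000 ACGTA
s barley.chr2 200 5 + 2000 ACGTA

a score=5
s rice.chr3 7 3 + 500 ACG
"""

for block in maf_blocks(SAMPLE):
    print(block, homolog_pairs(block))
```

A block with a single sequence line yields a feature but no pairs, which matches the idea that homologies only arise where more than one taxon takes part in an alignment.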

The MAF naming convention optionally separates the organism name from the chromosome name with a full stop ("."). Where Strudel encounters this in a MAF file, it parses the source name into these two components and creates separate chromosomes within a genome. Where the convention is not followed, Strudel assumes that all features are located on a single map for this genome.
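
The naming rule can be illustrated with a few lines of Python. The function `split_source` is hypothetical, and splitting at the first full stop only is an assumption of this sketch; how Strudel treats source names that contain more than one full stop is not described here.

```python
def split_source(src):
    """Split a MAF source name into (organism, chromosome).

    Returns (src, None) when no full stop is present, meaning that all
    features of that genome would sit on a single map.
    """
    organism, dot, chromosome = src.partition(".")  # assumed: split at the first "."
    return (organism, chromosome) if dot else (src, None)


print(split_source("rice.chr1"))
print(split_source("barley"))
```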

For full details on the MAF format please refer to the MAF online documentation. A simple MAF example file is available here.

Ordering of genomes:

By default, Strudel lays out the genomes on the canvas from left to right, in the order in which it first encounters each genome in the features section of the input data. To avoid lines being drawn across genomes, Strudel only displays homologies between adjacent genomes.
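
The default ordering rule can be sketched in Python as follows. The function names and the sample rows are hypothetical. The sketch assumes tab-delimited rows in the Strudel format, with the genome name in the second column of each feature row.

```python
def genome_order(lines):
    """Return genome names in the order they first appear in feature rows."""
    order = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "feature" and len(fields) > 1 and fields[1] not in order:
            order.append(fields[1])
    return order


def adjacent_pairs(order):
    """Return the neighbouring genome pairs, left to right."""
    return list(zip(order, order[1:]))


# Hypothetical sample rows, for illustration only.
rows = [
    "feature\tBarley\t1H\tm1\tSNP\t10\t\t",
    "feature\tRice\tchr1\tm2\tSNP\t20\t\t",
    "feature\tBarley\t2H\tm3\tSNP\t30\t\t",
    "feature\tBrachypodium\tBd1\tm4\tSNP\t40\t\t",
]
order = genome_order(rows)
print(order)
print(adjacent_pairs(order))
```

With three genomes in a row, only two of the three possible genome pairs are adjacent, so homologies between the first and last genome are not displayed in the default layout.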

Users can configure the ordering and number of instances of genomes on screen by clicking the "Configure datasets" button on the toolbar. This brings up the following dialog:

[Image: Config Genomes Dialog]

You can choose the order of the genomes on screen (left to right) by selecting genome names from the drop-down menus in top-to-bottom order. The "Add" and "Remove" buttons add or remove one drop-down menu at a time. Click the "OK" button to confirm the new layout. This resets the view, and any result sets generated previously are cleared.

You can use this feature to add further instances of a genome, which allows all-by-all comparisons without links being drawn across genomes.
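
To see why repeating a genome helps, the following Python sketch lists the genome pairs that are not adjacent in a given left-to-right layout. The function `missing_pairs` and the genome labels are hypothetical; the sketch only models the adjacency rule and says nothing about how Strudel draws the result.

```python
from itertools import combinations


def missing_pairs(layout):
    """Return the genome pairs that are never adjacent in the layout."""
    shown = {frozenset(pair) for pair in zip(layout, layout[1:])}
    wanted = {frozenset(pair) for pair in combinations(set(layout), 2)}
    return wanted - shown


# Three genomes in a row leave one pair without a direct comparison.
print(missing_pairs(["A", "B", "C"]))

# Adding a second instance of "A" at the end makes every pair adjacent.
print(missing_pairs(["A", "B", "C", "A"]))
```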

Opening files:

Click the data load button on the toolbar.

In the Load Data dialog shown below, click on "Load own data files" and then browse for the appropriate files as required.

[Image: Load Data Dialog]

You can also drag and drop files onto the Strudel canvas to open them.

Example data:

An example data set is also provided with Strudel. This can be loaded by clicking the "Load Data" button on the toolbar, and then accepting the default option in the Load Data dialog. Currently this consists of a comparison of the genomes of Brachypodium distachyon, barley (Hordeum vulgare) and rice (Oryza sativa). Click the info button on the toolbar, then the "Example Data Sets" tab to see more detail.
