Data format/Loading data
Strudel can visualize data from potentially any number of genomes but this is obviously limited by the amount of available screen space. Including a large number of genomes may also adversely affect performance.
Currently users can provide their own data either in tab delimited text files in the proprietary Strudel format, or in Multiple Alignment Format (MAF) files (see section below on MAF).
Strudel data format:
- The Strudel data format is tab delimited text with all features, homologs and potential reference URLs included in the same file.
- Each feature, homolog or URL is represented by a single line entry.
- The file must contain a block with all the features before any homologs.
- Headers are not necessary but can be included if desired (these will be ignored when the file is read).
A feature entry contains the following columns:
"feature" label -- genome name -- chromosome name -- feature name -- feature type(e.g. SNP) -- feature start position -- feature stop position -- annotation
The feature stop and annotation fields can be blank.
A homolog entry contains the following columns:
"homolog" label -- genome name for feat.1 -- feat.1 name -- genome name for feat.2 -- feat.2 name -- BLAST eValue -- annotation -- colour
The annotation and colour fields can be blank. A homolog's colour must be defined using hexadecimal format, e.g. #0000FF for blue.
Optionally, users can also specify a reference URL for each of the genomes loaded. This URL must be in a format that allows the application to append a feature's name, e.g. "http://myannotationsite.org/search?featurename=". Users can then click on a feature name in the results table and this will start up the default web browser and open a page with annotation for the feature in question.
A URL entry contains the following fields:
"URL" label -- genome name -- the URL itself
Optionally, users can also specify annotation colours for individual chromosomes in the data file. A chromosome entry contains the following columns:
"chromosome" label -- genome name -- chromosome name -- colour in hexadecimal format, e.g. #0000FF for blue.This will not change the colour of the chromosome itself, however, as this is used for other purposes, e.g. highlighting inversion etc. Instead, a small coloured rectangle will be displayed next to the chromosome.
Back to topMAF data format:
Multiple Alignment Format (MAF) is normally a format used to align sequences between taxa, but in this context we use it to display synteny between features. MAF supports multiple blocks of alignments in the same file, and each of these blocks represents a feature for the purpose of Strudel. The taxa contributing to the alignments in the file become the set of genomes displayed, and where multiple taxa are involved in an alignment the start position of the feature is added as a feature to each genome, and homologies are then established between the features involved.
Part of the MAF naming convention is to -- optionally -- separate the organism name from the chromosome name with a full stop ("."). Where this is encountered in MAF files, Strudel will parse the source name into these two components and instantiate separate chromosomes within a genome. Where this convention is not followed, Strudel will assume that all features are located on a single map for this genome.
For full details on the MAF format please refer to the MAF online documentation. A simple MAF example file is available here.
Back to topOrdering of genomes:
By default, Strudel will layout the genomes on canvas from left to right to right according to the order in which it encounters new genomes in the features section of the input data. In order to avoid lines being drawn across genomes, Strudel will only display homologies between adjacent genomes.
Users can configure the ordering and number of instances of genomes on screen by clicking the "Configure datasets"
button
on the toolbar. This brings up the following dialog:
You can choose the order of the genomes on screen (left to right) by selecting genome names from the drop-down menus in top to bottom order. Clicking the "Add" and "Remove" buttons will add and remove a drop-down menu at a time, respectively. Click the "OK" button to confirm the new layout. This will reset the view in the process, and result sets generated previously will be cleared.
You can use this feature to add additional instances of a genome to allow all-by-all comparisons without links being drawn across genomes.
Back to topOpening files:
Click the data load
button on the toolbar.
In the Load Data dialog shown below, click on "Load own data files" and then browse for the appropriate files as required.
You can also drag and drop files onto the Strudel canvas to open them.
Back to topExample data:
An example data set is also provided with Strudel. This can be loaded by clicking the "Load Data" button on the toolbar, and then accepting the default option on the Load Data dialog. Currently this consists of a comparison of the genomes of Brachypodium distachyon, barley ( Hordeum vulgare) and rice ( Oryza sativa). Click the info button
on the toolbar, then the "Example Data Sets" tab to see more detail.