2 Selecting the Database to Search
3 Searching Pathway/Genome Databases
3.1 Quick Search
3.2 Search Menu: Object Searches
3.3 Search Menu → Compounds
3.4 Search Menu → Genes/Proteins/RNAs
3.5 Search Menu → Reactions
3.6 Search Menu → Pathways
3.7 Search Menu → Growth Media
3.8 Search Menu → Advanced Search
3.9 Ontology Searches
3.10 Search Menu → Google This Site
3.11 Search Menu → BLAST
3.12 Search Menu → Search Full-text Articles
4 Web Groups
4.1 Group Structure and Display
4.2 Group Directory
4.3 Creating a Group
4.3.1 From a Search
4.3.2 From Scratch
4.3.3 Via Import
4.3.4 Via Enrichment
4.3.5 From an Existing Group
4.4 Manipulating Group Contents
4.4.1 Adding a Column
4.4.2 Editing a Column
4.4.3 Adding a Row
4.4.4 Deleting Rows
4.4.5 Sorting
4.4.6 Filtering
4.4.7 Set Type
4.4.8 Moving and Deleting Columns
4.5 Exporting and Sharing a Group
4.5.1 Export to a File
4.5.2 Export to the Cellular Overview
4.5.3 Sharing a Group
4.6 User Pages and Directory
5 Cellular Overview
5.1 Organization of the Cellular Overview
5.2 Getting Started
5.3 Summary of the Mouse Commands
5.4 Summary of the General Commands
5.5 Organism Selection
5.6 Searching and Highlighting
5.7 Omics Viewer (Overlay Experimental Data)
5.8 Getting Started
5.9 Omics Dataset File Format
5.10 Examples
5.11 Color Scale
5.12 Omics Viewer Results
5.13 Submitting Highlight Operations via an HTTP GET (URL)
5.14 Submitting Expression Data Via an HTTP GET
5.15 Submitting Expression Data Via an HTTP POST
5.16 Submitting Expression Data Via an HTTP GET or POST for Display on Individual Pathways
6 Regulatory Overview
6.1 Summary of the Mouse Commands
6.2 Organism Selection
6.3 Layout Selection
6.4 Highlighting Genes and Regulatory Relationship Arrows
6.5 Redisplay Highlighted Genes Only
6.6 Omics Viewer for Regulatory Overview
10 Web Services
10.1 BioPAX XML Services
10.2 Pathway Tools XML Services
10.2.1 Retrieving a Single Object
10.2.2 Retrieving a Set of Objects Using the Pathway Tools API Functions
10.2.3 Retrieving a Set of Objects Returned by a BioVelo Query
This document describes how to use Web sites based on the Pathway Tools software from SRI International. Since multiple Web sites such as BioCyc, YeastCyc, AraCyc, and MouseCyc are all based on the same underlying software, the same usage instructions apply to all. (Note that differences in configuration and software version may introduce some variability among sites.)
Unless otherwise indicated, all Pathway/Genome Database searches are restricted to a single database. In most cases, a database describes a single organism – although a small number of multi-organism Pathway/Genome Databases exist (examples include MetaCyc and PlantCyc). The database against which searches will be conducted is indicated below the Quick Search box in the page banner, and at the bottom of the Search and Tools pull-down menus.
To search a different database, click on the ‘change’ link (found below the Quick Search box, and at the bottom of the Search and Tools menus). In the dialog that pops up, you can find the organism of interest in the scrollable list, start typing its name, or browse the organism taxonomy.
The By Name tab is selected by default. When a large number of databases is available, the alphabetical index to the left of the database list provides a convenient shortcut for scrolling to a desired part of the alphabet. If you start typing an organism name, the full list of databases will be replaced by a list of databases matching the string you typed; you can use the mouse or the up/down arrow keys to select the desired database. Lists of your recently used databases and of the site’s most popular databases provide shortcuts for selecting those databases.
The By Taxonomy tab allows you to select an organism by browsing for it. The name of each class of organisms is followed by the number of organism databases in that class. The taxonomy tree does not include all taxonomy classes, only those that contain more than one organism database; if a particular taxon does not appear in the tree, it means there is no database available for it or its children. Clicking on a class name will show or hide its list of child taxa. Clicking on an organism name will select that database and show its name in the text box at the top.
If the site supports user accounts, and you are logged in, you may select one database as your preferred database. This database will be your default selection when starting a new web session.
Once you have selected the desired database, click OK to exit the dialog. By default, this will navigate to the page of summary statistics for the selected database. However, you can avoid that behavior by unchecking the box reading Go to Organism Summary page for selected database. In that case, the current page will reload, and the text under the Quick Search box should now indicate the newly selected database. Note that if you are looking at a page that contains data from a particular organism, selecting a new database will not affect the contents of the current page – the new selection will apply only to your future searches.
The Quick Search box in the upper right-hand corner of every page is useful if you know the name (or part of the name) or database identifier of the object you are searching for. You may use this box to search for genes, proteins, compounds, RNAs, reactions, pathways, operons, and GO terms. If the query string matches a single object, the page for that object will be displayed immediately. If there are multiple matches, the full list of matches will be shown, organized by the type of object (e.g., gene, protein, etc.).
Some examples of what can be entered into the Quick Search box include:
The name of a compound, gene, protein, pathway or other object. Spaces, punctuation and capitalization are ignored. An object will be returned if the query string matches either its common name or one of its synonyms.
Examples: pyruvate, trpA
Any substring of one of the above names that is 3 or more characters in length.
Examples: kinase, pyr
An EC number (full or partial).
Examples: 1.2.3.3, 1.3.99
A PGDB internal object identifier for any compound, gene, protein, pathway, reaction, transcription-unit or schema class. Correct capitalization may be required.
Examples: CPLX0‑3661, HEMN‑RXN
An identifier from some external database to which we maintain links, e.g., a UniProt identifier. Correct capitalization and punctuation are required. Note that our set of links is not complete: a search for an external ID that returns no result does not mean that the object is absent from our database.
Examples: P00561, NP_414543, C00047
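The name and EC-number rules in the list above can be approximated as follows. This is a minimal Python sketch, not the Pathway Tools implementation: the helper names normalize, name_matches and ec_matches are hypothetical, and treating "punctuation" as any non-alphanumeric character is an assumption.

```python
import re
from typing import Iterable


def normalize(text: str) -> str:
    """Drop spaces and punctuation and fold case before comparing names
    (assumption: any non-alphanumeric character counts as punctuation)."""
    return re.sub(r"[\W_]+", "", text).casefold()


def name_matches(query: str, names: Iterable[str]) -> bool:
    """True if the query equals the common name or any synonym,
    ignoring spaces, punctuation and capitalization."""
    q = normalize(query)
    return any(normalize(name) == q for name in names)


def ec_matches(query: str, ec_number: str) -> bool:
    """A full or partial EC number matches when its dot-separated fields
    equal the leading fields of the object's EC number."""
    wanted = query.split(".")
    fields = ec_number.split(".")
    return len(wanted) <= len(fields) and fields[: len(wanted)] == wanted


print(name_matches("Trp-A", ["tryptophan synthase alpha subunit", "trpA"]))  # True
print(ec_matches("1.3.99", "1.3.99.1"))  # True
print(ec_matches("1.3.9", "1.3.99.1"))   # False: fields are compared, not characters
```

Comparing EC numbers field by field, rather than as raw string prefixes, keeps a partial number such as 1.3.9 from accidentally matching 1.3.99.x.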
A few additional rules govern searches:
To match several words or text fragments simultaneously, separate them with spaces to find objects with all of the words in their names, or with commas to find objects with any of the words in their names. For example, if you enter nitrate camphor, the program will search for objects that each have both nitrate and camphor in their names. However, entering nitrate, camphor would result in a search for objects that have either nitrate or camphor in their names.
If your query text is one or two characters in length, only exact text matches will be returned because of the many matches that would otherwise result. For longer text fragments, the search will return all objects that contain the text rather than match it exactly.
Searches may be qualified. Currently we allow two qualifiers:
search:exact Example: trpa search:exact
This search will be limited to exact matches. In the example given, assuming the current organism is E. coli K-12, there will be several matches without the qualifier, including genes, proteins and transcription units. With the qualifier you will be taken directly to the trpA gene page.
type:<type-qualifier> Example: atp type:compound
This search will be limited to the specified type. In the example given, assuming the current organism is E. coli K-12, a large number of results of various types will be returned without the qualifier. With the qualifier, just the seven compounds with ATP in the name will be returned.
Allowable type-qualifiers include pathway, gene, enzyme, rna, go-terms, compound, reaction, operon, and organism.
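The rules above (spaces for AND, commas for OR, exact matching for one- or two-character queries, and the search: and type: qualifiers) can be modeled as a small query parser. This is a minimal Python sketch under stated assumptions, not how Pathway Tools actually parses queries: the function names are hypothetical, each object is reduced to one name plus a type label, and search:exact is assumed to compare the whole query with the whole name.

```python
import re
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Ignore spaces, punctuation and capitalization (assumed)."""
    return re.sub(r"[\W_]+", "", text).casefold()


def parse_query(query: str) -> Tuple[List[List[str]], Dict[str, str]]:
    """Split a Quick Search string into OR-alternatives of AND-words,
    plus any search: or type: qualifiers."""
    qualifiers: Dict[str, str] = {}
    words: List[str] = []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() in ("search", "type"):
            qualifiers[key.lower()] = value.lower()
        else:
            words.append(token)
    # Commas separate alternatives (OR); spaces inside one alternative mean AND.
    alternatives = [alt.split() for alt in " ".join(words).split(",")]
    return [alt for alt in alternatives if alt], qualifiers


def fragment_hit(fragment: str, name: str) -> bool:
    """One- or two-character fragments must match the whole name exactly;
    longer fragments may match anywhere inside the name."""
    return fragment == name if len(fragment) < 3 else fragment in name


def quick_search_matches(query: str, name: str, obj_type: str) -> bool:
    alternatives, qualifiers = parse_query(query)
    if "type" in qualifiers and qualifiers["type"] != obj_type:
        return False
    exact = qualifiers.get("search") == "exact"
    target = normalize(name)
    for words in alternatives:
        if exact:
            if normalize("".join(words)) == target:
                return True
        elif all(fragment_hit(normalize(w), target) for w in words):
            return True
    return False


print(quick_search_matches("nitrate camphor", "nitrate reductase", "enzyme"))   # False
print(quick_search_matches("nitrate, camphor", "nitrate reductase", "enzyme"))  # True
print(quick_search_matches("atp type:compound", "ATP synthase", "enzyme"))      # False
```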
The Search menu contains links to specialized search pages for Compounds, Genes/Proteins/RNAs, Reactions and Pathways. Each such page contains options for searching using a number of different criteria, either individually or in combination. When the page is initially loaded, only the name searches are active, but by clicking on the different search bars, you can enable or disable additional search criteria. If multiple search criteria are specified for a given search, then unless otherwise specified the results must satisfy all of them (that is, an AND connector is used to combine the different criteria).
The result of every object search is a table containing the names of all objects that satisfy the search, with hyperlinks to their corresponding data pages, along with any additional columns relevant to the particular search. The table will initially be sorted alphabetically by name, but small triangles in the column headers allow the user to sort by any column, in either ascending or descending order.
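Combining criteria with AND and then sorting the result table can be sketched in a few lines. This is an illustrative Python sketch only: object_search and sort_key are hypothetical names, objects are plain dictionaries, the molecular weights are example values, and case-insensitive alphabetical sorting is an assumption about how the table orders names.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def sort_key(value: Any) -> Any:
    """Sort text case-insensitively (assumed); leave numbers unchanged."""
    return value.casefold() if isinstance(value, str) else value


def object_search(objects: Sequence[Record],
                  criteria: Sequence[Callable[[Record], bool]],
                  sort_column: str = "name",
                  descending: bool = False) -> List[Record]:
    """Keep the objects that satisfy every criterion (AND), then sort."""
    hits = [obj for obj in objects if all(test(obj) for test in criteria)]
    return sorted(hits, key=lambda obj: sort_key(obj[sort_column]),
                  reverse=descending)


compounds = [
    {"name": "pyruvate", "mw": 88.06},
    {"name": "ATP", "mw": 507.18},
    {"name": "acetate", "mw": 59.04},
]
# Two criteria combined with AND: molecular weight above 60 and "ate" in the name.
hits = object_search(compounds, [lambda c: c["mw"] > 60,
                                 lambda c: "ate" in c["name"]])
print([c["name"] for c in hits])  # ['pyruvate']
```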
The sections below describe the different search criteria that are available for each object type.
Search for compound by name or ID
Enter a compound
name, name fragment, or identifier (either the internal Pathway/Genome Database
identifier, or an identifier from some other database such as PubChem
or LIGAND). The software will attempt to do auto-completion on the
string you have entered based on the contents of the database. If you
select one of the auto-complete options, then when you submit the form
you will be taken directly to the data page for the selected compound,
regardless of other search criteria you may have
specified (i.e., other search criteria will be ignored). If you do not
select one of the auto-complete options, then the string you typed
will be the target of a substring search, which may be combined with
other search criteria.
Search/Filter by ontology
This option allows you to browse the compound ontology. Each compound
class includes in parentheses after its name the number of
instance-level compound objects that are members of that class.
Clicking a + icon shows the classes and compounds that belong
to a particular class. The ontology may be used in one of two ways.
By selectively clicking on + icons, you can browse to find a
compound or compound class of interest, and click directly on its name
to visit the data page for that compound. Alternatively, you can
check the checkbox next to one or more class names to limit your
search (which may also include other search criteria) so as
to only include compounds that belong to one of the checked classes.
Search/Filter by molecular weight
This option can be used to specify either a minimum molecular weight
value, a maximum molecular weight value, or both. If either the
minimum or maximum field is left blank, then the molecular weight is
unconstrained in that direction.
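The "blank field means no limit" rule can be expressed as a range test that skips a missing bound. This is a minimal Python sketch, assuming inclusive bounds (the text does not say whether the endpoints themselves are included); parse_bound and in_range are hypothetical names.

```python
from typing import Optional


def parse_bound(field: str) -> Optional[float]:
    """A blank form field means 'no limit in this direction'."""
    field = field.strip()
    return float(field) if field else None


def in_range(value: float,
             minimum: Optional[float],
             maximum: Optional[float]) -> bool:
    """Inclusive range test (assumed) that skips whichever bound is missing."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# Minimum of 100 entered, maximum left blank.
low, high = parse_bound("100"), parse_bound("")
print(in_range(507.18, low, high))  # True
print(in_range(88.06, low, high))   # False
```

The same pattern fits the other minimum/maximum fields on these search pages, such as sequence length, pI, and number of reactions.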
Search/Filter by chemical formula (partial or full)
If
one or more element symbols are entered without a number, then the
result will include any compound containing those elements (and
possibly some others). If an element symbol is followed by a number,
then only compounds with exactly that number of that element in their
chemical formula will be included in the result. For example, the
query string C12N will retrieve all compounds with exactly 12
carbons, one or more nitrogens, and possibly some other elements. The
search is case-insensitive unless case is needed to
disambiguate. For example, either co or CO will retrieve
all compounds containing both carbon and oxygen, but Co will
instead retrieve all compounds containing cobalt.
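One way to reproduce the case rules above is to read mixed-case letters literally and to try one-letter symbols first for all-lowercase or all-uppercase input, falling back to two-letter symbols. This Python sketch is an assumption about the behavior, not the Pathway Tools parser; the function names are hypothetical and ELEMENTS is deliberately a small subset of the periodic table.

```python
import re
from typing import Dict, List, Optional

# Small subset of element symbols, enough for the examples (an assumption;
# a real implementation would use the full periodic table).
ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Co", "Fe", "Mg"}


def split_symbols(run: str) -> Optional[List[str]]:
    """Split a run of letters into element symbols. Mixed case is read
    literally ("Co" is cobalt); uniform case ("co", "CO") is tried as
    one-letter symbols first, falling back to two-letter ones."""
    if not run:
        return []
    if run != run.lower() and run != run.upper():
        symbols = re.findall(r"[A-Z][a-z]?", run)
        if "".join(symbols) == run and all(s in ELEMENTS for s in symbols):
            return symbols
        return None
    for size in (1, 2):
        if len(run) < size:
            break
        symbol = run[:size].capitalize()
        if symbol in ELEMENTS:
            rest = split_symbols(run[size:])
            if rest is not None:
                return [symbol] + rest
    return None


def parse_formula_query(query: str) -> Dict[str, Optional[int]]:
    """Map each element to an exact count, or None for 'one or more'."""
    wanted: Dict[str, Optional[int]] = {}
    for letters, digits in re.findall(r"([A-Za-z]+)(\d*)", query):
        symbols = split_symbols(letters)
        if symbols is None:
            raise ValueError(f"cannot read {letters!r} as element symbols")
        for symbol in symbols[:-1]:
            wanted[symbol] = None
        # A trailing number applies to the last symbol in the run.
        wanted[symbols[-1]] = int(digits) if digits else None
    return wanted


def formula_matches(query: str, formula: Dict[str, int]) -> bool:
    for element, count in parse_formula_query(query).items():
        present = formula.get(element, 0)
        if count is None and present == 0:
            return False
        if count is not None and present != count:
            return False
    return True


glucose = {"C": 6, "H": 12, "O": 6}
print(parse_formula_query("C12N"))                 # {'C': 12, 'N': None}
print(formula_matches("co", glucose))              # True
print(formula_matches("Co", glucose))              # False
print(formula_matches("Co", {"Co": 1, "Cl": 2}))   # True
```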
Search by InChI string
InChI is short for International Chemical Identifier, and offers a way
to search for a molecule by its chemical structure. We support only
exact string matching for InChI strings.
Search by gene name or database identifier
Enter a gene
name, name fragment, or identifier (either the internal Pathway/Genome Database
identifier, or an identifier from some other database). The software will attempt to do auto-completion on the
string you have entered based on the contents of the database. If you
select one of the auto-complete options, then when you submit the form
you will be taken directly to the data page for the selected gene, regardless of any other search criteria you may have
specified (i.e., other search criteria are ignored). If you do not
select one of the auto-complete options, then the string you typed
will be the target of a substring search, which may be combined with
other search criteria.
Search by product name, database identifier or EC number
Enter a protein or RNA name, name fragment, identifier (either the
internal Pathway/Genome Database identifier or an identifier from some other database,
such as UniProt), or a fully specified EC number. The software will
attempt to do auto-completion, as for the gene name field.
Search/Filter by sequence length
Enter a minimum and/or maximum sequence length, and specify whether
the units referred to are nucleotides or amino acids. If either the
minimum or maximum field is left blank, then the sequence length is
unconstrained in that direction.
Search/Filter by replicon and/or gene map position
Enter a minimum and/or maximum gene map position, where the units are
the number of base pairs from the start of the replicon. The results
will include any gene that overlaps any portion of the specified
region. If either the minimum or maximum field is left blank, then the
map position is unconstrained in that direction. If the selected
organism has multiple replicons, then this search option will include a
checkable list of replicons – you may select one or more replicons
either instead of or in conjunction with the map position in order to
constrain the search to genes on a particular replicon.
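The overlap rule above (a gene qualifies if any part of it falls inside the region) reduces to two comparisons. This is a minimal Python sketch: the Gene record, its coordinate fields, and the gene and replicon names are hypothetical, and coordinates are assumed to be inclusive base-pair positions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Gene:
    name: str
    replicon: str
    left: int   # first base pair of the gene on its replicon
    right: int  # last base pair of the gene on its replicon


def genes_in_region(genes: Iterable[Gene],
                    minimum: Optional[int] = None,
                    maximum: Optional[int] = None,
                    replicons: Optional[Set[str]] = None) -> List[str]:
    """Return genes overlapping [minimum, maximum], optionally restricted to
    the checked replicons. A missing bound leaves that side open."""
    hits = []
    for gene in genes:
        if replicons and gene.replicon not in replicons:
            continue
        # A gene overlaps the region unless it ends before the region starts
        # or starts after the region ends.
        if minimum is not None and gene.right < minimum:
            continue
        if maximum is not None and gene.left > maximum:
            continue
        hits.append(gene.name)
    return hits


genes = [
    Gene("geneA", "chromosome", 1_000, 1_900),
    Gene("geneB", "chromosome", 5_000, 6_200),
    Gene("geneC", "plasmid", 1_500, 2_100),
]
print(genes_in_region(genes, minimum=1_800, maximum=5_100))  # ['geneA', 'geneB', 'geneC']
print(genes_in_region(genes, 1_800, 5_100, {"plasmid"}))     # ['geneC']
```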
Search/Filter by product molecular weight
Enter a minimum and/or maximum molecular weight for the gene product
in kilodaltons. If either the
minimum or maximum field is left blank, then the molecular weight is
unconstrained in that direction.
Search/Filter by pI
Enter a minimum and/or maximum pI (isoelectric point) for the gene
product. (Typically little information
about pI is available for databases other
than EcoCyc or MetaCyc.)
Search/Filter by small molecule regulator, cofactor, substrate
or ligand
This search option is for retrieving all proteins
affected by a specified small molecule in any of several ways. An
example might be to search for all enzymes inhibited by ADP, or all
enzymes that use Mg²⁺ as a cofactor. Enter the name of a
small molecule. We recommend taking advantage of the auto-complete
facility to select the correct small molecule, as only an exact match
to a compound name can be accepted here. Check all roles that you are
interested in for this compound. Note that we consider cofactors to
include only compounds that are not modified in any way during the
reaction. Molecules such as NAD, which are modified, are considered
to be substrates, not cofactors. (Relatively little information
about activators, inhibitors, etc. is typically available for databases other
than EcoCyc or MetaCyc.)
Search/Filter by evidence code
The evidence ontology
appears here in browseable form. Each evidence code includes in
parentheses after its name the number of gene products that have their
function annotated with that code. Selecting one or more codes to
filter on allows you to restrict your search, for example, to all
proteins whose function has been established experimentally. The
Pathway Tools evidence codes and ontology are described
here.
Search/Filter by cell component
The cell component
ontology appears here in browseable form, along with the numbers of
gene products associated with each cell component. Selecting one or more
components allows you to restrict your search to proteins known to
be present in those cellular locations. (Note that relatively little
information about cellular locations of gene products is available for
databases other than EcoCyc or MetaCyc.) The
Pathway Tools cell component ontology is described
here.
Search/Filter by Gene Ontology
If the selected database
has been annotated using Gene
Ontology, then you will see a browseable ontology here. Only
terms that have one or more gene products annotated to them or their
children will be present, and the number in parentheses after each term
name indicates the number of gene products annotated to that term or
one of its children. You may browse this ontology to a particular
term to see all gene products annotated with that term. Clicking on a
gene product will then take you directly to the data page for that
gene product, just as clicking on a term name will take you to the
data page for that term. Alternatively, you can use the checkboxes to
indicate that your search should be restricted to include only gene
products annotated with the checked terms or their children. If you
wish to filter by only a single term, and you know the name or ID for
that term, you also have the option of typing it in the text box
(using auto-completion to ensure you select the correct term).
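The counts in parentheses and the "checked terms or their children" filter both depend on collecting a term together with everything below it. This Python sketch is illustrative only: the term names, protein identifiers and annotations are hypothetical, and the ontology is modeled as a parent-to-children mapping that may contain shared children (as in a DAG).

```python
from typing import Dict, Iterable, Set


def with_descendants(term: str, children: Dict[str, Iterable[str]]) -> Set[str]:
    """The term plus every term below it (the ontology may be a DAG)."""
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def term_count(term: str, children: Dict[str, Iterable[str]],
               annotations: Dict[str, Set[str]]) -> int:
    """Number shown in parentheses: gene products annotated to the term
    or to any of its descendants, each counted once."""
    terms = with_descendants(term, children)
    return sum(1 for assigned in annotations.values() if assigned & terms)


def filter_by_terms(checked: Iterable[str], children: Dict[str, Iterable[str]],
                    annotations: Dict[str, Set[str]]) -> Set[str]:
    """Gene products annotated with any checked term or its descendants."""
    allowed: Set[str] = set()
    for term in checked:
        allowed |= with_descendants(term, children)
    return {gp for gp, assigned in annotations.items() if assigned & allowed}


children = {
    "transport": ["ion transport", "amino acid transport"],
    "ion transport": ["potassium ion transport"],
}
annotations = {
    "protein-1": {"potassium ion transport"},
    "protein-2": {"ion transport", "potassium ion transport"},
    "protein-3": {"amino acid transport"},
    "protein-4": {"tryptophan biosynthesis"},
}
print(term_count("transport", children, annotations))      # 3
print(term_count("ion transport", children, annotations))  # 2
print(sorted(filter_by_terms(["ion transport"], children, annotations)))  # ['protein-1', 'protein-2']
```

Under this model, a term would be hidden from the browseable ontology whenever its count is zero.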
Search/Filter by MultiFun term
If the selected database
has been annotated using the MultiFun ontology,
then you will see a browseable ontology here. Only
terms that have one or more genes annotated to them or their
children will be present, and the number in parentheses after each term
name indicates the number of genes annotated to that term or
one of its children. You may browse this ontology to a particular
term to see all genes annotated with that term. Clicking on a
gene will then take you directly to the data page for that
gene, just as clicking on a term name will take you to the
data page for that term. Alternatively, you can use the checkboxes to
indicate that your search should be restricted to include only genes
annotated with the checked terms or their children.
Search/Filter by organism
This search option will be available only if the selected
database is a multi-organism database (such as MetaCyc), and allows you to browse directly for proteins from a
particular organism, or to restrict your search to one or more
taxonomic groups.
Search/Filter by publication
This search option is
useful for retrieving a list of all genes or gene products that
cite a given publication or author. Enter either the PubMed
ID, the author surname, or part or all of an article title.
Search for reaction by EC number or name
Enter a
reaction EC number or name (typically an enzyme name).
EC numbers can be either full or partial. The software will
attempt to do auto-completion on the name or EC number. If you
select one of the auto-complete options, then when you submit the form
you will be taken directly to the data page for the selected reaction
or reaction class, regardless of any other search criteria you may have
specified (i.e., other search criteria will be ignored). If you do not
select one of the auto-complete options, then the string you typed
will be the target of a substring search, which may be combined with
other search criteria.
Search/Filter by substrates or products
Enter a
compound name to retrieve all reactions in which that compound
participates either as a substrate or product. If you enter more than
one compound, then the reaction must involve all specified compounds
in order to be included in the results. We recommend taking advantage
of the auto-complete facility to select the correct compound, as
only an exact match to a compound name can be accepted here.
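The "must involve all specified compounds" rule is a subset test against the union of both sides of each reaction. This is a minimal Python sketch with hypothetical reaction IDs and a simplified reaction record; it is not the Pathway Tools data model.

```python
from typing import Dict, Iterable, List

# Hypothetical reaction record: compound names on each side of the equation.
Reaction = Dict[str, List[str]]


def reactions_involving_all(reactions: Dict[str, Reaction],
                            compounds: Iterable[str]) -> List[str]:
    """IDs of reactions in which every given compound appears, on either
    side (as a substrate or as a product)."""
    wanted = set(compounds)
    return [rid for rid, rxn in reactions.items()
            if wanted <= set(rxn["left"]) | set(rxn["right"])]


reactions = {
    "rxn-1": {"left": ["phosphoenolpyruvate", "ADP"], "right": ["pyruvate", "ATP"]},
    "rxn-2": {"left": ["D-glucose", "ATP"], "right": ["D-glucose 6-phosphate", "ADP"]},
}
print(reactions_involving_all(reactions, ["ATP"]))              # ['rxn-1', 'rxn-2']
print(reactions_involving_all(reactions, ["ATP", "pyruvate"]))  # ['rxn-1']
```

The pathway search by substrates present, described below, applies the same all-compounds rule, with intermediates counted as well.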
Search/Filter by ontology
This option allows you to
browse the Pathway Tools reaction ontology. Each reaction class includes in
parentheses after its name the number of reactions that are members of
that class. The ontology may be used in one of two ways. By
selectively clicking on + icons, you can browse to find a
reaction of interest, and click directly on its name
to visit the data page for that reaction. Alternatively, you can
check the checkbox next to one or more class names to limit your
search (which may also include other search criteria) so
as to only include reactions that belong to one of the checked
classes. Note that there are two parallel reaction classification
systems, one in which reactions are classified by conversion type
(this includes the entire EC hierarchy), and another in which the
reactions are classified by substrate. Most reactions in the database
have parents in both classification systems.
Search for pathway by name
Enter a pathway name, name
fragment, or internal Pathway/Genome Database identifier. The software will attempt to
do auto-completion on the string you have entered based on the
contents of the database. If you select one of the auto-complete
options, then when you submit the form you will be taken directly to
the data page for the selected pathway. This is true regardless of
any other search criteria you may have specified (i.e. other search
criteria will be ignored). If you do not select one of the
auto-complete options, then the string you typed will be the target of
a substring search, which may be combined with other search criteria.
Search/Filter by ontology
This option allows you to
browse the Pathway Tools pathway ontology. Each pathway class includes in
parentheses after its name the number of pathways that are members of
that class. The ontology may be used in one of two ways. By
selectively clicking on + icons, you can browse to find a
pathway of interest, and click directly on its name
to visit the data page for that pathway. Alternatively, you can
check the checkbox next to one or more class names to limit your
search (which may also include other search criteria) so
as to only include pathways that belong to one of the checked
classes.
Search/Filter by number of reactions
Enter a minimum and/or maximum number of desired reactions in the pathway. If either the
minimum or maximum field is left blank, then the number of reactions is
unconstrained in that direction.
Search/Filter by substrates present
Enter one or more
compound names to retrieve all pathways in which those compounds
participate as a reactant, a product, or an intermediate. If you
enter more than one compound, then the pathway must involve all
specified compounds in order to be included in the results. We
recommend taking advantage of the auto-complete facility to select the
correct compound, as only an exact match to a compound name can be
accepted here.
Search/Filter by evidence code
The Pathway Tools evidence ontology
appears here in browseable form. Each evidence code includes in
parentheses after its name the number of pathways that have their
function annotated with that code. Selecting one or more codes to
filter on allows you to restrict your search, for example, to all
pathways whose presence has been established experimentally. The
Pathway Tools evidence codes and ontology are described
here.
Search/Filter by organism
This search option will be
available only if a multi-organism database (such as MetaCyc) is the
selected database, and allows you to browse for pathways that are
curated as occurring in a particular organism based on experimental
information. The fact that a pathway is not stated to be present in a
given organism does not mean that the organism does not have the
pathway – pathways are curated for only a small subset of the
organisms in which they appear.
Search/Filter by expected taxonomic range
This search option will be
available only if a multi-organism database (such as MetaCyc) is the selected database. Each pathway in
MetaCyc has been annotated with its expected taxonomic range. This
search option allows you to restrict your search to include only those
pathways you could reasonably expect to see for a given taxonomic
grouping, for example, to restrict your search to pathways seen in
plants.
Search/Filter by publication
This search option is
useful for retrieving a list of all pathways that
cite (either directly or through one of the pathway’s enzymes, genes,
subpathways or substrates) a given publication or author. Enter either the PubMed
ID, the author surname, or part or all of an article title.
Some databases may include sets of growth media, along with information about whether or not the organism can grow on a particular medium and under what conditions (for example, gene knockout studies can indicate whether the organism can grow on a particular medium in the absence of a particular gene). To see the full list of growth media for a database, including an indication of which media have associated knockout data, click on the All Growth Media for this Organism button. Use the other fields of this form to search for growth media that meet certain criteria.
Search for growth media by name
Enter a growth
medium name or name fragment. The software will attempt to
do auto-completion on the string you have entered based on the
contents of the database. If you select one of the auto-complete
options, then when you submit the form you will be taken directly to
the data page for the selected growth medium. This is true regardless of
any other search criteria you may have specified (i.e. other search
criteria will be ignored). If you do not select one of the
auto-complete options, then the string you typed will be the target of
a substring search, which may be combined with other search criteria.
Search/Filter by compounds present in the medium
Enter up to four compound names to retrieve all growth media that
contain either any or all of the specified compounds. We recommend
taking advantage of the auto-complete facility to select the correct
compound, as only an exact match to a compound name can be accepted
here.
Search/Filter by compounds not present in the medium
Enter up to four compound names to retrieve all growth media that
do not contain any of the specified compounds. We recommend
taking advantage of the auto-complete facility to select the correct
compound, as only an exact match to a compound name can be accepted
here.
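The two compound filters for growth media (any or all of up to four compounds present, none of up to four compounds present) can be combined as shown below. This is a hedged Python sketch: the medium names and their simplified contents are illustrative, and search_media is a hypothetical function, not part of the Web site.

```python
from typing import Dict, Iterable, List, Set

MAX_COMPOUNDS = 4  # each field of the form accepts up to four compounds


def search_media(media: Dict[str, Set[str]],
                 present: Iterable[str] = (),
                 require_all: bool = False,
                 absent: Iterable[str] = ()) -> List[str]:
    """Media containing any (or all) of `present` and none of `absent`."""
    wanted, excluded = list(present), set(absent)
    if len(wanted) > MAX_COMPOUNDS or len(excluded) > MAX_COMPOUNDS:
        raise ValueError("at most four compounds per field")
    combine = all if require_all else any
    hits = []
    for name, contents in media.items():
        if wanted and not combine(c in contents for c in wanted):
            continue
        if contents & excluded:
            continue
        hits.append(name)
    return hits


media = {
    "M9 glucose": {"glucose", "NH4Cl", "phosphate"},
    "M9 glycerol": {"glycerol", "NH4Cl", "phosphate"},
    "LB": {"tryptone", "yeast extract", "NaCl"},
}
print(search_media(media, ["glucose", "glycerol"]))                    # ['M9 glucose', 'M9 glycerol']
print(search_media(media, ["glucose", "glycerol"], require_all=True))  # []
print(search_media(media, ["NH4Cl"], absent=["glucose"]))              # ['M9 glycerol']
```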
Search/Filter by observed growth
Select one or more growth levels to retrieve media on which any of the
selected levels of growth have been observed. If no gene knockout is
specified, then the growth levels refer to wildtype growth. If a gene
is specified, then the growth levels refer to knockouts of that gene.
When specifying a gene, we recommend using the auto-complete facility
to select the correct gene, as only an exact name match can be
accepted here.
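The growth-level filter can be modeled as a lookup over observations, where a missing gene stands for wild type. This Python sketch assumes a flat list of (medium, knocked-out gene, growth level) records; the growth-level labels, medium names and gene name are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Each observation: (medium, knocked-out gene or None for wild type, growth level)
Observation = Tuple[str, Optional[str], str]


def media_with_growth(observations: Iterable[Observation],
                      levels: Iterable[str],
                      gene: Optional[str] = None) -> List[str]:
    """Media on which any selected growth level was observed, for wild type
    when no gene is given, otherwise for knockouts of that gene."""
    wanted = set(levels)
    return sorted({medium for medium, knockout, level in observations
                   if knockout == gene and level in wanted})


observations = [
    ("M9 glucose", None, "growth"),
    ("M9 glucose", "geneX", "no growth"),
    ("LB", None, "growth"),
    ("LB", "geneX", "growth"),
]
print(media_with_growth(observations, ["growth"]))                   # ['LB', 'M9 glucose']
print(media_with_growth(observations, ["no growth"], gene="geneX"))  # ['M9 glucose']
```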
The Advanced Search tool facilitates generation of queries that are more complex than those supported by the object search tools described above. Using the Advanced Search tool, you can write queries that combine data from multiple organisms or multiple types of objects, and you can search fields that are not supported by the individual object search pages. Detailed instructions for using the Advanced Search tool to construct complex queries are available here.
An ontology is a carefully constructed vocabulary of terms, often called a controlled vocabulary. The terms are organized into a classification hierarchy (also called a taxonomy). Ontologies can be used to browse and search for objects by drilling down from more general categories to more specific ones. Each Pathway/Genome Database contains several ontologies. Those that can be searched are available from the Ontologies sub-menu in the Search menu. These ontologies can also be accessed from the object search page for their particular object type. The browseable ontologies are:
Gene Ontology
Not all databases contain Gene
Ontology (GO) annotations, but for those that do, GO can be
browsed to see which gene products are assigned to which GO terms.
Each database only contains those terms to which one or more gene
products are actually assigned, so a term may be missing from the
browseable ontology even though it is a valid GO term. GO can also be
browsed from the Search Menu → Genes/Proteins/RNAs page.
MultiFun
Not all databases contain
MultiFun
annotations, but for those that do (currently only EcoCyc and MetaCyc), MultiFun can be
browsed to see which genes are assigned to which terms.
Each database only contains those terms to which one or more genes
are actually assigned, so a term may be missing from the
browseable ontology even though it is a valid MultiFun term. MultiFun
can also be browsed from the Search Menu → Genes/Proteins/RNAs page.
Pathways
The Pathway Tools pathway ontology classifies pathways into groups based on their
biological functions, and based on the classes of metabolites that
they produce and/or consume. It is also accessible from the Search Menu → Pathways page.
Enzyme Commission
Enzyme Commission
numbers (EC numbers) form a classification scheme for enzymes,
based on the chemical reactions they catalyze. Pathway/Genome
Databases use EC numbers to organize enzyme-catalyzed reactions (rather
than the enzymes themselves) based on type of transformation and class
of substrates. The EC ontology can also be browsed from the Search Menu → Reactions page (as a child of Chemical-Reactions). Both
Search Menu → Reactions and Search Menu → Genes/Proteins/RNAs pages allow
searching by EC number.
Compounds
The Pathway Tools compound ontology describes small molecules, that is, chemical
compounds that are not macromolecules. It is also accessible from the
Search Menu → Compounds page.
The Search Menu → Google This Site command uses Google to perform a full text search over this entire Web site. Searches will not be restricted to the selected database, and can locate text strings found in page comments, help pages, and other page content not queryable by other means. Submitting this form will direct the user outside this Web site to a page generated by Google. A Google full text search is also offered as an option when a Quick Search fails to return any result (or does not return the desired result).
This facility (not available for MetaCyc) allows you to perform sequence-similarity searches using the BLAST program to compare your protein or nucleic acid sequence against the complete genome of the selected organism database.
Textpresso is a package for indexing and searching a corpus of biological literature. Textpresso searches of a large Escherichia coli literature corpus are available only at the BioCyc Web site, and only when EcoCyc is the selected database.
A Web Group is a collection of Pathway Tools objects, such as genes or pathways, together with associated data, that can be created, edited, manipulated, and shared on the web. You can upload tabular files (e.g., expression data) to a web group, and download groups to files. You can perform transformations, enrichment analysis, filtering, and other operations on groups. Each web group has a persistent URL, so groups can serve as a platform for publishing and sharing data. Groups can be private, public, or shared with a selected set of users.
Web groups are stored in the web accounts database, so to create groups you must have an account and be logged in. Users who aren’t logged in can view and download groups that others have made public.
We recommend Firefox for using Web Groups, and recommend against Internet Explorer. Other browsers work fairly well, but there may be occasional problems.
Some terminology: A group consists of a set of rows and columns. A cell is the intersection of a row and a column, and can contain one or more values, which may be Pathway Tools objects (such as genes or pathways), numbers, or strings.
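The terminology above (rows, columns, and cells that can each hold several values) maps naturally onto a small data structure. This is a minimal Python sketch for illustration, not the web groups storage format: the Group class, its methods, and the gene and pathway labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

Value = Union[str, int, float]  # an object ID, a number, or a string
Cell = List[Value]              # a cell may hold several values


@dataclass
class Group:
    title: str
    columns: List[str]
    rows: List[List[Cell]] = field(default_factory=list)

    def add_row(self, *cells: Union[Value, Cell]) -> None:
        if len(cells) != len(self.columns):
            raise ValueError("one cell per column is required")
        # Wrap single values so that every cell is a list of values.
        self.rows.append([c if isinstance(c, list) else [c] for c in cells])

    def cell(self, row: int, column: str) -> Cell:
        return self.rows[row][self.columns.index(column)]


group = Group("Example genes", ["Gene", "Pathways", "Length (bp)"])
group.add_row("gene-1", ["pathway-1", "pathway-2"], 807)
group.add_row("gene-2", "pathway-1", 1194)
print(group.cell(0, "Pathways"))  # ['pathway-1', 'pathway-2']
print(group.cell(1, "Pathways"))  # ['pathway-1']
```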
A group is displayed on its own web page (see the figure below). The URL of this page is persistent and may be bookmarked or shared. At the top of this page are some metadata about the group, such as its title and a textual description (these can both be edited by clicking on them). Information about the group’s contents and sharing status is also displayed.

In this example, we started with a group of genes (in the first column after the checkboxes) and added some properties.
Typically the first column of a group will be a set of related Pathway Tools frames (e.g., a set of genes from a search), and other columns will be properties or other values derived from the first column (e.g., the products of the genes in the first column). The blue column headings are clickable and can be used to select individual columns for certain operations.
If a group has more elements than will fit on a page, paging controls will be displayed above the column headings. You can also choose to show all rows on one page.
The checkboxes on the left are used to select subsets of the group’s rows for deleting or copying to a new group. Checkboxes work properly across multiple pages: you can check some rows, go to another page and check some more, and the rows on the first page will still be considered checked. Checking or unchecking the checkbox in the header checks or unchecks all rows in the group (not just those on the current page).
The group directory page provides a list of accessible groups. It may be accessed via the Tools → Groups menu item. The directory is composed of several tabs:
My groups – a list of groups you own
All groups – a list of all accessible groups
Public groups – a list of all public groups (note that these are also included in the All Groups tab)
Special groups – a list of computed groups, based on the currently selected organism
By default the group directory is ordered by update time (most recently changed first), but it can be re-sorted using the sort arrows in the column headings.

There are a number of different ways to create a group (note that you must be registered and logged in before these commands will be visible):
The results of web searches (e.g., from the Search → Compounds page) can be converted to a group by means of the “Turn into a group” button.
You can create an empty group and fill it in by hand. To do this:
1. Go to the group directory page (Tools → Groups).
2. Select the Groups → New → Empty Group menu item that is now visible. This creates a group with a single column and no rows.
3. Add a row by clicking the “Add row” link at the bottom of the display.
4. The row has an autocompleting text field. Enter a frame and press Enter.
5. Repeat steps 3 and 4 for the rest of your data.
You can create a group by importing a text file in tab-separated value format:
Go to the group directory page.
Select the Groups → New → Group from uploaded file menu item that is now visible.
A panel will appear that allows you to specify and upload the file.
Values in uploaded files are initially just strings. To turn them into frames, select the appropriate column and use the Groups → Column → Set Type... menu item.
Enrichment analysis is a computational technique for identifying groups of objects (e.g., pathways) that are statistically overrepresented in another group (e.g., genes that are significantly up-regulated in an expression experiment). Please see the Pathway Tools Users Manual for more information on enrichment, including a description of the parameters available on the web.
You can invoke enrichment analysis on a group of objects in a web group by:
Selecting the column you want to operate on
Choosing an item from the Enrichments selector and clicking the button
Choosing parameters from the dialog

This operation always creates a new group, which contains 3 columns: the enriched objects, the p-value, and the matched objects from the original group. The new group will be sorted by p-value, lowest (most significant matches) first.
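Pathway Tools documents its actual statistics and parameters in the Users Manual, so the following is only an illustration of what "overrepresented" means: a minimal Python sketch that assumes a one-sided hypergeometric test, a common choice for this kind of analysis but not necessarily the one Pathway Tools uses. The function names and the plain-set inputs are hypothetical. The output mirrors the three columns described above, sorted with the lowest p-value first.

```python
from math import comb


def hypergeometric_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """One-sided p-value P(X >= hits) for a hypergeometric variable X.

    population: number of objects that could have been drawn (e.g., all genes)
    annotated:  objects in the population belonging to one candidate (e.g., one pathway)
    sample:     size of the user's group (e.g., up-regulated genes)
    hits:       objects in the user's group that belong to the candidate
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


def enrich(group: set[str], population: set[str],
           candidates: dict[str, set[str]]) -> list[tuple[str, float, list[str]]]:
    """Return (enriched object, p-value, matched objects) rows, lowest p-value first.

    group is assumed to be a subset of population.
    """
    rows = []
    for name, members in candidates.items():
        members = members & population          # only count objects in the population
        matched = sorted(group & members)
        if not matched:
            continue                              # nothing to report for this candidate
        p = hypergeometric_p_value(len(population), len(members), len(group), len(matched))
        rows.append((name, p, matched))
    rows.sort(key=lambda row: row[1])
    return rows
```

For example, with 10 genes in the population, a 5-gene pathway, and a group of 3 genes that all fall in that pathway, the sketch gives p = C(5,3)/C(10,3) = 10/120, about 0.083.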
There are a number of ways to create new groups from existing groups.
You can simply copy a group via the Groups → New → Copy of this group menu command.
You can take a column of a group and turn its contents into a new group, using the + icon that appears in column headings, or the Groups → New → Group from column menu commands (these are equivalent operations).
See also the Filtering operation, which has the option of creating a new group based on a filtered subset of rows.
You can add columns to a group from the Groups → Add → Column menu item (which creates an empty editable column), or by using the transform and property selectors. These apply an operation to the contents of the selected column and create a new column. For example, if the selected column contains genes, you can select the Pathways of gene item from the Transforms selector and click Add, which adds a new column of pathways. Transforms represent computational operations on the objects in the selected column; Properties represent slots of the object frames.
Editable columns (which are those that are not defined by a transform or other computation) can be edited by clicking the edit icon in the column header. This changes the cells to editable fields. Clicking the icon a second time will turn off editing for that column.
You can add a row by means of the link at the bottom of a group, or by using the Groups → Add → Row menu item (the two are equivalent). Any editable cells in the new row are displayed in edit mode, so you can enter values.
Rows can be deleted by selecting them using the checkboxes on the left of the display, then choosing Groups → Delete checked rows menu item.
Groups can be re-sorted on the values of any column by means of the sorting controls in the column headers.

Filtering means selecting a subset of rows according to some criterion. Only numerical columns can be used for filtering at present. To filter, select the appropriate column and choose the Groups → Column → Filter... menu item. A dialog appears that lets you select the filtering criterion. Note that you can filter on either the value or the absolute value of the column, and can specify a greater-than, less-than, or equal condition.
The New Group? checkbox controls what happens with the result of the filtering operation. If checked, a new group is created; if unchecked, the current group is replaced by the filtered rows. In either case, if the result of filtering is empty, an error is displayed instead.
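The filtering rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Web Groups implementation: a group is modeled as a list of row dictionaries, and the function names and comparison symbols are hypothetical.

```python
import operator
from typing import Any

# Comparison choices offered by the filter dialog (symbols are illustrative).
COMPARISONS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

Row = dict[str, Any]


def filter_rows(rows: list[Row], column: str, comparison: str,
                threshold: float, use_absolute: bool = False) -> list[Row]:
    """Keep rows whose numeric value in `column` satisfies the comparison."""
    test = COMPARISONS[comparison]
    kept = [
        row for row in rows
        if test(abs(row[column]) if use_absolute else row[column], threshold)
    ]
    if not kept:
        # An empty result is reported as an error rather than producing an empty group.
        raise ValueError("filter matched no rows")
    return kept


def apply_filter(group: list[Row], new_group: bool, **criteria: Any) -> list[Row]:
    """New Group? checked: return a new list. Unchecked: replace the group's rows in place."""
    result = filter_rows(group, **criteria)
    if new_group:
        return result
    group[:] = result
    return group
```

Because the error is raised before any rows are replaced, an empty result leaves the original group untouched whether or not New Group? is checked.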
The values in cells have a type, which may be either a Pathway Tools object (e.g., a gene) or a string or number. Generally values in a single column will all be of the same type, but this is not required. You can control the type by means of the Groups → Column → Set Type... menu item. In general this is used after importing data from a file, to turn string values into Pathway Tools objects.
Columns can be rearranged with the Groups → Column → Move... menu items and deleted with the Groups → Column → Delete menu item. These operations apply to the selected column. You can also delete a column by clicking the appropriate icon in the column header.
Once a group is defined, there are a couple of things you can do with it (other than browse it on the web).
Groups can be exported to files (in tab-separated value format) using the Groups → Export → To file... menu command.
Objects of the appropriate types (such as compounds, reactions, and genes) can be exported to the Cellular Overview using the Groups → Export → To cellular overview menu command. Be sure to select the appropriate column first.
Please note that due to technical limits only a relatively small number of objects can be highlighted on the overview. If too many objects are selected, an error will appear. This limitation will be addressed in a future release.
By default, groups are readable and writeable only by their creator. You can grant access to other users by means of the Sharing dialog, available via the Groups → Sharing... menu item.

Access by the general public is controlled by the first two checkboxes. Public? means that anyone can view the contents of the group; Public and writable? means that anyone can view and edit the contents of the group (editing is restricted to logged-in users).
You can also control access on a per-user level using the “Share with users” boxes, which accept email addresses of registered Pathway Tools users.
As part of Web Groups, we have created an enhanced public user page, which can be accessed by clicking on any user name in the group directory (try the Public groups tab). A user page displays the user’s name, an optional user-settable graphic picture, and a list of the user’s public groups. There is also a user directory available.
The Cellular Overview diagram depicts the biochemical machinery of an organism as described in a PGDB. Each node in the diagram (such as the small circles and triangles) represents a single metabolite, and each blue line represents a single bioreaction. This page describes the organization of the Cellular Overview and the operations users can perform to interrogate it. Different PGDBs will have different components of the diagram present or absent depending on what was included by the PGDB authors.
Note: The Cellular Overview has been tested on Internet Explorer 8.0, Firefox 3.5, Safari 4.0 and Chrome 2.0. We recommend not using Internet Explorer for the Cellular Overview, since its performance can be very poor; the other three browsers perform much better.
Note: The desktop version of Pathway Tools, which you can install locally, provides different and additional operations on the Cellular Overview.
Within the cytoplasmic membrane, the small-molecule metabolism of the organism is depicted in several regions. The glycolysis and TCA cycle pathways, if present, are placed in the middle of the diagram to separate predominantly catabolic pathways on the right from pathways of anabolism and intermediary metabolism on the left. The existence of anaplerotic pathways prevents rigid classification. The majority of pathways operate in the downward direction. Signal transduction pathways, if present, run along the bottom of the diagram. Pathways are grouped into related clusters, as indicated by the shaded regions.
The large group of individual reactions at the right of the diagram represents reactions of small-molecule metabolism that have not been assigned to any pathway.
The shapes of the metabolite icons represent various compound classes. The different shapes used are as follows:
Triangle: Amino Acids
Square: Carbohydrates and Derivatives
Diamond: Proteins and Modified Proteins
Vertical Ellipse: Purines
Horizontal Ellipse: Pyrimidines
T: tRNAs
Circle: All other compounds
Filled shape: Phosphorylated compound
The organism’s cellular membrane or membranes are depicted, depending on the cellular architecture of the organism and on whether that architecture was specified when the PGDB was created. Transporters are depicted in the membrane in which they reside, as blue lines whose arrowheads indicate the direction of transport. For Gram-negative bacteria, periplasmic proteins are depicted when identified in the PGDB.
The Cellular Overview is accessible from the command Tools → Cellular Overview. The currently selected organism, as displayed on the right in the banner of the Web page, is used to generate the Cellular Overview diagram. Generating the diagram can take some time if the Web server has not generated it previously.
Once the Cellular Overview diagram is displayed, the most common operation is to move it left, right, up, or down, since sometimes the entire overview cannot fit in the Web page. To do this, hold down the left mouse button in a blank area, then move the mouse in the desired direction. This is called panning. You can also pan in small increments by clicking the arrows on the widget at the top left of the screen.
To zoom in or out, use the ladder-shaped icon on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any one of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of the ladder) to zoom in (increase size) or zoom out (decrease size). As the zoom level increases (i.e., going up the ladder), names of compounds, enzymes, reactions, and pathways are eventually displayed.
Note that depending on the speed of the server, generating large Cellular Overviews (i.e., a zoom-in near the top of the ladder) might require some time.
Mousing over a Cellular Overview icon (e.g., a ‘tee’ icon for a tRNA) displays information about the object in a small tooltip popup. Click the ‘Keep Open’ button to keep that informational window open; drag the window by its title to re-position it.
Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys can be customized on your Mac via the system preferences panel.
All the commands for the Cellular Overview are available from the right-click menu or from the Cellular Overview menu in the top menu bar.
The Cellular Overview can display your experimental data. See the Omics Viewer Section 5.7 below.
Left-Click on an object opens a tooltip (i.e., a small window) displaying basic data about the object. The tooltip contains further Web links to display more data about the object or about objects related to it.
Double-Left-Click in a blank area zooms in, centered at that location.
Left-Click (and hold) in a blank area lets you pan (i.e., move) the entire Cellular Overview left, right, up, and down. You need to hold down the mouse button while panning.
Right-Click in a blank area opens a menu to invoke general commands applicable to the entire Cellular Overview. These commands are also available in the top menu bar under the menu ‘Cellular Overview’. All searching (highlighting) commands are under these menus. See the following list for an explanation of the general commands.
The commands in the Cellular Overview menu are:
Display Cellular Overview displays the entire Cellular Overview for the currently selected organism (e.g., E. coli), as shown in the top right corner of the Web page under the quick search box.
Overlay Experimental Data (Omics Viewer) opens up a window where a file can be specified to upload experimental data (such as gene expression or metabolomics data) to overlay, as colors, on the cellular diagram.
Highlight Pathway(s) provides two searching mechanisms for pathways in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the pathways. Highlighting is done on the reaction(s) of the pathway(s) found.
Highlight Reaction(s) provides four searching mechanisms for reactions in the cellular diagram: by name or frame ID, by substring, by EC number, or by enzyme name. The substring search is based on the name, synonyms, and frame ID of the reactions. Highlighting is done on the reaction(s) found.
Highlight Gene(s) provides three searching mechanisms for genes in the cellular diagram: by name or frame ID, by substring, or from a file. The substring search is based on the name, synonyms, and frame ID of the genes. The searching based on a file uses the gene names provided in a file located on your computer. Highlighting is done on the reactions and proteins corresponding to the gene(s) found.
Highlight Enzyme(s) provides two searching mechanisms for enzymes in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the enzymes. This substring search is identical to the reaction search based on enzymes. Highlighting is done on the reactions and proteins corresponding to the enzyme(s) found.
Highlight Compound(s) provides two searching mechanisms for compounds in the cellular diagram: by name or frame ID, or by a substring search. The substring search is based on the name, synonyms, and frame ID of the compounds. Highlighting is done on the compound(s) found.
Clear All Highlighting removes all the highlighting from the cellular diagram.
Show Legend opens a small window to show a legend of the icons used in the cellular diagram.
Help opens a new Web page presenting documentation on the Cellular Overview.
The following sections describe these operations, and some others, in more detail.
Selecting a new organism through the organism selector does not immediately change the Cellular Overview to that organism. At any moment you can display the complete Cellular Overview of the selected organism by choosing Display Cellular Overview from the menu obtained by right-clicking a blank area, or from the menu bar via Cellular Overview → Display Cellular Overview. If the selected database has no cellular diagram available, the next command you invoke will display a warning to that effect. For example, MetaCyc, which is a multi-organism database, has no cellular diagram.
In this document, ‘Searching’ and ‘Highlighting’ are synonymous terms. There are several commands to search for reactions, pathways, enzymes, genes, and compounds. The search commands are available from the right-click menu and from the Cellular Overview menu in the top menu bar.
When a search is done, the objects found are highlighted in the Cellular Overview diagram which also creates a new overlay. The list of overlays is shown in the Layer Switcher panel on the right of the Overview Web page. This panel might be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. You cannot delete an individual overlay. But all highlighting, i.e., all overlays, can be removed by using the command Clear All Highlighting.
Since each overlay corresponds to a search operation, an overlay is identified by the keyword you entered for the search; this is the name of the overlay. Next to each name is a button labeled ‘List’. Clicking ‘List’ opens a small dialog window listing the objects found by the corresponding search. Each object name is a hyperlink; clicking it centers the Overview on the corresponding object, and a red marker emphasizes its location.
The Pathway Tools Omics Viewer uses the Metabolic Overview for an organism to illustrate the results of high-throughput experiments in a global metabolic pathway context. The Omics Viewer can also be used for the Regulatory Overview, but only genes are involved in that case. Genes (in the case of a gene expression experiment) and proteins (in the case of a proteomics experiment) that are involved in metabolism are mapped to reaction steps in the Metabolic Overview, and the range of data values in a given experimental dataset is mapped to a spectrum of colors. Reaction steps in the Metabolic Overview are colored according to the corresponding data value. Similarly, for metabolomics experiments, compound nodes are colored according to the data value for the corresponding compound. This facility enables the user to see instantly which pathways are active or inactive under a given set of experimental conditions.
The Omics Viewer can be used for:
Microarray Expression Data: Reaction lines (and protein icons, where present) are color-coded according to the relative or absolute expression level of the gene that codes for the enzyme that catalyzes that reaction step. The Omics Viewer allows a scientist to interpret the results of gene-expression experiments in a pathway context.
Proteomics Data: Reaction lines (and protein icons, where present) are color-coded according to the concentration of the enzyme that catalyzes that reaction step.
Metabolomics Data: Compound icons are color-coded according to the concentration of the compound.
Reaction Flux Data: Reaction lines are color-coded according to reaction flux values.
Other Experimental Data: Any experiment, high-throughput or otherwise, in which data values are assigned to genes, proteins, reactions or metabolites can be viewed in a pathway context using the Omics Viewer.
The Omics Viewer can show absolute data values (such as the concentration of a metabolite or protein, or the absolute expression level of a gene), or it can be used to compare two sets of experimental data by computing a ratio and mapping the ratios onto a color spectrum.
The superposition of multiple sets of experimental data on the metabolic overview can also be animated to show, for example, how gene expression levels of enzymes change with time over the course of an experiment.
The command Overlay Experimental Data (Omics Viewer), available from the right-click menu and the top menu bar Cellular Overview, overlays experimental data over the Cellular Overview diagram.
Once the Overlay Experimental Data command is invoked, a window will open, called the Omics Form, where you can specify a data file to upload and various parameters to control the interpretation of the data. The parameters are documented in the window but more details follow on the file format and the parameters to specify.
Experimental data is imported from a file provided by the user that is stored on the user’s computer. Each line of the file contains data for a single gene, protein, reaction or metabolite, and is of the form:
<name‑or‑ID> <data‑column1>...<data‑columnN>
Columns are separated by the tab character. Lines that start with # or ; are taken to be comment lines and are ignored by the program.
<name‑or‑ID> can be either a common name for an object (the BioCyc data typically includes extensive synonym lists, and every attempt is made to match a name to the appropriate target), or the BioCyc internal ID for the object. Gene IDs from sequencing projects (such as the E. coli B-numbers) are generally acceptable and unambiguous. For protein or reaction data, EC numbers may be used. You must specify whether the entities in the <name‑or‑ID> column are genes, proteins, reactions, compounds, or a mixture.
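As a hedged sketch of the file format described above, the following Python function splits such a file into names and numeric data columns. It assumes every data column is numeric and also skips blank lines (an assumption not stated above). The function name is hypothetical, and resolving names, synonyms, or EC numbers to BioCyc objects is left to the server and is not attempted here.

```python
def parse_omics_file(text: str) -> list[tuple[str, list[float]]]:
    """Split Omics Viewer data into (name-or-ID, numeric data columns) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", ";")):
            continue  # comment lines (and, as an assumption, blank lines) are skipped
        name, *data = line.split("\t")  # columns are tab-separated
        entries.append((name, [float(value) for value in data]))
    return entries
```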
The numbers in the data columns can represent either absolute or relative values. If the data values represent absolute numbers, you may choose to visualize either a single column of absolute data values (select ‘Absolute’ and one data column), or the ratio of two data columns as relative data values (select ‘Relative’ and two data columns). If the data values themselves represent relative numbers, you need to supply only a single column number and select ‘Relative’. An entry (a row of data for a gene or other object) may contain any number of data columns (for example, if you wish to compile measurements from several experiments or time points into a single file), but only the data columns you specify are visualized at a time; all other columns are ignored.
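The column choices above can be sketched as a small Python function. It is illustrative only: the function name is hypothetical, and it assumes data columns are numbered from 1, counting the first column after <name‑or‑ID> as column 1.

```python
from typing import Optional


def data_value(columns: list[float], mode: str, first: int,
               second: Optional[int] = None) -> float:
    """Value visualized for one entry.

    columns: the entry's data columns, numbered from 1 (assumption: the first
             column after <name-or-ID> is column 1).
    mode:    'absolute' uses one column; 'relative' uses one column that already
             holds relative values, or the ratio of two columns.
    """
    value = columns[first - 1]
    if mode == "absolute":
        return value
    if mode == "relative":
        return value if second is None else value / columns[second - 1]
    raise ValueError(f"unknown mode: {mode!r}")
```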
| Single gene expression experiment: | Sample datafile and brief description | See Cellular Overview for this data using ratio of columns 11 and 12. |
| Time series gene expression animation: | Sample datafile and brief description | See Cellular Overview for this data using columns 6 to 9. |
The color scale used depends on the type and, by default, the range of the data. Thus, a particular color may correspond to one gene expression level for one dataset, and a different gene expression level for another dataset, depending on the range of values or the supplied maximum cutoff value for each dataset. We use the spectrum from yellow/green to red, with yellow representing the lowest expression levels or ratios in the dataset, blue representing values in the middle, and red representing the highest values. Reactions for which no data was provided are drawn in black. The legend for mapping colors to data values is shown in the key, which is drawn to the right of the overview for a single experiment, or to the left for an animation.
A maximum cutoff value is chosen. By default, this is computed from the data. Alternatively, the user may supply a maximum cutoff value to use. Supplying the same maximum cutoff value for multiple experiments ensures that the same color scale is used for each one, so that the displays are directly comparable.
The minimum cutoff value is determined based on the maximum cutoff value and the other parameters. For absolute data values, we use a minimum cutoff value of zero. For relative data values that are not logs, we use the inverse of the maximum cutoff. For relative data values that are logs, we use the negative of the maximum cutoff. The color spectrum is then mapped evenly along a log scale between the maximum cutoff and the minimum cutoff.
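The cutoff rules above can be expressed as a short Python sketch. The function names are hypothetical, clamping out-of-range values to the ends of the scale is an assumption, and the log-scale position is shown only for relative values that are plain ratios (not logs), where both cutoffs are positive.

```python
import math


def min_cutoff(max_cutoff: float, mode: str, is_log: bool = False) -> float:
    """Minimum cutoff derived from the maximum cutoff."""
    if mode == "absolute":
        return 0.0
    if is_log:
        return -max_cutoff        # relative values that are logs
    return 1.0 / max_cutoff       # relative values that are plain ratios


def ratio_scale_position(ratio: float, max_cutoff: float) -> float:
    """Position of a positive ratio on a log scale from 1/max_cutoff (0.0) to max_cutoff (1.0).

    Values beyond the cutoffs are clamped to the ends of the scale (an assumption).
    """
    low = math.log(min_cutoff(max_cutoff, "relative"))
    high = math.log(max_cutoff)
    position = (math.log(ratio) - low) / (high - low)
    return min(1.0, max(0.0, position))
```

With a maximum cutoff of 4, a ratio of 1 (no change) falls exactly in the middle of the scale, and ratios of 4 and 0.25 fall at its two ends.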
In many cases, several genes or proteins, each with their own expression level or concentration, will map to a single reaction. This is because the reaction might be catalyzed by an enzyme complex made up of several gene products, or the reaction might be catalyzed by several isozymes, each with its own gene or genes. Since a reaction can only be colored a single color, we must choose which data value to use. For absolute data values, we choose the maximum. For relative data values, we choose the value whose log has the greatest deviation from zero, under the assumption that the user is primarily interested in identifying the entities whose behavior differs most between the two datasets.
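A minimal Python sketch of this selection rule, assuming the relative values are ratios (not logs) and therefore positive; the function name is hypothetical.

```python
import math


def reaction_value(values: list[float], mode: str) -> float:
    """Single value used to color a reaction mapped to several genes or proteins."""
    if mode == "absolute":
        return max(values)                                   # the maximum value
    if mode == "relative":
        # Ratio whose log deviates most from zero, i.e., the largest fold change
        # in either direction; ratios must be positive.
        return max(values, key=lambda v: abs(math.log(v)))
    raise ValueError(f"unknown mode: {mode!r}")
```

For ratios 0.1, 4.0 and 2.0, the sketch picks 0.1, a tenfold decrease, over the fourfold increase.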
Once the form to upload the data is submitted, by clicking the submit button at the bottom of the Omics Form, the data is processed by the Web server. The time to process the file depends on the speed of the server and the amount of data in the file. The results are returned to your browser in the form of highlighted objects (e.g., reactions). If several data experiments are loaded from the same file (i.e., several data columns are provided from the uploaded file), an animation is created where each step of the animation corresponds to one experiment (i.e., one column).
A small dialog window is opened to display the color scale for the experiment(s) and buttons to control the animation, if any. You can pause, restart, go forward or backward, increase or decrease the animation speed from this window.
Overlaying experimental data can be done at any zoom level. Once the data is uploaded and overlaid, you can zoom in or out, and the highlighting will be adjusted accordingly.
The tooltips for highlighted objects show the experimental data. The data displayed changes during an animation.
You can submit to a Pathway Tools server, via a Web browser or other Web navigating software, a URL describing which objects to highlight in a Cellular Overview diagram for a specific organism, or naming a pre-defined expression data file residing on the Web server for the Omics Viewer. This URL also typically specifies a zoom level. Submitting such a URL is an HTTP GET operation. One of the highlighting operations (xnids, described below) also accepts expression data, but the amount of data that can be sent this way is limited by the maximum URL length. To submit a large amount of data, say more than 50 expression values, use the HTTP GET or POST method as described in the next section. The resulting Cellular Overview is returned as an HTML page.
Essentially, such a URL can be used instead of manually selecting an organism, visiting the Cellular Overview Web page, and performing highlighting interactively.
The general form of such a URL is:
<host>/overviewsWeb/celOv.shtml?zoomlevel=<integer>&orgid=<orgid>&<op>=<string>
The notation <...> should not be used literally but represents a value or parameter to specify. For example, <integer> represents an integer. (See below for a URL example.)
The host depends on the server you want to access. For example, for BioCyc the <host> is BioCyc.org.
All the parameters (after the question mark ‘?’) are optional, but it is recommended to specify at least the organism ID (orgid), since otherwise a default organism, which depends on the Web server used, will be used. The zoomlevel parameter specifies an integer value. The first zoom level is 0 (smallest Overview) and currently the highest is 6 (largest Overview).
You can specify zero, one, or more ‘op’ operations. The following table gives a summary of the possible operations and the corresponding highlight operations. Notice that all these operations correspond to the operations available from the top menu bar when a Cellular Overview diagram is displayed.
The possible operations ‘op’ for highlighting are listed in the following table. The operation xnids is special, as it also accepts expression data: an expression value can be specified after each name.
The string specified after the ‘=’ for an operation must not be quoted and any special character must be URL encoded. The string is not case-sensitive.
The following URL would open the Cellular Overview for organism Escherichia coli K-12 substr. MG1655 at zoom level 0, and create three highlighting overlays: 1) reactions with names having substring ‘hydro’, 2) reactions with names having substring ‘oxy’, 3) reactions, compounds, and proteins related to genes having a name with substring ‘arg’.
biocyc.org/overviewsWeb/celOv.shtml?zoomlevel=0&orgid=ECOLI&rsubs=hydro&rsubs=oxy&gsubs=arg
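Such URLs can also be assembled programmatically. The Python sketch below uses the standard urllib.parse.urlencode to percent-encode each operation string, and passes the parameters as a list of pairs so that an operation such as rsubs can be repeated. The function name is hypothetical; the host is taken from the example above, and you may need to prefix a scheme such as http:// for your server.

```python
from urllib.parse import quote, urlencode


def celov_highlight_url(host: str, orgid: str, zoomlevel: int,
                        operations: list[tuple[str, str]]) -> str:
    """Build <host>/overviewsWeb/celOv.shtml?zoomlevel=...&orgid=...&<op>=<string>...

    operations is a list of (op, string) pairs, so the same op may appear more
    than once; each value is percent-encoded.
    """
    if not 0 <= zoomlevel <= 6:
        raise ValueError("zoomlevel must be an integer from 0 to 6")
    params = [("zoomlevel", zoomlevel), ("orgid", orgid), *operations]
    return f"{host}/overviewsWeb/celOv.shtml?{urlencode(params, quote_via=quote)}"
```

Calling celov_highlight_url("biocyc.org", "ECOLI", 0, [("rsubs", "hydro"), ("rsubs", "oxy"), ("gsubs", "arg")]) reproduces the example URL above.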
Such URLs with highlighting operations can be automatically generated using the command Cellular Overview → Generate Bookmark for Current Cellular Overview.
The Omics Viewer can be started by using a URL link with expression data that resides on a Web server. In that case, no data is sent from the browser but parameters are specified in the URL. In other words, a GET request can be done to start the Cellular Omics Viewer with data on a Web server and the parameters specified on the GET request itself.
For example, the following URL, when clicked, would start the Cellular Overview Omics Viewer with the expression data from file time-series.txt on the Web server BioCyc.org (in subdirectory expr-examples) at zoom level 0, for organism ECOLI, using columns 2 to 4 from the file. The file contains gene names (or frame IDs) in the first column.
biocyc.org/overviewsWeb/celOv.shtml?omics=t&url=http://biocyc.org/expr-examples/time-series.txt&zoomlevel=0&orgid=ECOLI&column1=2-4&class=gene
All parameters that can be specified, except the parameter datafile, are listed in Section 5.15, Submitting Expression Data Via an HTTP POST. The additional parameters that can be specified for a URL (aka GET request) are listed in the following table.
| omics | Its value must be the letter t. This parameter says to start the Omics Viewer. |
| orgid | An organism identifier. This is the unique identifier of the PGDB corresponding to the desired Cellular Overview. The data file (specified below) must correspond to this identifier. |
| zoomlevel | An integer from 0 to 6 to specify the zoom level of the Cellular Overview. |
| url | The path to the file containing the expression data to upload in the Omics Viewer. This path uses protocol ‘file://’ or ‘http://’. The file protocol refers to a file on the Web server accessed, not your local computer. The http protocol can upload a file from any Web server, not just the Web server accessed to start the Omics Viewer. |
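A similar hedged sketch for the Omics Viewer GET request, in Python. The function name is hypothetical, and the parameter values come from the example above. Unlike that example, it percent-encodes the nested url value, following the URL-encoding rule given for highlighting operations; this assumes the server decodes the value as usual.

```python
from urllib.parse import quote, urlencode


def omics_get_url(host: str, orgid: str, data_url: str, column1: str,
                  object_class: str, zoomlevel: int = 0) -> str:
    """Build a GET URL that starts the Omics Viewer on a data file held on a Web server."""
    params = [
        ("omics", "t"),             # must be the letter t
        ("url", data_url),          # file:// (on the server) or http:// location of the data
        ("zoomlevel", zoomlevel),   # 0 (smallest) to 6 (largest)
        ("orgid", orgid),           # PGDB identifier the data file corresponds to
        ("column1", column1),       # data column(s), e.g. "2-4" for an animation
        ("class", object_class),    # kind of objects in the first column, e.g. "gene"
    ]
    return f"{host}/overviewsWeb/celOv.shtml?{urlencode(params, quote_via=quote)}"
```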
You can submit data to the Cellular Omics Viewer without using the GUI (i.e., the dialog window) by using an HTTP POST method instead. This would typically be done by users who prefer to design their own GUI to submit the expression data, or who perform expression data analysis from other software or another Web site.
An HTTP POST is a standard mechanism to submit data to a Web server. It is composed of a URL, parameters and their values, and data.
In most cases, for a Web page, you would use a POST method when using a HTML form. Technical details about HTML forms can be found at Forms at W3C. We currently support only the application/x-www-form-urlencoded content type as the expression data is not sent as binary, but as text.
The general URL syntax to send the POST method request is
http://biocyc.org/<orgid>/overview-expression-map
where <orgid> is the organism identifier for the database to access.
The data section of the POST request, named ‘datafile’, can contain multiple lines of data. Each line is a row of a table, and the table can have multiple columns. Columns are separated by the tab character. The first column, which has index 0, contains the names or frame IDs of the objects to consider in the Omics Viewer. These might be genes, compounds, reactions, proteins, pathways, or a mix of these.
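A minimal sketch of producing the tab-separated `datafile` text from in-memory values follows. The helper name `format_datafile` is hypothetical, and the trailing newline is an assumption rather than a documented requirement.

```python
def format_datafile(rows):
    """Serialize (object_id, values) pairs as tab-separated Omics Viewer rows.

    Column 0 holds the gene, compound, reaction, protein, or pathway name
    (or frame id); columns 1, 2, ... hold the numeric values.
    """
    lines = []
    for object_id, values in rows:
        # A tab or newline inside a name would shift or split the columns.
        if "\t" in object_id or "\n" in object_id:
            raise ValueError(f"object name contains a tab or newline: {object_id!r}")
        lines.append("\t".join([object_id] + [str(v) for v in values]))
    return "\n".join(lines) + "\n"
```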
If you are using your own HTML form, you do not need to worry about how the data is encoded, since the browser encodes it automatically. You do, however, need to know the names of the parameters to use in your form, their possible values, and their meaning. The following table summarizes that information.
The Cellular Overview accepts the following parameters:
| Parameter | Possible Values | Meaning |
| datafile | multiple lines of data forming a table | Contains all the data to use for the Omics Viewer. It is not an error if some columns of data are not referred to by the following parameters. |
| expressiontype | absolute, relative | In absolute mode, the range of values used for the colors is based on the data itself. In relative mode, the range of values for the colors is assumed to be symmetric around 0. That is, the maximum absolute value in your data is used as the positive maximum, and the range is extended, if required, on the negative side to make it symmetric. |
| numcolumns | 1, 2 | 1 means each expression value is based on one data column; 2 means two columns are used for expression data, one numerator and one denominator. |
| column1 | integer(s) or range of integers | The columns containing the data (the numerators, if numcolumns is 2). |
| column2 | integer(s) or range of integers | The columns containing the denominators, if numcolumns is 2. |
| color | default, specify, 3-color | The color scheme to use. The default scheme uses the full color spectrum for the values. The ‘specify’ scheme also uses the full color spectrum, but a maximum cut-off value can be specified. The 3-color scheme uses only three colors: you specify a maximum threshold, and values are marked red when they are above it, yellow when they are below its additive inverse, and blue when they are in between. |
| maxcutoff | a number | The maximum value to use when the color scheme ‘specify’ is selected. The full color spectrum is used. |
| threshold | a number | The maximum threshold value to use when the 3-color scheme is selected. |
| log | on, off | Specify ‘on’ if your data should be interpreted on a log scale, and ‘off’ otherwise. In ‘off’ mode, all negative data are discarded. |
| class | gene, protein, compound, reaction, nil | The type of the names or frame IDs in column 0. The value ‘nil’ means any type. |
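To make the `expressiontype`, `color`, and `log` rows more concrete, here is a Python sketch of the value-to-color logic as the table describes it. It illustrates the documented behavior under stated assumptions and is not the server's actual implementation. In particular, it assumes that absolute mode uses exactly the minimum and maximum of the data, and that values exactly at the threshold count as "in between". All function names are hypothetical.

```python
def usable_values(values, log):
    """With log='off', negative data are discarded (per the 'log' row)."""
    return list(values) if log == "on" else [v for v in values if v >= 0]


def color_range(values, expressiontype):
    """Return the (low, high) bounds onto which values are mapped to colors.

    'absolute': bounds come from the data itself (assumed: min and max).
    'relative': bounds are symmetric around 0, using the largest |value|.
    """
    if not values:
        raise ValueError("no data values")
    if expressiontype == "absolute":
        return min(values), max(values)
    if expressiontype == "relative":
        largest = max(abs(v) for v in values)
        return -largest, largest
    raise ValueError("expressiontype must be 'absolute' or 'relative'")


def three_color(value, threshold):
    """Classify one value under the '3-color' scheme."""
    if value > threshold:
        return "red"
    if value < -threshold:
        return "yellow"
    return "blue"
```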
You can also submit a POST request from any programming language: open a network connection to our Web server on port 80 and send a correctly formed HTTP request. The data that follows the header must conform to the application/x-www-form-urlencoded content type encoding, and the parameter names must be those listed in the table above.
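As a sketch of the programmatic route, the following Python code builds a POST request whose body is encoded as application/x-www-form-urlencoded. It uses `urllib.request.Request` from the standard library and does not send anything. The helper name `build_omics_post` and the default host are assumptions. Passing the result to `urllib.request.urlopen` would submit the request.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_omics_post(orgid, datafile, host="http://biocyc.org", **params):
    """Build (but do not send) a POST request for the Cellular Overview Omics Viewer.

    orgid    -- organism identifier used in the URL path
    datafile -- tab-separated data, one object per line
    params   -- other parameters from the table (expressiontype, column1, ...)
    """
    fields = {"datafile": datafile}
    fields.update(params)
    # Form encoding turns tabs into %09 and newlines into %0A.
    body = urlencode(fields).encode("ascii")
    return Request(
        f"{host}/{orgid}/overview-expression-map",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

As with the GET example, pass `class` as `**{"class": "gene"}` because it is a Python keyword.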
The following example shows a set of parameters with data. There are nine rows of data; the expressiontype is relative; the values are not ratios (numcolumns=1); the log scale is used; the objects in the data section are genes; columns 1 and 2 are used to create an animation; no column2 value is given, since no ratio is used; the color scheme is the default one; and neither a threshold nor a maximum cutoff value is specified.
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="datafile"; filename="example.txt"
Content-Type: text/plain

b0468 0.104350612 0.033206048
b0469 -0.186798562 -0.095079653
b0470 0.013754626 -0.047423932
b0471 -0.040057242 -0.047031817
b0472 0.199058776 0.226302741
b0473 -0.067916761 -0.430108036
b0474 0.084574625 -0.11661899
b0475 -0.067991385 0.09949957
b0476 0 0 0.930017971
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="expressiontype"

relative
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="numcolumns"

1
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="log"

on
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="class"

gene
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="column1"

1-2
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="column2"

-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="color"

default
-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="maxcutoff"

-----------------------------2120074538104283772813445706
Content-Disposition: form-data; name="threshold"

-----------------------------2120074538104283772813445706--
Instead of displaying the entire cellular overview, you may wish to see omics data superimposed on one or only a few pathways of interest. In order to do this, you must know the Pathway Tools object identifiers of the pathways you wish to display.
To submit data via the HTTP POST method for display on specified pathways, use the same syntax and parameters described in Section Submitting Expression Data Via an HTTP POST, with an additional parameter named “pathways”. The value of this parameter should be one or more pathway identifiers, separated by whitespace or punctuation. The result will be a table containing one row for each pathway. Each row contains the pathway name, the pathway diagram with data superimposed, and a list of enzymes and genes in the pathway. If the data represents multiple timepoints, then the table will contain a diagram column for each timepoint.
To submit data via the HTTP GET method for display on specified pathways, use the same syntax and parameters as for POST, described above, but instead of submitting data using the datafile parameter, you must specify a URL from which the data can be retrieved, using the url parameter described in Submitting Expression Data Via an HTTP GET. Following is an example URL to retrieve a table of three particular pathways with omics data superimposed:
http://biocyc.org/ECOLI/overview-expression-map?url=http://myserver/mydata.txt
&expressiontype=relative&numcolumns=1&column1=1&log=on&class=gene
&color=default&pathways=GLYCOLYSIS,TCA,TRPSYN-PWY
You can use the same URL, but omitting all the omics parameters, to retrieve a table of selected pathways without any omics data, for example:
http://biocyc.org/ECOLI/overview-expression-map?pathways=GLYCOLYSIS,TCA,TRPSYN-PWY
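Both pathway-table URLs above can be produced by one small Python helper, sketched below. The helper name `pathway_omics_url` and the default host are assumptions; `http://myserver/mydata.txt` is the placeholder data location from the example. Commas are left unescaped so the pathway list stays readable, while the data URL is percent-encoded as in ordinary form encoding.

```python
from urllib.parse import urlencode


def pathway_omics_url(orgid, pathways, data_url=None, host="http://biocyc.org", **omics):
    """Build an overview-expression-map GET URL restricted to some pathways.

    With data_url=None the omics parameters are left out entirely, which
    requests the pathway table without any omics data.
    """
    params = {}
    if data_url is not None:
        params["url"] = data_url
        params.update(omics)
    params["pathways"] = ",".join(pathways)
    # safe="," keeps the separators between pathway identifiers readable.
    return f"{host}/{orgid}/overview-expression-map?" + urlencode(params, safe=",")
```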
You can also add most of the same omics parameters to the URL for a regular pathway page to see omics data added to the regular pathway diagram. For example, the URL to display the pathway page for the TCA pathway in EcoCyc at a detail level that shows all the enzyme and gene names is:
http://biocyc.org/ECOLI/new-image?type=PATHWAY&object=TCA&detail-level=2
To add omics data to this page, supply the omics parameters, as in the following example:
http://biocyc.org/ECOLI/new-image?type=PATHWAY&object=TCA&detail-level=2
&url=http://myserver/mydata.txt&expressiontype=relative&numcolumns=1
&column1=1&log=on&class=gene&color=default
There is no way to add omics data to standard pathway pages using the HTTP POST method.
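Adding omics parameters to an existing pathway-page URL amounts to merging query strings. The Python sketch below does this with the standard `urllib.parse` functions. The helper name `add_omics_params` is hypothetical, and any parameter you pass replaces an existing one with the same name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_omics_params(page_url, **omics):
    """Append Omics Viewer parameters to an existing pathway-page URL.

    Existing query parameters (type, object, detail-level, ...) are kept
    in their original order; new ones are appended after them.
    """
    parts = urlsplit(page_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(omics)
    return urlunsplit(parts._replace(query=urlencode(query)))
```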
Note: The regulatory overview has been tested on Internet Explorer 7.0, Firefox 3.3, Safari 4.0 and Chrome 2.0. We recommend not using Internet Explorer for the regulatory overview, since its performance can be very slow when manipulating a large number (more than 100) of highlighted genes. The other three browsers perform much better.
The regulatory overview enables you to visually analyze the regulatory relationships between genes for a specific organism. These relationships are based on the regulatory data available in the database (i.e., PGDB) of the organism. Currently, the relationships are based on transcriptional regulatory data (future versions may cover other types of regulation).
The regulatory overview is represented as a network with nodes and arrows (i.e., arcs). Each node represents a gene of a specific organism. There is an arrow from gene A to gene B if and only if A regulates B.
When first displayed, the overview does not show any regulatory arrow relationships since, typically, their great number would clutter the overview. These arrows can be selectively added by using the highlighting commands. See the sections below for more information on highlighting commands.
Not all organisms have regulatory data in their PGDB. If the command Tools → Regulatory Overview is grayed out, no regulatory overview can be displayed for the selected organism. Otherwise, selecting Tools → Regulatory Overview opens a regulatory overview Web page that displays the complete regulatory overview of the selected organism. A Regulatory Overview menu, containing several commands specific to the regulatory overview, is also added to the top menu bar.
You can display a regulatory subnetwork of a specific organism by performing a series of highlighting operations and then using the command Redisplay Highlighted Genes Only. This command creates a new, smaller layout of the regulatory network that contains only the highlighted genes. Genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork. Further operations can be done on this subnetwork as on the complete overview. See the Section Redisplay Highlighted Genes Only below for more details.
The most common operation is moving the regulatory overview left, right, up, or down, since the entire network sometimes cannot fit in the Web page. To do this, hold down the left mouse button in a blank area and move the mouse in the desired direction. This is called panning. You can also pan in small increments by clicking the arrows on the panning widget, the graphic at the top left of the screen.
To zoom in or out, use the ladder-shaped icon on the left of the overview Web page. Each step of the ladder is a zoom level, and you can select any of them at any time. You can also click the plus or minus sign (displayed at the top and bottom of the ladder) to zoom in (enlarge) or zoom out (shrink) the regulatory network. At lower zoom levels, gene names might overlap the network nodes; increasing the zoom level (i.e., going up the ladder) should remove such overlaps. The last zoom level (i.e., the last step of the ladder) always forces the display of all gene names in the network.
Depending on the speed of the server, generating large regulatory network overviews (i.e., zoom levels near the top of the ladder) may take some time. Because a given overview may already have been generated, or may need to be generated by the server, response times can vary.
Mousing over a gene node displays a tooltip with data about the gene, its product, the possible ligand, and its direct regulatees and regulators. Left-clicking the gene node opens a new Web page containing even more data specific to the gene.
Other more complex visual commands can be reached by right-clicking on genes or in a blank area. This is discussed in detail in the following sections.
Note for Mac users with a one-button mouse: left-click is the usual click, and right-click is the Mac control-click (i.e., you hold down the control key and click). But the exact keys to use may be customized on your Mac via the preferences panel.
Left-Click on a gene node opens a new browser window with information about the gene.
Left-Click (and holding) in a blank area lets you pan (i.e., move) the entire regulatory network left, right, up and down. You need to hold down the mouse button to do the panning.
Right-Click on a gene node opens a menu to select a command to apply for this gene. The commands highlight the direct and/or indirect regulatees and/or regulators for this gene and show highlighted arcs between regulatees and regulators.
Right-Click in a blank area opens a menu to select general commands applicable to the entire regulatory network. These commands are also available in the top menu bar under the menu ‘Regulatory Overview’.
Double-Left-Click in a blank area does a zoom-in operation.
The following sections describe these and other operations in more detail.
Selecting a new organism through the organism selector does not immediately change the regulatory overview to this organism. The next operation such as zoom-in or zoom-out will apply to the new selected organism. At any moment you can display the complete regulatory overview of the selected organism by selecting the command Display Complete Regulatory Overview under the right-clicking menu in a blank area or from the top menu bar Regulatory Overview → Display Complete Regulatory Overview. If the selected database has no regulatory data, the next regulatory command will display a warning to that effect.
For any organism, there are two layouts available: nested ellipses or top to bottom.
The nested ellipses layout uses up to three ellipses to display the gene nodes. The innermost ellipse contains, in alphabetical order of gene name, the genes that have the largest number of regulatees. The middle ellipse contains genes that regulate at least one gene. The outer ellipse contains the genes that have no regulatees. These may be displayed as groups of genes regulated by the same set of genes (a multi-regulon), typically drawn as triangles, or as a short straight line if the group is small.
The top to bottom layout uses several straight rows to display the gene nodes. Each row contains genes that do not directly regulate each other. The top row contains the genes that regulate the largest number of genes. The bottom row contains genes that do not regulate any genes. Intermediate rows contain genes that regulate some other genes. As in the nested ellipses layout, these rows may contain genes grouped in straight lines or triangles.
There are several commands to highlight genes and show the regulatory relationship arrows between them.
Two commands use a gene name, a substring of gene names, or a gene frame ID. Both commands are available by right-clicking in a blank area, or from the top menu bar under Regulatory Overview. The command Highlight Gene By Name or Frame ID highlights at most one gene. It is essentially a search command, since you might not know the location of that gene in the regulatory network; once the gene is found, the regulatory network is centered on its location. The command Highlight Genes By Substring may highlight several genes. Selecting this command opens a panel in which you can enter a string of characters. When you click the button labeled Highlight in the panel, all genes whose names contain the given string are highlighted (the search is case-insensitive). This command can also include the regulatory relationships between the genes found.
The command Highlight Genes By Gene Ontology Terms, accessible from the right-click menu, lets you select one or more Gene Ontology (GO) terms. The genes whose protein products are annotated with the selected GO terms are highlighted. The option Include Relationships Arrows adds relationship arrows between the highlighted genes. Note that if you are displaying a subnetwork, the organism may have genes with such products that are not in the subnetwork. If none of the matching genes are in the subnetwork, a warning is given that no genes have been highlighted.
Right-clicking on a gene opens a menu of highlighting commands specific to that gene. The menu may contain from one to seven commands; since some genes have no regulators and/or no regulatees, the list of commands varies from gene to gene. The complete list of possible commands follows, where name is the name of the gene that was right-clicked (e.g., trpA). Each highlighting command uses its own color, which changes from one executed highlighting command to the next.
Highlight Gene name: Highlights only the selected gene.
Highlight Gene name and its Direct Regulatees: The selected gene and all its direct regulatees are highlighted, and relationship arrows are displayed from the selected gene to its regulatees.
Highlight Gene name and its Direct Regulators: The selected gene and all its direct regulators are highlighted, and relationship arrows are displayed from the regulator genes to the selected gene.
Highlight Gene name and its Direct Regulatees and Regulators: This command combines the two previous commands.
Highlight Gene name and its Direct and Indirect Regulatees: The selected gene and all its direct and indirect regulatees are highlighted, and relationship arrows are displayed from regulators to regulatees.
Highlight Gene name and its Direct and Indirect Regulators: The selected gene and all its direct and indirect regulators are highlighted, and relationship arrows are displayed from regulators to regulatees.
Highlight Gene name and its Direct and Indirect Regulatees and Regulators: This command combines the two previous commands.
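The difference between direct and indirect regulatees (or regulators) can be illustrated with a breadth-first traversal of the "A regulates B" arrows. The Python sketch below assumes a hypothetical network stored as a dictionary from each regulator to the set of genes it regulates. It illustrates the idea and is not the Pathway Tools implementation. Indirect regulators are found by running the same traversal on the reversed network.

```python
from collections import deque


def regulatees(network, gene, indirect=False):
    """Return the genes regulated by `gene`.

    network  -- dict mapping each regulator to the set of genes it regulates
    indirect -- if True, also follow regulation transitively
    """
    found = set()
    queue = deque([gene])
    while queue:
        current = queue.popleft()
        for target in network.get(current, ()):
            if target not in found:  # the visited set also stops feedback loops
                found.add(target)
                if indirect:
                    queue.append(target)
    return found


def regulators(network, gene, indirect=False):
    """Return the genes regulating `gene`, by traversing the reversed network."""
    reverse = {}
    for source, targets in network.items():
        for target in targets:
            reverse.setdefault(target, set()).add(source)
    return regulatees(reverse, gene, indirect)
```

For a toy network `{"A": {"B", "C"}, "B": {"D"}}`, gene A's direct regulatees are B and C, while D is an additional indirect regulatee reached through B.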
When a highlighting operation is done, a new overlay is created. The list of overlays is shown in the Layer Switcher panel on the right of the overview Web page. This panel may be minimized, in which case a small icon with a plus-sign is shown. Click on the plus-sign icon to open the panel. From this panel you can activate or deactivate specific overlays. This is particularly useful if you use the command Redisplay Highlighted Genes Only.
All highlighting can be removed by using the command Clear All Highlighting.
For more information about highlighting, see Section Redisplay Highlighted Genes Only.
The command Redisplay Highlighted Genes Only displays a regulatory network that considers only the highlighted genes. The layout changes to “top to bottom”, since this is usually the better layout for a small set of genes. You would typically use this command after a series of highlighting operations that select a set of genes to analyze closely. The currently displayed regulatory network is removed and a new regulatory network is displayed. Active highlighting remains active, and all overlays (active or not) are kept. Keeping the deactivated overlays is useful, since you may return to the complete regulatory network and reactivate them to create a new regulatory subnetwork. Note that genes that do not regulate, or are not regulated by, any highlighted genes are not included in the subnetwork.
To redisplay the complete regulatory network, use the command Display Complete Regulatory Overview accessible when right-clicking in a blank area. The current active overlays remain active and the deactivated overlays are not removed.
The information in tooltips within a subnetwork display (shown when mousing over gene nodes) is restricted to that subnetwork. That is, the tooltip’s lists of regulatees and regulators cover the subnetwork, not the entire regulatory network of the organism. However, when you switch from a subnetwork display back to the display of the entire network, any highlighting done on the subnetwork is expanded to show relationships within the full network. For example, suppose gene A has four direct regulatees in a subnetwork but twenty in the entire network. Applying Highlight Gene A and its Direct Regulatees in the subnetwork highlights only the four regulatees, but once you redisplay the entire network, all twenty regulatees are highlighted.
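One reading of the subnetwork rule is an induced subgraph on the highlighted genes, from which genes with no arrow to or from another highlighted gene are dropped. The Python sketch below implements that reading. It is an assumption about the behavior, not the Pathway Tools code, and the network representation is hypothetical (a dictionary from regulator to regulatees). Tooltip counts in a subnetwork would then come from the subnetwork's sets rather than the full network's.

```python
def subnetwork(network, highlighted):
    """Restrict a {regulator: set(regulatees)} map to the highlighted genes."""
    keep = set(highlighted)
    # Regulatory arrows whose two ends are both highlighted.
    arcs = {(src, dst) for src in keep for dst in network.get(src, ()) if dst in keep}
    # Genes with no arrow to or from another highlighted gene are dropped.
    nodes = {gene for arc in arcs for gene in arc}
    return {gene: {dst for src, dst in arcs if src == gene} for gene in nodes}
```

Using the numbers from the example above, a gene with twenty regulatees in the full network, four of which are highlighted, has four regulatees in the returned subnetwork.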
The Pathway Tools Omics Viewer for the Regulatory Overview illustrates the results of high-throughput experiments in the context of gene regulation. Genes that are involved in regulation are mapped to gene icons in the Regulatory Overview diagram, and the range of data values in a given experimental dataset is mapped to a spectrum of colors. This facility enables the user to see instantly which genes are active or inactive under some set of experimental conditions.
The Omics Viewer for the Regulatory Overview is very similar to the Omics Viewer for the Cellular Overview. The main difference is that the data file must contain in its first column gene names or frame ids. To start the Omics Viewer for the regulatory overview, use the command Overlay Experimental Data (Omics Viewer) under the Regulatory Overview menu.
See the Omics Viewer Section 5.7 for more information on how to use the Omics Viewer.
Comparative Analysis allows users to generate summaries of individual PGDBs, or compare statistics between PGDBs. Currently we support comparative analysis of reactions, pathways, compounds, proteins, orthologs, transporters, and transcription units. Prior to running the comparative analysis, you will be prompted to select one or more PGDBs for which to perform the analysis. To access the Comparative Analysis tool, go to: Tools Menu → Comparative Analysis.
Pathway Tools has an optional feature that allows Pathway / Genome Databases (PGDB) that have sequence data to be searched using NCBI BLAST.
To access the Web interface for BLAST searches, go to: Search Menu → BLAST.
Documentation on the use of the Web interface for NCBI BLAST can be found here.
Pathway Tools Web accounts let frequent users enrich their experience when accessing PGDBs via the Web.
Web site accounts provide several benefits. Through your account you can:
Customize the appearance of pages on this Web site
Store organism sets for comparative operations
Configure default settings for the Omics Viewers
Receive important email updates about this Web site
The Web accounts system is optional for a Pathway Tools Web server. If enabled, you should see a login prompt at the upper right corner of any Web page from a Pathway Tools Web server. Please see the Pathway Tools User Guide for more information on how to set up web accounts for your website.
Pathway Tools data is available in XML format via several different REST-based web services (as well as in a variety of XML and non-XML based downloadable formats). All of the URLs in this section assume that you are attempting to access a Pathway Tools web server located at http://host.domain.org. You should instead use the actual web address of the Pathway Tools site you are attempting to access (such as http://websvc.biocyc.org).
Pathway data for an individual pathway is available in BioPAX format (both BioPAX Level 2 and Level 3). The URL to access a pathway in BioPAX format is:
http://host.domain.org/[ORGID]/pathway-biopax?type=[2|3]&object=[PATHWAY]
where
[ORGID] is the identifier for the organism database, e.g. ECOLI, META, AFER243159
[2|3] specifies whether data should use BioPAX Level 2 or Level 3. If the type argument is omitted, BioPAX Level 3 will be generated.
[PATHWAY] is the internal BioCyc identifier for the pathway, e.g. GLYCOLYSIS, ARGSYN-PWY, PWY0-1299
Example URLs:
http://websvc.biocyc.org/BSUB/pathway-biopax?type=3&object=GLYCOLYSIS
Retrieve the glycolysis pathway in Bacillus subtilis in BioPAX Level 3 format.
http://websvc.biocyc.org/AFER243159/pathway-biopax?type=2&object=CYSTSYN-PWY
Retrieve the cysteine biosynthesis pathway in Acidithiobacillus ferrooxidans in BioPAX Level 2 format.
http://websvc.biocyc.org/META/pathway-biopax?object=PWY-5025
Retrieve the IAA biosynthesis pathway in MetaCyc in BioPAX Level 3 format.
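A small Python helper can produce these BioPAX URLs and insert the `&` separator between `type` and `object` automatically. The name `biopax_url` is hypothetical. Leaving the level out omits `type`, which, as described above, yields BioPAX Level 3.

```python
from urllib.parse import urlencode


def biopax_url(host, orgid, pathway, level=None):
    """Build a pathway-biopax URL; level is 2, 3, or None (server default)."""
    if level not in (None, 2, 3):
        raise ValueError("BioPAX level must be 2 or 3")
    params = {} if level is None else {"type": level}
    params["object"] = pathway
    return f"{host}/{orgid}/pathway-biopax?" + urlencode(params)
```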
Any pathway, reaction, compound, gene, protein, RNA or transcription-unit object in a BioCyc database can be retrieved in ptools-xml format, an XML format that is based on and closely resembles the underlying Pathway Tools schema. A single object can be requested using its internal BioCyc identifier, or a query can be issued using either a subset of the Pathway Tools API functions or the BioVelo query language to retrieve multiple objects.
For information on interpreting ptools-xml format, see the Guide to ptools-xml and the Pathway Tools Schema Guide.
The URL to access an object in ptools-xml format is
http://host.domain.org/getxml?[ORGID]:[OBJECT-ID]
or
http://host.domain.org/getxml?id=[ORGID]:[OBJECT-ID]&detail=[none|low|full]
where
[ORGID] is the identifier for the organism database, e.g. ECOLI, META, AFER243159.
[OBJECT-ID] is the internal BioCyc identifier for the object, e.g. ARGSYN-PWY, EG11025, FRUCTOSE-6P. Note that object identifiers are case-sensitive.
[none|low|full] indicates whether the returned output should contain no detail, low detail or full detail for the requested object. If no detail parameter is supplied, the request defaults to full detail.
Example URLs:
http://websvc.biocyc.org/getxml?BSUB:GLYCOLYSIS
Retrieve the glycolysis pathway in Bacillus subtilis in ptools-xml format.
http://websvc.biocyc.org/getxml?id=META:ASPARTATEKIN-RXN
Retrieve the aspartate kinase reaction from MetaCyc in ptools-xml format.
http://websvc.biocyc.org/getxml?id=ECOLI:TRYPSYN-APROTEIN&detail=low
Retrieve the trpA gene product from EcoCyc at low detail level.
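Both getxml URL forms can be built with a short Python helper. The name `getxml_url` is hypothetical. The object identifier is passed through unchanged apart from percent-escaping, because identifiers are case-sensitive. When no detail level is given, the short form is used, and the server then defaults to full detail.

```python
from urllib.parse import quote, urlencode

DETAIL_LEVELS = ("none", "low", "full")


def getxml_url(host, orgid, object_id, detail=None):
    """Build a getxml URL for one object (short form when detail is None)."""
    target = f"{orgid}:{object_id}"
    if detail is None:
        return f"{host}/getxml?" + quote(target, safe=":")
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {DETAIL_LEVELS}")
    return f"{host}/getxml?" + urlencode({"id": target, "detail": detail}, safe=":")
```

To fetch and parse the result, one option is `xml.etree.ElementTree.fromstring(urllib.request.urlopen(url).read())`. The structure of the returned elements is described in the Guide to ptools-xml.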
The set of Pathway Tools API functions was defined to allow users who have downloaded and installed the Pathway Tools software locally to write programs that operate on the data. A subset of these API functions has been made available via the web services interface.
The URL to issue an API query that returns a list of objects in ptools-xml format is
http://host.domain.org/apixml?fn=[API-FUNCTION]&id=[ORGID]:[OBJECT-ID]&detail=[none|low|full]
where [ORGID], [OBJECT-ID] and detail level are as for the single object queries described above, and [API-FUNCTION] is one of the following:
all-products-of-gene
binding-site-transcription-factors
chromosome-of-gene
compounds-of-pathway
containers-of
containing-tus
direct-activators
direct-inhibitors
enzymes-of-gene
enzymes-of-pathway
enzymes-of-reaction
genes-of-pathway
genes-of-protein
genes-of-reaction
genes-regulated-by-gene
genes-regulating-gene
modified-containers
modified-forms
monomers-of-protein
pathways-of-compound
pathways-of-gene
reactions-of-compound
reactions-of-enzyme
reactions-of-gene
regulator-proteins-of-transcription-unit
regulon-of-protein
substrates-of-reaction
top-containers
transcription-unit-activators
transcription-unit-binding-sites
transcription-unit-genes
transcription-unit-inhibitors
transcription-unit-mrna-binding-sites
transcription-unit-promoter
transcription-unit-terminators
transcription-unit-transcription-factors
transcription-units-of-gene
transcription-units-of-protein
A more detailed description of each API function is available here.
Example URLs:
http://websvc.biocyc.org/apixml?fn=genes-of-pathway&id=BSUB:GLYCOLYSIS
Retrieve the set of genes that participate in the glycolysis pathway in Bacillus subtilis.
http://websvc.biocyc.org/apixml?fn=genes-regulated-by-gene&id=ECOLI:EG10164&detail=none
Retrieve the set of genes (IDs only) regulated by the crp gene in EcoCyc.
http://websvc.biocyc.org/apixml?fn=enzymes-of-reaction&id=META:TRYPSYN-RXN&detail=full
Get detailed information on all enzymes in MetaCyc that catalyze the tryptophan synthase reaction.
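The following Python sketch builds apixml URLs and rejects function names that are not in the list above. The helper name `apixml_url` is hypothetical, and the function set is copied from this section, so it may need updating for other server versions. The default detail level for apixml is not stated here, so the helper omits `detail` unless you supply one.

```python
from urllib.parse import urlencode

# Copied from the documented list of web-service API functions.
API_FUNCTIONS = frozenset("""
    all-products-of-gene binding-site-transcription-factors chromosome-of-gene
    compounds-of-pathway containers-of containing-tus direct-activators
    direct-inhibitors enzymes-of-gene enzymes-of-pathway enzymes-of-reaction
    genes-of-pathway genes-of-protein genes-of-reaction genes-regulated-by-gene
    genes-regulating-gene modified-containers modified-forms monomers-of-protein
    pathways-of-compound pathways-of-gene reactions-of-compound reactions-of-enzyme
    reactions-of-gene regulator-proteins-of-transcription-unit regulon-of-protein
    substrates-of-reaction top-containers transcription-unit-activators
    transcription-unit-binding-sites transcription-unit-genes
    transcription-unit-inhibitors transcription-unit-mrna-binding-sites
    transcription-unit-promoter transcription-unit-terminators
    transcription-unit-transcription-factors transcription-units-of-gene
    transcription-units-of-protein
""".split())


def apixml_url(host, fn, orgid, object_id, detail=None):
    """Build an apixml URL for one of the documented API functions."""
    if fn not in API_FUNCTIONS:
        raise ValueError(f"{fn!r} is not in the documented apixml function list")
    params = {"fn": fn, "id": f"{orgid}:{object_id}"}
    if detail is not None:
        params["detail"] = detail
    # safe=":" keeps the ORGID:OBJECT-ID separator readable.
    return f"{host}/apixml?" + urlencode(params, safe=":")
```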
The URL to issue a BioVelo query that returns a list of objects in ptools-xml format is
http://host.domain.org/xmlquery?[QUERY]
or
http://host.domain.org/xmlquery?query=[QUERY]&detail=[none|low|full]
where
[QUERY] is a properly escaped BioVelo query string that returns a single list of Pathway Tools objects (it is possible to create BioVelo queries that return multi-column tables – these are not appropriate as input for this web service). For more information about constructing BioVelo queries, see the Guide to the BioVelo Query Language.
[none|low|full] indicates whether the returned output should contain no detail (i.e. only object identifiers and links will be included), low detail (names and a handful of other attributes will be included) or full detail for the matching objects. If no detail parameter is supplied, the request defaults to low detail.
Example URLs:
http://websvc.biocyc.org/xmlquery?[x:x<-ecoli^^pathways]
Retrieve the complete set of pathways in the EcoCyc database at low detail.
http://websvc.biocyc.org/xmlquery?query=[x:x<-ecoli^^genes,x^name="trpA"]&detail=full
Retrieve the gene (or genes) in the EcoCyc database with the name "trpA" in full detail.
http://websvc.biocyc.org/xmlquery?dbs
Retrieve the set of available organism databases
http://websvc.biocyc.org/xmlquery?[x:y:=bsub argsyn-pwy,x<-(enzymes-of-pathway y)]
Retrieve the set of enzymes that participate in the arginine biosynthesis pathway in Bacillus subtilis
http://websvc.biocyc.org/xmlquery?[x:x<-meta^^proteins,"aspartate" instringci x^names,"kinase" instringci x^names]
Retrieve the set of proteins in MetaCyc that have the words "aspartate" and "kinase" in their common-name or synonyms.
http://websvc.biocyc.org/xmlquery?query=[x:x<-ecoli^^pathways,ecoli EG10258 in (pathway-to-genes x)]&detail=none
Retrieve the identifiers of all pathways containing the eno (enolase) gene in Escherichia coli.
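BioVelo queries contain spaces, quotes, brackets, and `^` characters, so they must be escaped before being placed in a URL. The Python sketch below uses the long `query=...&detail=...` form and percent-encodes the query with `urllib.parse.quote`, which writes spaces as `%20` rather than `+`. The helper name `xmlquery_url` is hypothetical, and the choice of escaping style is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode


def xmlquery_url(host, query, detail=None):
    """Build an xmlquery URL for a BioVelo query string.

    detail is 'none', 'low', 'full', or None (the server default, low detail).
    """
    params = {"query": query}
    if detail is not None:
        if detail not in ("none", "low", "full"):
            raise ValueError("detail must be 'none', 'low' or 'full'")
        params["detail"] = detail
    # quote_via=quote encodes spaces as %20 and escapes [ ] ^ < " , : =
    return f"{host}/xmlquery?" + urlencode(params, quote_via=quote)
```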