1. NCBI HomoloGene
link: http://www.ncbi.nlm.nih.gov/HomoloGene/HTML/homologene_buildproc.html
Proteins from input organisms -> blastp -> find DNA sequences of the proteins -> synteny -> maximization of global score
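A minimal sketch of the last step (maximizing a global score over one-to-one pairings), assuming blastp-like similarity scores are already available. The gene names, the scores, and the brute-force search are illustrative assumptions, not NCBI's actual implementation:

```python
from itertools import permutations


def best_one_to_one_matching(scores, genes_a, genes_b):
    """Return (total, pairs) for the one-to-one pairing with the highest total score.

    scores maps (gene_a, gene_b) -> similarity score; missing pairs count as 0.
    Brute force over all pairings, so only suitable for a handful of genes.
    """
    if len(genes_a) > len(genes_b):
        raise ValueError("this sketch expects len(genes_a) <= len(genes_b)")
    best_total, best_pairs = float("-inf"), []
    for chosen in permutations(genes_b, len(genes_a)):
        pairs = list(zip(genes_a, chosen))
        total = sum(scores.get(pair, 0.0) for pair in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


# Hypothetical scores between two human and two mouse proteins.
toy_scores = {
    ("hA1", "mB1"): 90.0, ("hA1", "mB2"): 80.0,
    ("hA2", "mB1"): 85.0, ("hA2", "mB2"): 10.0,
}
total, pairs = best_one_to_one_matching(toy_scores, ["hA1", "hA2"], ["mB1", "mB2"])
print(total, pairs)
```

In this toy case the globally best pairing gives hA1 its second-best partner (mB2), which is how a global score can differ from taking each protein's top hit.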
2. Online blog 1
link: http://www.personal.psu.edu/zuz17/blogs/psu_life/2011/02/understand-ucsc-netchain-alignment-1.html
3. Paper 1
link: http://genome.cshlp.org/content/11/5/803.full
“Computational Inference of Homologous Gene Structures in the Human Genome”
4. Paper 2
link: http://www.sciencemag.org/content/320/5875/486.full
“
Eukaryotic genomes differ in the degree to which genes remain on corresponding chromosomes (synteny) and in corresponding orders (collinearity) over time (1). For example, most eutherian (placental mammal) orders have incurred only moderate reshuffling of chromosomal segments since descent from common ancestors ∼130 million years ago (2). Indeed, karyotype evolution along major vertebrate lineages appears to have been slow since an inferred whole-genome duplication occurred ∼500 million years ago (3). Accordingly, accurate identification of orthologs across eutherian taxa is relatively routine, and deduction of synteny and collinearity is often straightforward with best-in-genome criteria (4), identifying one-to-one best matching chromosomal regions in pairwise genome comparisons.
”
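The distinction the quote draws between synteny (same chromosome) and collinearity (same order) can be sketched roughly as follows. The tuple layout (chromosome and gene index in genome A, then chromosome and gene index in genome B) and the toy blocks are assumptions for illustration only:

```python
def is_syntenic(pairs):
    """True if all genes in the block sit on one chromosome in each genome."""
    chroms_a = {p[0] for p in pairs}
    chroms_b = {p[2] for p in pairs}
    return len(chroms_a) == 1 and len(chroms_b) == 1


def is_collinear(pairs):
    """True if the block is syntenic and gene order is preserved (or fully inverted)."""
    if not is_syntenic(pairs):
        return False
    ordered = sorted(pairs, key=lambda p: p[1])  # sort by position in genome A
    b_positions = [p[3] for p in ordered]
    steps = list(zip(b_positions, b_positions[1:]))
    increasing = all(x < y for x, y in steps)
    decreasing = all(x > y for x, y in steps)
    return increasing or decreasing


# Each tuple: (chrom in A, gene index in A, chrom in B, gene index in B)
block_same_order = [("chr1", 1, "chr4", 10), ("chr1", 2, "chr4", 20), ("chr1", 3, "chr4", 30)]
block_shuffled = [("chr1", 1, "chr4", 20), ("chr1", 2, "chr4", 10), ("chr1", 3, "chr4", 30)]
block_split = [("chr1", 1, "chr4", 10), ("chr1", 2, "chr7", 20)]

print(is_syntenic(block_shuffled), is_collinear(block_shuffled))
```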
5. Through evolutionary analysis
link: http://genome.cshlp.org/content/8/3/163.full
6. Ensembl Gene Homolog prediction method
link: http://www.ensembl.org/info/docs/compara/homology_method.html
“
http://www.ensembl.org/info/website/news.html
ProteinTrees and homologies (all species)
GeneTrees (protein-coding) with new/updated genebuilds and assemblies
- Clustering using hcluster_sg
- Multiple sequence alignments using MCoffee or Mafft
- Phylogenetic reconstruction using TreeBeST
- Homology inference
- Pairwise gene-based dN/dS scores for high coverage species pairs only (both on orthologues and paralogues)
- GeneTree stable ID mapping
- Per family gene dynamics using CAFE
ncRNAtrees and homologies (all species)
- Classification based on Rfam models
- Multiple sequence alignments with Infernal
- Phylogenetic reconstruction using RAxML
- Phylogenetic reconstruction using FastTree2 and RAxML-Light for very big families
- Additional multiple sequence alignments with Prank (w/ genomic flanks)
- Additional phylogenetic reconstruction using PhyML and NJ
- Phylogenetic tree merging using TreeBeST
- Per family gene dynamics using CAFE
- Homology inference
”
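A rough sketch of the "Homology inference" step listed above, assuming a gene tree whose internal nodes are already labelled as speciation or duplication events (the kind of labelling a reconciliation tool such as TreeBeST produces). The dict-based tree format and the gene names are hypothetical, and this is not Ensembl's code:

```python
from itertools import combinations


def leaves(node):
    """Collect the gene names under a tree node."""
    if "gene" in node:
        return [node["gene"]]
    return [g for child in node["children"] for g in leaves(child)]


def infer_homologies(node, out=None):
    """Label each gene pair by the event at its last common ancestor.

    Speciation at the common ancestor -> ortholog; duplication -> paralog.
    """
    if out is None:
        out = {}
    if "gene" in node:
        return out
    kind = "ortholog" if node["event"] == "speciation" else "paralog"
    for left, right in combinations(node["children"], 2):
        for a in leaves(left):
            for b in leaves(right):
                out[frozenset((a, b))] = kind
    for child in node["children"]:
        infer_homologies(child, out)
    return out


# Toy tree: an ancient duplication followed by a human/mouse split in each copy.
toy_tree = {
    "event": "duplication",
    "children": [
        {"event": "speciation", "children": [{"gene": "human_A"}, {"gene": "mouse_A"}]},
        {"event": "speciation", "children": [{"gene": "human_B"}, {"gene": "mouse_B"}]},
    ],
}
homologies = infer_homologies(toy_tree)
print(homologies[frozenset(("human_A", "mouse_A"))])
```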
7. UCSC
(chimpanzee as example)
“The RNAs were aligned against the chimp genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. ”
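The filtering rule in the quote can be sketched as below. Two readings are assumptions here: that "an alignment of less than 15%" refers to the fraction of the RNA that aligned (coverage), and that "within 0.5% of the best" means 0.5 percentage points of base identity. The record fields and toy values are hypothetical:

```python
def filter_rna_alignments(alignments, min_coverage=0.15, near_best=0.005, min_identity=0.25):
    """Keep near-best alignments per RNA, following the thresholds in the quote.

    Each alignment is a dict with keys: rna, locus, coverage, identity
    (coverage and identity given as fractions between 0 and 1).
    """
    by_rna = {}
    for aln in alignments:
        if aln["coverage"] < min_coverage:
            continue  # too little of the RNA aligned: discard
        by_rna.setdefault(aln["rna"], []).append(aln)

    kept = []
    for alns in by_rna.values():
        best = max(a["identity"] for a in alns)
        kept.extend(
            a for a in alns
            if a["identity"] >= best - near_best and a["identity"] >= min_identity
        )
    return kept


toy_alignments = [
    {"rna": "rna1", "locus": "chr1", "coverage": 0.90, "identity": 0.990},
    {"rna": "rna1", "locus": "chr2", "coverage": 0.90, "identity": 0.987},
    {"rna": "rna1", "locus": "chr3", "coverage": 0.90, "identity": 0.950},
    {"rna": "rna2", "locus": "chr5", "coverage": 0.10, "identity": 0.990},
    {"rna": "rna3", "locus": "chr6", "coverage": 0.50, "identity": 0.200},
]
kept_loci = [a["locus"] for a in filter_rna_alignments(toy_alignments)]
print(kept_loci)
```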
Software
1. GenScan
2. TreeBeST
http://treesoft.sourceforge.net/treebest.shtml
Used in Ensembl homolog gene prediction
3. MCScanX
http://chibba.pgml.uga.edu/mcscan2/
4. Mercator
http://www.biostat.wisc.edu/~cdewey/mercator/
Databases
1. PhylomeDB
2. OMA Browser
http://omabrowser.org/cgi-bin/gateway.pl
3. eggNOG
http://eggnog.embl.de/version_3.0/
Review
The quest for orthologs: finding the corresponding gene across genomes
Key words: synteny, homolog, ortholog, paralog
