Texas Xenopus Genome Project/Species Identification
From Marcotte Lab
Selection procedure
- Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
- xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz 17 MB, gzipped.
- Download CHORI-216 sequences (from XenBase) and CHORI-219 sequences (from NCBI GenBank).
- xdata:ID/XENTR_CH216.fasta.gz 1.2 MB, gzipped. (CHORI-216 sequences: 160 BAC sequences from the X. tropicalis genome)
- xdata:ID/XENLA_CH219.fasta.gz 6.5 MB, gzipped. (CHORI-219 sequences: 29 BAC sequences from the X. laevis genome)
- Run BLAT (version 3.4, default parameters, PSLX output) to align the X. tropicalis mRNAs against the known CHORI BAC sequences.
- xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz 1.2 MB, gzipped.
- xdata:XENTR_mRNA.XENTR_CH216.blat_pslx.gz 20 MB, gzipped.
blat XENTR_CH216.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENTR_CH216.blat_pslx -out=pslx
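The BLAT step can be scripted for both BAC sets. The following is a minimal sketch, not the original procedure. It assumes `blat` is on the PATH and all downloads sit in one working directory. Only the CHORI-216 command is given above, so it also assumes the CHORI-219 run uses the same command form. The downloads are gzipped, and since it is an assumption that the BLAT build reads `.gz` input, the sketch decompresses them first. `gunzip`, `blat_command` and `run_all` are hypothetical helper names.

```python
import gzip
import shutil
import subprocess
from pathlib import Path

QUERY = "XENTR_mRNA.xenbase20091127.fasta"
# BAC database -> PSLX output name. The CHORI-219 entry is an assumption that
# mirrors the CHORI-216 command shown above.
DATABASES = {
    "XENTR_CH216.fasta": "XENTR_mRNA.XENTR_CH216.blat_pslx",
    "XENLA_CH219.fasta": "XENTR_mRNA.XENLA_CH219.blat_pslx",
}


def gunzip(path_gz: Path) -> Path:
    """Decompress foo.fasta.gz to foo.fasta (skipped if the output exists)."""
    out = path_gz.with_suffix("")  # drops only the trailing ".gz"
    if not out.exists():
        with gzip.open(path_gz, "rb") as src, open(out, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out


def blat_command(database: str, query: str, output: str) -> list[str]:
    """Argument list for: blat <database> <query> <output> -out=pslx"""
    return ["blat", database, query, output, "-out=pslx"]


def run_all(workdir: Path = Path(".")) -> None:
    """Decompress the downloads, then run one BLAT job per BAC set."""
    for name in [QUERY, *DATABASES]:
        gz = workdir / (name + ".gz")
        if gz.exists():
            gunzip(gz)
    for database, output in DATABASES.items():
        cmd = blat_command(str(workdir / database), str(workdir / QUERY),
                           str(workdir / output))
        subprocess.run(cmd, check=True)  # raises if blat exits non-zero
```

Calling `run_all()` from the download directory should leave the two uncompressed `.blat_pslx` files named in the list above.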
- Parse the two BLAT output files using the following criteria:
- Of the X. tropicalis mRNAs, only RefSeq sequences (accessions starting with 'NM_') are considered.
- Select X. tropicalis mRNA sequences that hit both CHORI-219 and CHORI-216 (an alignment needs a match length of at least 200 bp to count as a 'hit'). For CHORI-219 hits, I only consider the 10 BACs that we already knew were available ('74I8','204L9','197E3','71P23','36I4','35I18','262A22','20I13','206K7','166K18').
- Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from the X. tropicalis mRNA, the target sequence from the CHORI-219 BACs (X. laevis), and the target sequence from the CHORI-216 BACs (X. tropicalis). One hit block is reported:
>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
ccctgggtacccctggaactatagcagggtgac
>XENTR_CH216-2E23_0
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
atc
>XENLA_CH219-20I13_0
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
ccctggttacccctggaactatagcagggtgac
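The parsing criteria listed above (RefSeq-only queries, a 200 bp minimum, the 10-BAC CHORI-219 list, and hits overlapping on the same mRNA) could be sketched in Python as follows. This is not the original script. It assumes the standard PSL/PSLX column order (matches in column 1, qName/qStart/qEnd in columns 10, 12 and 13, tName in column 14). It reads "200 bp match length" as the `matches` column and assumes the CHORI-219 clone ID appears as a separate token in `tName`. It also compares whole alignments rather than individual blocks. All function names are hypothetical.

```python
import re
from collections import defaultdict

MIN_MATCH = 200  # minimum match length (bp) to call an alignment a 'hit'
# The 10 CHORI-219 BACs already known to be available.
CH219_BACS = {"74I8", "204L9", "197E3", "71P23", "36I4",
              "35I18", "262A22", "20I13", "206K7", "166K18"}


def parse_pslx(lines):
    """Yield (qName, qStart, qEnd, tName) for rows with >= MIN_MATCH matches."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the psLayout header, column titles, separator and blank lines.
        if len(fields) < 21 or not fields[0].strip().isdigit():
            continue
        if int(fields[0]) < MIN_MATCH:  # column 1 = matching bases (assumed criterion)
            continue
        # qStart/qEnd are forward-strand coordinates even for '-' strand hits.
        yield fields[9], int(fields[11]), int(fields[12]), fields[13]


def is_refseq(q_name):
    """True for 'NM_...' or 'gi|...|ref|NM_...|' style names (assumed formats)."""
    return any(part.startswith("NM_") for part in q_name.split("|"))


def is_known_ch219(t_name):
    """Assumption: the clone ID (e.g. '20I13') is a separate token in tName."""
    return any(tok in CH219_BACS for tok in re.split(r"[^0-9A-Za-z]+", t_name))


def overlapping_hits(ch219_lines, ch216_lines):
    """(qName, (start, end), CH219 target, CH216 target) for overlapping hits."""
    ch219 = defaultdict(list)
    for q, qs, qe, t in parse_pslx(ch219_lines):
        if is_refseq(q) and is_known_ch219(t):
            ch219[q].append((qs, qe, t))
    hits = []
    for q, qs, qe, t in parse_pslx(ch216_lines):
        if not is_refseq(q):
            continue
        for qs2, qe2, t2 in ch219.get(q, []):
            start, end = max(qs, qs2), min(qe, qe2)
            if start < end:  # the half-open query intervals overlap
                hits.append((q, (start, end), t2, t))
    return hits
```

The gzipped outputs can be fed line by line, e.g. `overlapping_hits(gzip.open(path_219, "rt"), gzip.open(path_216, "rt"))`. Pulling out the three sequences for each reported region then needs the PSLX sequence columns (22 and 23) or the original FASTA files.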
- Run MUSCLE (version 4.0, default parameters) to build a multiple sequence alignment of the three sequences.
$ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle
XENLA_CH219-20I1   1 + ttattt----------------------gtgccctggatacccctggaactatagcagggtgac 42
XENTR_NM_0011422   1 + ttattt----------------------gtgccctgggtacccctggaactatagcggggtgac 42
XENTR_CH216-2E23   1 + tcaccccaaatccccccctaactggccttcaggctgggcccccttag-ctcataacaaggttac 63
                       *.*...                      .....****...***.*.**...***.*..***.**

XENLA_CH219-20I1  43 + tgttaccccaatgtttctatatatctgtaaccttgttattagct-aagggggcccagtctgaag 105
XENTR_NM_0011422  43 + tgttaccccaatgtttctatatatctgtaaccttgttatgggct-aagggggcccagcctgaag 105
XENTR_CH216-2E23  64 + agatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggc---- 123
                       .*.**.....*....*.....**.*.**.***..*.*** .... *.***..***** ..****

XENLA_CH219-20I1 106 + gtcagttagggggagatttggggtgagggcttatttg-----taccctgggtacccctggaact 164
XENTR_NM_0011422 106 + gccagttagggggggatttggggtgagtgcttatttg-----tgccctgggtacccctggaact 164
XENTR_CH216-2E23 124 + -acaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccc-cttagccc 185
                       * **..**.*... .*.......*.*. .*.**..**     ....*****..*****....*.

XENLA_CH219-20I1 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgagctaa-gggg 227
XENTR_NM_0011422 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaa-gggg 227
XENTR_CH216-2E23 186 + ataacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggt 249
                       ***.**.***.**.*.**.....*....*.....**.*.**.***..*.***......* ***.

XENLA_CH219-20I1 228 + gcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtgccctggttaccc 291
XENTR_NM_0011422 228 + gcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtaccc 291
XENTR_CH216-2E23 250 + acccagggca---------------caaataagcact----------------------caccc 276
                       .***** ...***************...**..*...****** *************** .****

XENLA_CH219-20I1 292 + ctggaactatagcagggtgac 312(341)
XENTR_NM_0011422 292 + ctggaactatagcagggtgac 312(341)
XENTR_CH216-2E23 277 + c---------------aaatc 282(341)
                       ****************....*
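In the alignment above, the XENLA and XENTR rows are nearly identical, while the CH216-2E23 row matches them poorly. The CH216 fragment appears to be on the opposite strand from the mRNA. For example, the reverse complement of its last 25 nt (cacaaataagcactcaccccaaatc) is gatttggggtgagtgcttatttgtg, which occurs in the NM_001142220 query. MUSCLE aligns sequences in the orientation given, so flipping such fragments before re-running the alignment may help. Below is a minimal sketch, not part of the original procedure. The helper names are hypothetical, and a simple k-mer test stands in for a proper strand check (such as reading the strand column of the BLAT hit).

```python
# Complement table for DNA in either case, including N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_strand(query: str, target: str, k: int = 25) -> bool:
    """Heuristic: does the target's last k-mer occur in the query as given?"""
    return target[-k:] in query


def orient_to_query(query: str, target: str, k: int = 25) -> str:
    """Flip target when only its reverse complement shares the k-mer with query."""
    if same_strand(query, target, k):
        return target
    flipped = reverse_complement(target)  # flipped[:k] == rc(target[-k:])
    return flipped if flipped[:k] in query else target
```

Applied to the CH216-2E23 record before writing the input FASTA, this would hand MUSCLE all three sequences in the mRNA's orientation. A fragment that shares no k-mer with the query in either orientation is returned unchanged.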