Texas Xenopus Genome Project/Species Identification
From Marcotte Lab
Target gene
- pfas (phosphoribosylformylglycinamidine synthase). PREDICTED from human Entrez gene.
Selection procedure
- Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
- xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz 17 MB, gzipped.
- Download CHORI-219 sequences (from NCBI GenBank).
- xdata:ID/XENLA_CH219.fasta.gz 6.5 MB, gzipped. (CHORI-219 sequences: 29 BAC sequences from the X. laevis genome)
- Run BLAT (version 3.4, with default options) against the known CHORI BAC sequences.
- xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz 1.2 MB, gzipped.
$ blat XENLA_CH219.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENLA_CH219.blat_pslx -out=pslx
- Parse two BLAT output files with the following criteria.
- Among the X. tropicalis mRNAs, only RefSeq sequences (IDs starting with 'NM_') are considered.
- Select X. tropicalis mRNA sequences which hit both CHORI-219 (a match must be at least 200 bp long to be called a 'hit'). Only the 10 CHORI-219 BACs already known to be available are considered ('74I8','204L9','197E3','71P23','36I4','35I18','262A22','20I13','206K7','166K18').
- Examine each hit block and discard any block shorter than 200 bp. This selects 42 hit blocks from 8 mRNAs.
- NM_001004837 Unnamed, predicted gene MGC69309 NCBI XenBase
- NM_001007499 paired-like homeodomain 1 (pitx-1) NCBI XenBase
- NM_001011405 Homeobox A5 (hoxa5) NCBI XenBase
- NM_001035121 CCAAT/enhancer binding protein (C/EBP), beta (cebpb) NCBI XenBase
- NM_001113032 LY6/PLAUR domain containing 6 (lypd6) NCBI XenBase
- NM_001127429 homeobox A3 (hoxa3) NCBI XenBase
- NM_001129937 SRY (sex determining region Y)-box 18 (sox18) NCBI XenBase
- NM_001142220 phosphoribosylformylglycinamidine synthase (pfas) NCBI XenBase
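The selection criteria above can be sketched in Python as follows. This is only an illustration, not the script actually used: it assumes the standard 21-column PSL layout that BLAT writes (pslx appends two sequence columns), that 'match length' means the PSL matches column, that the mRNA is the query and the BAC is the target (as in the blat command above), and that each BAC target name contains its clone ID as a separate token (e.g. a hypothetical name such as CH219-74I8). The function and variable names are made up for this example.

```python
import re
from collections import defaultdict

# The 10 CHORI-219 BACs known to be available (from the list above).
AVAILABLE_BACS = ['74I8', '204L9', '197E3', '71P23', '36I4',
                  '35I18', '262A22', '20I13', '206K7', '166K18']
MIN_BP = 200  # minimum match length and minimum hit-block length


def bac_id(t_name):
    """Return the available clone ID contained in a target name, or None.

    Assumption: the clone ID appears in the FASTA name as its own token.
    """
    for clone in AVAILABLE_BACS:
        pattern = r'(?<![0-9A-Za-z])' + re.escape(clone) + r'(?![0-9A-Za-z])'
        if re.search(pattern, t_name):
            return clone
    return None


def select_hit_blocks(psl_lines):
    """Map RefSeq accession -> list of (clone, target start, block size)."""
    selected = defaultdict(list)
    for line in psl_lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 21 or not cols[0].isdigit():
            continue  # skip the PSL header and malformed lines
        accession = re.search(r'NM_\d+', cols[9])  # qName: RefSeq mRNAs only
        clone = bac_id(cols[13])                   # tName: available BACs only
        if accession is None or clone is None:
            continue
        if int(cols[0]) < MIN_BP:                  # matches column
            continue
        sizes = [int(s) for s in cols[18].split(',') if s]     # blockSizes
        t_starts = [int(s) for s in cols[20].split(',') if s]  # tStarts
        for size, t_start in zip(sizes, t_starts):
            if size >= MIN_BP:                     # drop short hit blocks
                selected[accession.group()].append((clone, t_start, size))
    return dict(selected)
```

With the gzipped BLAT output, the lines could be supplied by gzip.open(path, 'rt') from the Python standard library.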
- Run MUSCLE (version 4.0, with default options) for multiple sequence alignment. Most of the candidate sequences turned out to contain duplications, so I filtered them out based on the 'Self' mapping information in the MUSCLE output (a sequence is discarded if it maps to itself over more than 50 bp). Examples of self-duplication:
- hoxa3 - XENTR_CHORI-219_ID.24.muscle, XENTR_CHORI-219_ID.25.muscle, XENTR_CHORI-219_ID.26.muscle
- pitx-1 - XENTR_CHORI-219_ID.27.muscle, XENTR_CHORI-219_ID.28.muscle, XENTR_CHORI-219_ID.29.muscle
- hoxa5 - XENTR_CHORI-219_ID.2.muscle, XENTR_CHORI-219_ID.3.muscle
$ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle
>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|
  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201
 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
      |||||||||||||||||||||||||||||||||  |||||||||||||
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
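The 50 bp self-mapping filter could be sketched in Python as below. This is a rough illustration under several assumptions, not the filter actually used: the record is already known to be a 'Self' mapping, each alignment block consists of two coordinate rows of the form "start strand sequence end" (as in the example output above, with the match-bar row in between), and a self hit only counts when the two rows cover different positions. The function names are invented for this example.

```python
import re

# One coordinate row of the alignment: "start strand sequence end".
ROW = re.compile(r'^\s*(\d+)\s+([+-])\s+([A-Za-z-]+)\s+(\d+)\s*$')
MIN_SELF_BP = 50  # threshold from the text: longer than 50 bp


def self_hit_spans(lines):
    """Return ((start1, end1), (start2, end2)) over all blocks, or None.

    Header lines and match-bar lines do not match ROW and are ignored.
    """
    matches = (ROW.match(line) for line in lines)
    rows = [(int(m.group(1)), int(m.group(4))) for m in matches if m]
    first, second = rows[0::2], rows[1::2]  # rows alternate within a block
    if not first or len(first) != len(second):
        return None

    def span(coords):
        return (min(s for s, _ in coords), max(e for _, e in coords))

    return span(first), span(second)


def is_self_duplicated(lines, min_bp=MIN_SELF_BP):
    """True if the record maps onto a different part of itself over > min_bp."""
    spans = self_hit_spans(lines)
    if spans is None:
        return False
    (s1, e1), (s2, e2) = spans
    # Assumption: identical coordinates are a trivial self match, not a duplication.
    return (e1 - s1 + 1) > min_bp and (s1, e1) != (s2, e2)
```

For the lypd6 example above, the coordinate rows give the spans 3-114 and 138-249, so the 112 bp self hit exceeds the threshold and the sequence would be flagged.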