Identification and utilization of arbitrary correlations in models of recombination signal sequences

Lindsay G. Cowell, Marco Davila, Thomas B. Kepler and Garnett Kelsoe
2002. Genome Biology 3:research0072.1 - research0072.20.

A pdf of the manuscript is available for download from Genome Biology.

Please email any questions to:

Accession Numbers
All RSS sequences were obtained from GenBank files containing Ig or Tcr gene segments and flanking DNA. The accession numbers are listed in this text file.
RSS Data Set
The 12- and 23-RSS sequences are available in FASTA-formatted files: 12-RSS, 23-RSS.
Perl Programs
The following programs are written in Perl and compile and run properly under Red Hat Linux. When run, each program prompts you for an input file name and an output file name. All input files must be in FASTA format. Please press and hold the Shift key while clicking on the program links!

The program that scores 12-RSS requires the 12-RSS data set listed above as input, and the 23-RSS program likewise requires the 23-RSS data set. Each program reads in its data set automatically, so when prompted for an input file name you need only type the name of the file containing the sequences you would like to have scored. These sequences must be the correct length (28 base pairs for a 12-RSS, 39 base pairs for a 23-RSS) and must be listed with the heptamer 5' of the nonamer, i.e. CACAGTG ... ACAAAAACC instead of GGTTTTTGT ... CACTGTG. For sequences in the opposite orientation, the program InverseComplement.script will write out the inverse complements of any sequences read in as input.
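
The input requirements above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the authors' InverseComplement.script. The function names are hypothetical, and the orientation check is only a heuristic that assumes the heptamer begins with the conserved CAC.

```python
# Illustrative sketch only; function names are hypothetical and this is
# not the code of InverseComplement.script or the scoring programs.

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the inverse (reverse) complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_valid_length(seq):
    """True if the sequence is 28 bp (12-RSS) or 39 bp (23-RSS)."""
    return len(seq) in (28, 39)


def orient_heptamer_first(seq):
    """Return seq with the heptamer 5' of the nonamer.

    Assumption (heuristic): a heptamer-first RSS starts with CAC, and a
    nonamer-first RSS ends with GTG. Anything else is left unchanged.
    """
    upper = seq.upper()
    if upper.startswith("CAC"):
        return seq
    if upper.endswith("GTG"):
        return reverse_complement(seq)
    return seq


if __name__ == "__main__":
    print(reverse_complement("GGTTTTTGT"))  # ACAAAAACC
    print(reverse_complement("CACTGTG"))    # CACAGTG
```

Sequences that fail the length check, or whose orientation cannot be determined by this heuristic, would need to be inspected by hand before being passed to the scoring programs.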