Python: Classes and Object-Oriented Design

Python supports object-oriented programming via classes. Objects are instances of classes, and they carry both data (known as attributes) and class-specific functions (known as methods). Classes can therefore be used to model real-world entities that have properties and behaviors.

Self

Instance methods have a special first parameter, conventionally called self, that refers to the instance itself. When we have an instance of a class, say a = A(), and we call an instance method, a.func(x, y, z), the Python interpreter executes A.func(a, x, y, z). The extra first argument, self, is therefore bound to the instance on which the method was called. The examples below make this concept clearer.
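
The equivalence between the two call forms can be checked directly. The following is a minimal sketch; the class Greeter and its method greet are hypothetical names used only for illustration.

```python
class Greeter:
    """Hypothetical class used only to illustrate self."""

    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        # self is the instance the method was called on
        return "{}, {}!".format(greeting, self.name)


g = Greeter("Ada")

# Calling the method on the instance ...
via_instance = g.greet("Hello")
# ... is equivalent to calling it on the class and passing the instance explicitly.
via_class = Greeter.greet(g, "Hello")

print(via_instance)
print(via_class)
```

Both calls print the same string, because in each case self is bound to g.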

Everything in Python is an object

Even a number is an object (an instance of the class int)

We can show all the attributes and methods with the built-in dir function. Notice that there are special attributes that begin and end with double underscores, and regular ones that do not. We will learn more about special methods later.

In [1]:
', '.join(dir(2))
Out[1]:
'__abs__, __add__, __and__, __bool__, __ceil__, __class__, __delattr__, __dir__, __divmod__, __doc__, __eq__, __float__, __floor__, __floordiv__, __format__, __ge__, __getattribute__, __getnewargs__, __gt__, __hash__, __index__, __init__, __init_subclass__, __int__, __invert__, __le__, __lshift__, __lt__, __mod__, __mul__, __ne__, __neg__, __new__, __or__, __pos__, __pow__, __radd__, __rand__, __rdivmod__, __reduce__, __reduce_ex__, __repr__, __rfloordiv__, __rlshift__, __rmod__, __rmul__, __ror__, __round__, __rpow__, __rrshift__, __rshift__, __rsub__, __rtruediv__, __rxor__, __setattr__, __sizeof__, __str__, __sub__, __subclasshook__, __truediv__, __trunc__, __xor__, bit_length, conjugate, denominator, from_bytes, imag, numerator, real, to_bytes'
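
As a small illustration of how the special methods listed above are used, the sketch below checks that the + operator and the built-in abs function on integers give the same results as calling the corresponding double-underscore methods. Calling them directly is done here for demonstration only; the operator form is what one would normally write.

```python
x = 2

# The + operator on ints corresponds to the __add__ special method.
assert x + 3 == x.__add__(3) == int.__add__(x, 3)

# The built-in abs() corresponds to the __abs__ special method.
assert abs(-x) == (-x).__abs__()

# Regular (non-special) methods are simply called by name.
bits = x.bit_length()

print(type(x))
print(bits)
```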

Define a class

In [2]:
class A:
    """Describe class."""
    pass

Create an instance of the class

In [3]:
a = A()
In [4]:
print(a)
<__main__.A object at 0x104999710>

Special and regular methods

In [5]:
class B:
    """Describe the class."""
    def __init__(self, val):
        """Class initializer."""
        self.val = val

    def __str__(self):
        """String representation of class instance."""
        return "{}: {}".format(self.__class__.__name__, self.val)

    def add(self, val):
        """Add val to self.val"""
        self.val += val

This uses the __init__ special method

In [6]:
b = B('hello')

This uses the __str__ special method

In [7]:
print(b)
B: hello
This uses the add regular method.
In [8]:
b.add(' world')
In [9]:
print(b)
B: hello world

Inheritance

In [10]:
class C(B):
    def __init__(self, val, name='C'):
        super().__init__(val)
        self.name = name

    def __str__(self):
        """String representation of class instance."""
        return "{}: {}".format(self.name, self.val)

This uses the __init__ method of C. Note that assignment of val is delegated to the parent class.

In [11]:
c = C('hello', 'Rumpelstiltskin')

This uses the add regular method of B, since C has no add method defined.

In [12]:
c.add(' Rapunzel')

This uses the __str__ special method of C, not B.

In [13]:
print(c)
Rumpelstiltskin: hello Rapunzel
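
Which class supplies which method can be inspected rather than inferred. The sketch below repeats B and C (without docstrings) so that it runs on its own, then looks at the method resolution order that Python follows when it searches for an attribute.

```python
class B:
    def __init__(self, val):
        self.val = val

    def __str__(self):
        return "{}: {}".format(self.__class__.__name__, self.val)

    def add(self, val):
        self.val += val


class C(B):
    def __init__(self, val, name='C'):
        super().__init__(val)  # B.__init__ assigns self.val
        self.name = name

    def __str__(self):
        return "{}: {}".format(self.name, self.val)


c = C('hello', 'Rumpelstiltskin')

# Attribute lookup walks the method resolution order: C, then B, then object.
mro_names = [cls.__name__ for cls in C.__mro__]
print(mro_names)

# add is not defined on C itself, so the lookup finds it on B.
print('add' in vars(C), 'add' in vars(B))

# __str__ is defined on C, so C's version takes precedence over B's.
print('__str__' in vars(C))

# An instance of C is also an instance of B.
print(isinstance(c, B))
```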

Extended Example

We will design a class to store and manipulate biological sequence data. By inheriting from collections.abc.Mapping, the class gains read-only dictionary behavior (keys, items, membership tests and so on) even though it consists of only a few lines of code.

In [14]:
from collections.abc import Mapping

class BioSequence(Mapping):
    """A class that contains one or more biological sequences."""

    def __init__(self, fasta):
        """Constructor from a FASTA format string."""
        chunks = [x for x in fasta.strip().split('>') if x]
        names = []
        seqs = []
        for chunk in chunks:
            lines = chunk.splitlines()
            names.append(lines[0].strip())
            seqs.append(''.join(lines[1:]))
        self._data = dict(zip(names, seqs))

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return str(self._data)
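
The BioSequence class above defines only __getitem__, __iter__ and __len__ (plus __str__); methods such as keys, items, values, get and the in operator are supplied by the Mapping base class on top of those three. A minimal sketch of that mechanism, using a hypothetical Registry class that is not part of the example:

```python
from collections.abc import Mapping


class Registry(Mapping):
    """Hypothetical read-only mapping used only for illustration."""

    def __init__(self, data):
        self._data = dict(data)

    # The three abstract methods that Mapping requires:
    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


r = Registry({'seq1': 'ACGT', 'seq2': 'GGCC'})

# None of the methods used below are defined in Registry; Mapping provides them.
print(list(r.keys()))
print(list(r.values()))
print('seq1' in r)
print(r.get('missing', 'n/a'))
```

Leaving out any of the three required methods makes the class abstract, and trying to instantiate it raises a TypeError.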
In [15]:
short = ['Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Glu', 'Gln',
         'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe',
         'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val',
         'ANY', 'GAP', 'STP']

letters = ['A', 'R', 'N', 'D', 'C', 'E', 'Q', 'G', 'H',
           'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W',
           'Y', 'V', 'X', '-', '*']

class PeptideSequence(BioSequence):
    """Specialized class for peptide sequences."""

    def __init__(self, fasta):
        super().__init__(fasta)
        self.mapper = dict(zip(letters, short))

    def __str__(self):
        s = []
        for k, v in self._data.items():
            s.append(k)
            s.append('-'.join(self.mapper[c] for c in v))
            s.append('')
        return '\n'.join(s)

class DNASequence(BioSequence):
    """Specialized class for DNA sequences."""

    def __init__(self, fasta):
        super().__init__(fasta)
        self.table = str.maketrans('ACTG', 'TGAC')

    def reverse_complement(self):
        """Return a dict mapping each name to its reverse-complemented sequence."""
        return {k: v.translate(self.table)[::-1]
                for k, v in self._data.items()}

Creating a class instance for a peptide sequence

In [16]:
peptide_fasta = '''>MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID
FPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREA
DIDGDGQVNYEEFVQMMTAK*

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY'''
In [17]:
peptides = PeptideSequence(peptide_fasta)
In [18]:
len(peptides)
Out[18]:
2
In [19]:
list(peptides.keys())
Out[19]:
['MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken',
 'gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]']
In [20]:
peptides['MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken']
Out[20]:
'ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*'
In [21]:
for k, v in peptides.items():
    print(k, v)
    print()
MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK*

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVILGLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGXIENY

In [22]:
print(peptides)
MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken
Ala-Asp-Gln-Leu-Thr-Glu-Glu-Gln-Ile-Ala-Glu-Phe-Lys-Glu-Ala-Phe-Ser-Leu-Phe-Asp-Lys-Asp-Gly-Asp-Gly-Thr-Ile-Thr-Thr-Lys-Glu-Leu-Gly-Thr-Val-Met-Arg-Ser-Leu-Gly-Gln-Asn-Pro-Thr-Glu-Ala-Glu-Leu-Gln-Asp-Met-Ile-Asn-Glu-Val-Asp-Ala-Asp-Gly-Asn-Gly-Thr-Ile-Asp-Phe-Pro-Glu-Phe-Leu-Thr-Met-Met-Ala-Arg-Lys-Met-Lys-Asp-Thr-Asp-Ser-Glu-Glu-Glu-Ile-Arg-Glu-Ala-Phe-Arg-Val-Phe-Asp-Lys-Asp-Gly-Asn-Gly-Tyr-Ile-Ser-Ala-Ala-Glu-Leu-Arg-His-Val-Met-Thr-Asn-Leu-Gly-Glu-Lys-Leu-Thr-Asp-Glu-Glu-Val-Asp-Glu-Met-Ile-Arg-Glu-Ala-Asp-Ile-Asp-Gly-Asp-Gly-Gln-Val-Asn-Tyr-Glu-Glu-Phe-Val-Gln-Met-Met-Thr-Ala-Lys-STP

gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
Leu-Cys-Leu-Tyr-Thr-His-Ile-Gly-Arg-Asn-Ile-Tyr-Tyr-Gly-Ser-Tyr-Leu-Tyr-Ser-Glu-Thr-Trp-Asn-Thr-Gly-Ile-Met-Leu-Leu-Leu-Ile-Thr-Met-Ala-Thr-Ala-Phe-Met-Gly-Tyr-Val-Leu-Pro-Trp-Gly-Gln-Met-Ser-Phe-Trp-Gly-Ala-Thr-Val-Ile-Thr-Asn-Leu-Phe-Ser-Ala-Ile-Pro-Tyr-Ile-Gly-Thr-Asn-Leu-Val-Glu-Trp-Ile-Trp-Gly-Gly-Phe-Ser-Val-Asp-Lys-Ala-Thr-Leu-Asn-Arg-Phe-Phe-Ala-Phe-His-Phe-Ile-Leu-Pro-Phe-Thr-Met-Val-Ala-Leu-Ala-Gly-Val-His-Leu-Thr-Phe-Leu-His-Glu-Thr-Gly-Ser-Asn-Asn-Pro-Leu-Gly-Leu-Thr-Ser-Asp-Ser-Asp-Lys-Ile-Pro-Phe-His-Pro-Tyr-Tyr-Thr-Ile-Lys-Asp-Phe-Leu-Gly-Leu-Leu-Ile-Leu-Ile-Leu-Leu-Leu-Leu-Leu-Leu-Ala-Leu-Leu-Ser-Pro-Asp-Met-Leu-Gly-Asp-Pro-Asp-Asn-His-Met-Pro-Ala-Asp-Pro-Leu-Asn-Thr-Pro-Leu-His-Ile-Lys-Pro-Glu-Trp-Tyr-Phe-Leu-Phe-Ala-Tyr-Ala-Ile-Leu-Arg-Ser-Val-Pro-Asn-Lys-Leu-Gly-Gly-Val-Leu-Ala-Leu-Phe-Leu-Ser-Ile-Val-Ile-Leu-Gly-Leu-Met-Pro-Phe-Leu-His-Thr-Ser-Lys-His-Arg-Ser-Met-Met-Leu-Arg-Pro-Leu-Ser-Gln-Ala-Leu-Phe-Trp-Thr-Leu-Thr-Met-Asp-Leu-Leu-Thr-Leu-Thr-Trp-Ile-Gly-Ser-Gln-Pro-Val-Glu-Tyr-Pro-Tyr-Thr-Ile-Ile-Gly-Gln-Met-Ala-Ser-Ile-Leu-Tyr-Phe-Ser-Ile-Ile-Leu-Ala-Phe-Leu-Pro-Ile-Ala-Gly-ANY-Ile-Glu-Asn-Tyr

Creating a class instance for a DNA sequence

In [23]:
dna_fasta = '''
>SRR001666_1.
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGA

>SRR001666_2
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
AGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
'''
In [24]:
dnas = DNASequence(dna_fasta)
In [25]:
print(dnas)
{'SRR001666_1.': 'GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCGTTCAGGGATACGACGTTTGTATTTTAAGAATCTGA', 'SRR001666_2': 'AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT'}
In [26]:
dnas.reverse_complement()
Out[26]:
{'SRR001666_1.': 'TCAGATTCTTAAAATACAAACGTCGTATCCCTGAACGGTGGGATTTGACGCCATCGGCAGCGGCCATCACCC',
 'SRR001666_2': 'ATGATAAAACGACGCGTATTATCATCGACTTCTGCTTCTATTTGAAAACCCTTAAGTTGTTAAGGGTAACTT'}
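
The reverse_complement method relies on two string operations: translate with a table built by str.maketrans, which swaps each base for its complement, and the slice [::-1], which reverses the result. The sketch below applies the same two steps to a short made-up sequence (AAGTC is an arbitrary example, not taken from the data above).

```python
# Translation table mapping each base to its complement: A<->T, C<->G.
table = str.maketrans('ACTG', 'TGAC')

seq = 'AAGTC'

# Step 1: replace every base with its complement.
complement = seq.translate(table)
# Step 2: reverse the complemented string.
reverse_complement = complement[::-1]

print(complement)
print(reverse_complement)

# Applying the operation twice recovers the original sequence.
round_trip = reverse_complement.translate(table)[::-1]
print(round_trip == seq)
```

Note that the table covers only the uppercase letters A, C, T and G; any other character (for example N or lowercase bases) passes through translate unchanged.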