.. reproducible research tutorial file, created by ARichards Algorithm implementation ======================== The manuscript proposes the idea of using the `f-measure `_ to draw positivity thresholds in flow cytometry. Positivity thresholds are needed in cases such as with intracellular staining (ICS) for `cytokines `_ and the f-measure overcomes the subjectivity of traditional methods that require manual intervention. The f-measure or f-score can be defined in terms of true positives (TP), false positives (FP) and false negatives (FN). We can express these measures in terms of `precision and recall `_ where precision is defined as :math:`\frac{\textrm{TP}}{\textrm{TP+FP}}` and recall as :math:`\frac{\textrm{TP}}{\textrm{TP+FN}}`. We can then write the f-score as .. math:: (1 + \beta^{2}) \times \frac{(\textnormal{precision} \times \textnormal{recall})}{(\beta^{2} \times \textnormal{precision}) + \textnormal{recall}}. The following are the functions that are used in the :doc:`AnExample`: List of functions ------------------ * :ref:`get_positivity_threshold` * :ref:`calculate_fscore` * :ref:`get_cytokine_positive_events` For your convenience the following functions are available as a Python library :download:`PositivityLib.py ` The following package imports need to be made. >>> import os, sys >>> import numpy as np .. _get_positivity_threshold: Get positivity threshold ------------------------ In order to calculate the f-score the `probability density functions `_ (pdf) are found using histogram representations of the data. The number of bins *numBins* controls how smoothly the pdf fits the actual distribution of events. .. code-block:: python def get_positivity_threshold(neg,pos,channelIndex,beta=0.8,theta=2.0, width=10, numBins=None): ''' In order to calculate the f-score the pdfs are found using histogram representations of the data. The number of bins numBins controls how smoothly the pdf fits the actual distribution of events. ''' def move_mean(x, window): xs = np.cumsum(x) x1 = xs[(window-1):] x2 = np.concatenate([[0], xs[:-window]]) return np.concatenate([[np.nan]*(window-1), (x1-x2)/float(window)]) neg,pos = neg[:,channelIndex].copy(),pos[:,channelIndex].copy() if numBins == None: numBins = int(np.sqrt(np.max([neg.shape[0],pos.shape[0]]))) pdfNeg, bins = np.histogram(neg, bins=numBins, normed=True) pdfPos, bins = np.histogram(pos, bins=bins, normed=True) pdfNeg = move_mean(pdfNeg, window=width) pdfPos = move_mean(pdfPos, window=width) xs = (bins[:-1]+bins[1:])/2.0 fscores,precision,recall = calculate_fscores(pdfNeg,pdfPos,beta=beta,theta=theta) fThreshold = xs[np.argmax(fscores)] return {'threshold':fThreshold, 'fscores':fscores,'pdfx':xs,'pdfpos':pdfPos,'pdfneg':pdfNeg, 'precision':precision,'recall':recall} .. _calculate_fscore: Calculate fscore ---------------- The f-score is calculated as show in Eqn :eq:`fscore`. The TP, FP an FN are estimated from the overlayed pdfs that represent the positive and negative event distributions. .. code-block:: python def calculate_fscores(neg_pdf,pos_pdf,beta=0.2,theta=3.0): n = len(neg_pdf) fpos = np.where(pos_pdf > theta*neg_pdf, pos_pdf-neg_pdf, 0) tp = np.array([np.sum(fpos[i:]) for i in range(n)]) fn = np.array([np.sum(fpos[:i]) for i in range(n)]) fp = np.array([np.sum(neg_pdf[i:]) for i in range(n)]) precision = tp/(tp+fp) precision[tp==0]=0 recall = tp/(tp+fn) recall[recall==0]=0 fscores = (1+beta*beta)*(precision*recall)/(beta*beta*precision + recall) fscores[np.where(np.isnan(fscores)==True)[0]]=0 return fscores,precision,recall .. _get_cytokine_positive_events: Get cytokine positive events ---------------------------- Using the fscore threshold we next use the following function to return the percentages, counts and indices of cytokine positive events .. code-block:: python def get_cytokine_positive_events(cytoIndex,fThreshold,fileList,fcsList): percentages = {} counts = {} idx = {} filterInds = np.array([]) for fileName in fileList: events = get_events(fileName,fileList,fcsList) data = events[:,cytoIndex] positiveEventInds = np.where(data > fThreshold)[0] if events.shape[0] == 0 or len(positiveEventInds) == 0: percentages[fileName] = 0.0 else: percentages[fileName] = (float(positiveEventInds.size)/float(events.shape[0])) * 100.0 counts[fileName] = positiveEventInds.size idx[fileName] = positiveEventInds return percentages, counts, idx