Capstone: Summarizing Clinical and Demographic Data¶
In [1]:
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
In [2]:
pd.set_option('display.precision', 2)   # show floats rounded to 2 decimal places
Load data sets from Excel¶
In [3]:
df_subject = pd.read_excel('data/capstone1/HIV Boot Camp.xls', sheet_name='Patient Information')
df_hiv = pd.read_excel('data/capstone1/HIV Boot Camp.xls', sheet_name='HIV RNA')
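If both worksheets are wanted, read_excel can also pull them in a single call by passing a list of sheet names and returning a dict of data frames; a minimal sketch using the same file and sheet names as above:

# Read both worksheets at once; the result is {sheet name: DataFrame}
sheets = pd.read_excel('data/capstone1/HIV Boot Camp.xls',
                       sheet_name=['Patient Information', 'HIV RNA'])
df_subject, df_hiv = sheets['Patient Information'], sheets['HIV RNA']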
Queries on Patient Information table¶
In [4]:
df_subject.shape
Out[4]:
(3748, 10)
In [5]:
df_subject.head(n=10)
Out[5]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type
---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase |
1 | 2 | 44 | AMIND | F | No | No | 2006 | YES | No | NaN |
2 | 3 | 33 | AMIND | F | No | No | 2005 | NO | No | NaN |
3 | 4 | 60 | AMIND | F | No | Yes | 2003 | NO | Yes | Integrase |
4 | 6 | 47 | AMIND | F | No | No | 2001 | YES | Yes | Other |
5 | 7 | 54 | AMIND | F | No | No | 2000 | YES | Yes | Other |
6 | 8 | 46 | AMIND | F | No | No | 2000 | NO | No | NaN |
7 | 9 | 69 | AMIND | F | No | No | 1999 | NO | No | NaN |
8 | 10 | 41 | AMIND | F | No | No | 1999 | NO | No | NaN |
9 | 11 | 38 | AMIND | F | No | No | 1999 | NO | No | NaN |
In [6]:
df_subject.tail(n=5)
Out[6]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type
---|---|---|---|---|---|---|---|---|---|---
3743 | 4253 | 44 | WHITE | M | No | No | 1990 | YES | No | NaN |
3744 | 4254 | 56 | WHITE | M | Yes | Yes | 1990 | YES | No | NaN |
3745 | 4255 | 58 | WHITE | M | Yes | No | 1990 | YES | Yes | Other |
3746 | 4257 | 60 | WHITE | M | Yes | No | 1989 | YES | Yes | Other |
3747 | 4259 | 57 | WHITE | M | Yes | No | 1985 | NO | Yes | Other |
In [7]:
df_subject.sample(n=8)
Out[7]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type
---|---|---|---|---|---|---|---|---|---|---
595 | 662 | 52 | BLACK | F | No | Unknown | 1998 | NO | Yes | Integrase |
1509 | 1665 | 59 | BLACK | M | Yes | No | 2009 | NO | Yes | NNRTI |
3010 | 3436 | 62 | WHITE | M | Yes | Unknown | 2009 | NO | Yes | Integrase |
1143 | 1291 | 32 | BLACK | M | Yes | No | 2015 | NO | Yes | Integrase |
1190 | 1338 | 52 | BLACK | M | No | No | 2014 | NO | Yes | Integrase |
3657 | 4161 | 64 | WHITE | M | Yes | Unknown | 1994 | NO | No | NaN |
1772 | 1957 | 75 | BLACK | M | No | No | 2004 | NO | No | NaN |
3311 | 3752 | 71 | WHITE | M | Yes | No | 2001 | YES | Yes | PI |
Filtering on age¶
In [8]:
df_subject[df_subject.AGE >= 85]
Out[8]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type
---|---|---|---|---|---|---|---|---|---|---
202 | 218 | 89 | BLACK | F | No | No | 2007 | YES | No | NaN |
703 | 800 | 89 | BLACK | F | No | No | 1995 | NO | Yes | Other |
736 | 840 | 88 | BLACK | F | No | No | 1994 | YES | No | NaN |
755 | 869 | 85 | BLACK | F | No | No | 1993 | YES | No | NaN |
917 | 1042 | 86 | WHITE | F | No | No | 2007 | YES | No | NaN |
1983 | 2191 | 85 | BLACK | M | No | No | 2001 | NO | No | NaN |
2297 | 2569 | 87 | BLACK | M | No | No | 1998 | YES | No | NaN |
3321 | 3762 | 88 | WHITE | M | Yes | No | 2001 | YES | No | NaN |
3606 | 4102 | 85 | WHITE | M | Unknown | Unknown | 1996 | YES | No | NaN |
3645 | 4149 | 86 | WHITE | M | No | No | 1995 | YES | Yes | Other |
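The same subset can be written with DataFrame.query, which some readers find easier to scan; a sketch (the backtick-quoted column name needs a reasonably recent pandas):

# Boolean filtering via a query expression instead of df[df.AGE >= 85]
df_subject.query('AGE >= 85')
# Conditions combine naturally, including columns whose names contain spaces
df_subject.query('AGE >= 85 and `On ART` == "Yes"')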
Split-Apply-Combine¶
In [9]:
grouped = df_subject.groupby(['RACE', 'On ART'])
df1 = grouped[['ID']].count()
df1
Out[9]:
RACE | On ART | ID
---|---|---
AMIND | No | 39
AMIND | Yes | 24
ASIAN | No | 3
ASIAN | Yes | 5
BLACK | No | 1164
BLACK | Yes | 1037
HISPN | No | 33
HISPN | Yes | 6
OTHER | No | 94
OTHER | Yes | 118
UNKNW | No | 5
UNKNW | Yes | 16
WHITE | No | 601
WHITE | Yes | 603
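The same two-way count can also be produced in one step with pd.crosstab, which already returns the wide layout that the unstack calls in the next section build by hand; a sketch:

# Subjects cross-classified by RACE and ART status
pd.crosstab(df_subject['RACE'], df_subject['On ART'])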
Working with hierarchical indexes¶
In [10]:
df1.unstack(level = 1)
Out[10]:
RACE | ID (On ART = No) | ID (On ART = Yes)
---|---|---
AMIND | 39 | 24
ASIAN | 3 | 5
BLACK | 1164 | 1037
HISPN | 33 | 6
OTHER | 94 | 118
UNKNW | 5 | 16
WHITE | 601 | 603
In [11]:
df1.unstack(level=0)
Out[11]:
On ART | AMIND | ASIAN | BLACK | HISPN | OTHER | UNKNW | WHITE
---|---|---|---|---|---|---|---
No | 39 | 3 | 1164 | 33 | 94 | 5 | 601
Yes | 24 | 5 | 1037 | 6 | 118 | 16 | 603
Perhaps surprising¶
In [12]:
df_subject.groupby(['SEX', 'MSM'])[['ID']].count()
Out[12]:
SEX | MSM | ID
---|---|---
F | No | 1071
F | Unknown | 15
F | Yes | 3
M | No | 609
M | Unknown | 467
M | Yes | 1583
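Raw counts can obscure how large the Unknown share is within each sex; normalizing within SEX gives proportions instead. A sketch, assuming a pandas version that supports normalize= on grouped value_counts:

# Fraction of each SEX group falling in each MSM category
(df_subject.groupby('SEX')['MSM']
           .value_counts(normalize=True)
           .unstack())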
Multiple summaries of a single variable¶
In [13]:
grouped = df_subject.groupby(['RACE', 'SEX'])
df2 = grouped[['AGE']].agg(['count', 'mean', 'std'])
df2
Out[13]:
RACE | SEX | AGE count | AGE mean | AGE std
---|---|---|---|---
AMIND | F | 19 | 45.26 | 8.25
AMIND | M | 44 | 50.34 | 9.41
ASIAN | F | 2 | 34.00 | 7.07
ASIAN | M | 6 | 43.50 | 16.87
BLACK | F | 770 | 51.22 | 11.16
BLACK | M | 1431 | 49.13 | 12.78
HISPN | F | 11 | 46.09 | 7.67
HISPN | M | 28 | 49.04 | 10.26
OTHER | F | 51 | 46.57 | 11.47
OTHER | M | 161 | 46.20 | 11.21
UNKNW | F | 5 | 46.40 | 12.90
UNKNW | M | 16 | 44.19 | 10.60
WHITE | F | 231 | 50.66 | 10.55
WHITE | M | 973 | 51.24 | 11.06
In [14]:
df2.unstack(level=1)
Out[14]:
RACE | AGE count (F) | AGE count (M) | AGE mean (F) | AGE mean (M) | AGE std (F) | AGE std (M)
---|---|---|---|---|---|---
AMIND | 19 | 44 | 45.26 | 50.34 | 8.25 | 9.41
ASIAN | 2 | 6 | 34.00 | 43.50 | 7.07 | 16.87
BLACK | 770 | 1431 | 51.22 | 49.13 | 11.16 | 12.78
HISPN | 11 | 28 | 46.09 | 49.04 | 7.67 | 10.26
OTHER | 51 | 161 | 46.57 | 46.20 | 11.47 | 11.21
UNKNW | 5 | 16 | 46.40 | 44.19 | 12.90 | 10.60
WHITE | 231 | 973 | 50.66 | 51.24 | 10.55 | 11.06
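With a recent pandas the same summary can be written using named aggregation, which produces flat, self-describing column names instead of a two-level header; a sketch (the column names chosen here are illustrative):

# One row per (RACE, SEX) with explicitly named summary columns
df_subject.groupby(['RACE', 'SEX']).agg(
    n=('AGE', 'count'),
    mean_age=('AGE', 'mean'),
    sd_age=('AGE', 'std'),
)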
Extracting one sub-group¶
In [15]:
white_female = grouped.get_group(('WHITE', 'F'))
white_female.head()
Out[15]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type
---|---|---|---|---|---|---|---|---|---|---
858 | 981 | 47 | WHITE | F | No | No | 2015 | NO | Yes | Integrase |
859 | 982 | 49 | WHITE | F | No | Unknown | 2015 | YES | Yes | Integrase |
860 | 983 | 59 | WHITE | F | No | No | 2015 | NO | Yes | Integrase |
861 | 984 | 51 | WHITE | F | No | No | 2015 | NO | Yes | Integrase |
862 | 985 | 57 | WHITE | F | No | Yes | 2015 | NO | Yes | NNRTI |
Merging data sets¶
In [16]:
df_hiv.head()
Out[16]:
 | ID | Test Month | Test Year | HIV RNA
---|---|---|---|---
0 | 1 | 10 | 2011 | 64360 |
1 | 1 | 1 | 2012 | 17848 |
2 | 1 | 5 | 2012 | 28773 |
3 | 1 | 7 | 2012 | 23085 |
4 | 2 | 7 | 2005 | 5871 |
In [17]:
df_hiv.shape
Out[17]:
(12862, 4)
In [18]:
df_subject_hiv = pd.merge(df_subject, df_hiv, on='ID')
In [19]:
df_subject_hiv.shape
Out[19]:
(12877, 13)
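Note that the merged frame has 12877 rows, 15 more than df_hiv, which suggests that a few IDs are repeated in df_subject. Joins like this are worth sanity-checking; a hedged sketch using standard merge options:

# How many subject IDs appear more than once in the subject table?
df_subject['ID'].duplicated().sum()
# Or let merge enforce the expectation directly: this raises MergeError
# if ID is not unique on the df_subject side
pd.merge(df_subject, df_hiv, on='ID', validate='one_to_many')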
In [20]:
df_subject_hiv.head()
Out[20]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type | Test Month | Test Year | HIV RNA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 10 | 2011 | 64360 |
1 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 1 | 2012 | 17848 |
2 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 5 | 2012 | 28773 |
3 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 7 | 2012 | 23085 |
4 | 2 | 44 | AMIND | F | No | No | 2006 | YES | No | NaN | 7 | 2005 | 5871 |
Viral load for one subject¶
In [21]:
df_subject_hiv[df_subject_hiv.ID == 1]
Out[21]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type | Test Month | Test Year | HIV RNA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 10 | 2011 | 64360 |
1 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 1 | 2012 | 17848 |
2 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 5 | 2012 | 28773 |
3 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 7 | 2012 | 23085 |
Who has the most time series data?¶
In [22]:
counts = df_subject_hiv.groupby(['ID'])[['HIV RNA']].count()
counts.head()
Out[22]:
ID | HIV RNA
---|---
1 | 4
2 | 6
3 | 3
4 | 2
6 | 3
In [23]:
subject_id = counts[counts['HIV RNA'] == counts['HIV RNA'].max()]
subject_id
Out[23]:
ID | HIV RNA
---|---
106 | 46
In [24]:
subject_id.index[0]
Out[24]:
106
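The same subject can be found more directly with idxmax, which returns the index label of the (first) largest count and avoids the boolean filter; a sketch:

# ID with the greatest number of HIV RNA measurements
busiest_id = counts['HIV RNA'].idxmax()
busiest_id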
In [25]:
(df_subject_hiv[df_subject_hiv.ID == subject_id.index[0]].
sort_values(['Test Year', 'Test Month']))
Out[25]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type | Test Month | Test Year | HIV RNA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
421 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 1 | 1998 | 11000 |
422 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 6 | 1998 | 9203 |
423 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 9 | 1998 | 7147 |
424 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 11 | 1998 | 7355 |
425 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 1 | 1999 | 12083 |
426 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 3 | 1999 | 400 |
427 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 8 | 1999 | 1800 |
428 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 8 | 1999 | 200000 |
429 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 10 | 1999 | 810 |
430 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 12 | 1999 | 10000 |
431 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 12 | 1999 | 10000 |
432 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 1 | 2000 | 13800 |
433 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 2 | 2000 | 20000 |
434 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 2 | 2001 | 4838 |
435 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2001 | 4800 |
436 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 9 | 2001 | 16238 |
437 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 10 | 2001 | 10000 |
438 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 4 | 2002 | 19938 |
439 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2002 | 19000 |
440 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 11 | 2002 | 33947 |
441 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 11 | 2002 | 30000 |
442 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 1 | 2003 | 30000 |
443 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 6 | 2003 | 10652 |
444 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 2 | 2004 | 24488 |
445 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 4 | 2004 | 400 |
446 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 10 | 2004 | 641 |
447 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 11 | 2004 | 600 |
448 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 12 | 2004 | 1914 |
449 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 3 | 2005 | 23588 |
450 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 3 | 2005 | 23000 |
451 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 6 | 2005 | 6577 |
452 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 8 | 2005 | 1000 |
453 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 12 | 2006 | 100000 |
454 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 12 | 2006 | 100000 |
455 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 2 | 2007 | 1058 |
456 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 10 | 2007 | 55 |
457 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 11 | 2007 | 55 |
458 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 7 | 2008 | 72 |
459 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 7 | 2008 | 72 |
460 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 7 | 2009 | 521 |
461 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 7 | 2009 | 400 |
462 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2010 | 12556 |
463 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2010 | 12550 |
464 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2010 | 9561 |
465 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 5 | 2011 | 50 |
466 | 106 | 68 | BLACK | F | No | No | 2011 | YES | Yes | Other | 9 | 2011 | 49 |
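For genuine time-series work it can help to assemble Test Year and Test Month into a real datetime column before sorting or plotting; a sketch that pins every test to the first of the month (an assumption, since the data carries no day of month):

# Build a datetime from the year/month columns for subject 106
one = df_subject_hiv[df_subject_hiv.ID == 106].copy()
one['Test Date'] = pd.to_datetime(
    one[['Test Year', 'Test Month']]
       .rename(columns={'Test Year': 'year', 'Test Month': 'month'})
       .assign(day=1))
one = one.sort_values('Test Date')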
Binning into age categories¶
In [26]:
df_subject_hiv['Age Group'] = pd.cut(df_subject_hiv['AGE'], [0, 30, 60, 105])
df_subject_hiv.groupby('Age Group')['HIV RNA'].agg(['mean', 'min', 'max'])
Out[26]:
Age Group | mean | min | max
---|---|---|---
(0, 30] | 168012 | 19 | 40306875
(30, 60] | 99641 | 1 | 21923000
(60, 105] | 115820 | 7 | 84545454
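pd.cut can also attach readable labels to the bins instead of interval notation; a sketch (the column and label names here are my own illustrative choices):

# Same age bins, but labelled; stored in a separate column so Age Group is untouched
df_subject_hiv['Age Band'] = pd.cut(df_subject_hiv['AGE'],
                                    bins=[0, 30, 60, 105],
                                    labels=['30 and under', '31-60', 'over 60'])
df_subject_hiv.groupby('Age Band')['HIV RNA'].agg(['mean', 'min', 'max'])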
Exercises¶
1. Load the worksheet CD4 Count in the spreadsheet data/HIV Boot Camp.xls into a data frame named df_cd4.
In [27]:
df_cd4 = pd.read_excel('data/HIV Boot Camp.xls', sheet_name='CD4 Count')
2. How many rows and columns are there in df_cd4?
In [28]:
df_cd4.shape
Out[28]:
(3716, 4)
3. Show 6 random rows from df_cd4.
In [29]:
df_cd4.sample(6)
Out[29]:
 | ID | Test Month | Test Year | CD4 Count
---|---|---|---|---
2671 | 3117 | 1 | 2004 | 473 |
1477 | 1650 | 5 | 2009 | 273 |
454 | 501 | 10 | 2000 | 523 |
2310 | 2627 | 8 | 1997 | 143 |
3574 | 4102 | 1 | 1996 | 10 |
1426 | 1597 | 1 | 2010 | 154 |
4. What are the largest and smallest CD4 counts?
In [30]:
col = 'CD4 Count'
df_cd4[col].min(), df_cd4[col].max()
Out[30]:
(0, 3900)
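A single agg call returns both numbers at once and extends easily if more summaries are wanted later; a sketch:

# Smallest and largest CD4 count in one call
df_cd4[col].agg(['min', 'max'])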
5. What were the largest and smallest CD4 counts for tests in Dec 2010?
In [31]:
df1 = df_cd4[(df_cd4['Test Month'] == 12) & (df_cd4['Test Year'] == 2010)]  # both conditions must use df_cd4's own columns
df1[col].min(), df1[col].max()
Out[31]:
(165, 782)
In [32]:
df2 = df_cd4.groupby(['Test Year', 'Test Month']).get_group((2010, 12))
df2[col].min(), df2[col].max()
Out[32]:
(165, 782)
6. Sort df_cd4 first by ID (ascending), then by Test Year (descending), then by Test Month (ascending). Show the first 6 rows before and after sorting.
In [33]:
df_cd4.head()
Out[33]:
 | ID | Test Month | Test Year | CD4 Count
---|---|---|---|---
0 | 1 | 1 | 2012 | 211 |
1 | 2 | 7 | 2005 | 388 |
2 | 3 | 3 | 2005 | 371 |
3 | 4 | 7 | 2003 | 392 |
4 | 6 | 6 | 2001 | 35 |
In [34]:
df_cd4.sort_values(['ID', 'Test Year', 'Test Month'], ascending=[True, False, True]).head()
Out[34]:
 | ID | Test Month | Test Year | CD4 Count
---|---|---|---|---
0 | 1 | 1 | 2012 | 211 |
1 | 2 | 7 | 2005 | 388 |
2 | 3 | 3 | 2005 | 371 |
3 | 4 | 7 | 2003 | 392 |
4 | 6 | 6 | 2001 | 35 |
7. Create a new column CD4 Group that categorizes CD4 counts into the ranges 0-199, 200-499, and 500-4000.
In [35]:
df_cd4['CD4 Group'] = pd.cut(df_cd4[col], bins=[0,200,500,4000], right=False)
df_cd4.head()
Out[35]:
 | ID | Test Month | Test Year | CD4 Count | CD4 Group
---|---|---|---|---|---
0 | 1 | 1 | 2012 | 211 | [200, 500) |
1 | 2 | 7 | 2005 | 388 | [200, 500) |
2 | 3 | 3 | 2005 | 371 | [200, 500) |
3 | 4 | 7 | 2003 | 392 | [200, 500) |
4 | 6 | 6 | 2001 | 35 | [0, 200) |
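To see how many tests fall into each clinical range, value_counts on the new categorical column is enough; a sketch (sort=False should keep the bins in their natural order):

# Number of tests per CD4 range
df_cd4['CD4 Group'].value_counts(sort=False)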
8. Find the mean and median CD4 count for each subject. Show these values for subjects with IDs in (10, 25, 30, 40, 50, 60).
In [36]:
grouped = df_cd4.groupby('ID')
summary = grouped[col].agg(['mean', 'median'])
summary.loc[[10, 25, 30, 40, 50, 60]]   # .loc replaces the removed .ix indexer
Out[36]:
ID | mean | median
---|---|---
10 | 798 | 798
25 | 283 | 283
30 | 876 | 876
40 | 1117 | 1117
50 | 510 | 510
60 | 763 | 763
9. Merge df_subject with df_cd4 by joining on ID to create a new data frame df_subject_cd4. Show the first 6 rows.
In [37]:
df_subject_cd4 = pd.merge(df_subject, df_cd4, on='ID')
df_subject_cd4.head()
Out[37]:
 | ID | AGE | RACE | SEX | MSM | IDU | UNC Entry to Care Year | Prior HIV Care Elsewhere | On ART | ART Type | Test Month | Test Year | CD4 Count | CD4 Group
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 1 | 38 | AMIND | F | No | No | 2012 | NO | Yes | Integrase | 1 | 2012 | 211 | [200, 500) |
1 | 2 | 44 | AMIND | F | No | No | 2006 | YES | No | NaN | 7 | 2005 | 388 | [200, 500) |
2 | 3 | 33 | AMIND | F | No | No | 2005 | NO | No | NaN | 3 | 2005 | 371 | [200, 500) |
3 | 4 | 60 | AMIND | F | No | Yes | 2003 | NO | Yes | Integrase | 7 | 2003 | 392 | [200, 500) |
4 | 6 | 47 | AMIND | F | No | No | 2001 | YES | Yes | Other | 6 | 2001 | 35 | [0, 200) |
10. Find the mean, minimum, and maximum CD4 count for subgroups differing in RACE and SEX.
In [38]:
grouped = df_subject_cd4.groupby(['RACE', 'SEX'])
grouped[col].agg(['mean', 'min', 'max'])
Out[38]:
RACE | SEX | mean | min | max
---|---|---|---|---
AMIND | F | 298.53 | 10 | 798
AMIND | M | 361.50 | 7 | 871
ASIAN | F | 331.50 | 283 | 380
ASIAN | M | 326.33 | 4 | 688
BLACK | F | 372.72 | 3 | 3900
BLACK | M | 362.36 | 0 | 2000
HISPN | F | 359.82 | 29 | 812
HISPN | M | 373.33 | 14 | 1146
OTHER | F | 372.04 | 4 | 1062
OTHER | M | 361.09 | 7 | 1434
UNKNW | F | 473.80 | 296 | 660
UNKNW | M | 448.69 | 14 | 848
WHITE | F | 349.27 | 9 | 1559
WHITE | M | 365.61 | 1 | 1817
Sneak Preview: Visualizing data¶
In [39]:
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [40]:
# catplot is the current name for the older factorplot
g = sns.catplot(data=df_subject_hiv, x='On ART', y='HIV RNA',
                col='RACE', col_wrap=3, kind='box', hue='SEX',
                palette='Oranges')
g.axes[0].set_yscale('log')   # put the HIV RNA axis on a log scale
pass                          # suppress printing of the last expression