R-TUTORIAL ON COX REGRESSION This tutorial is based on Practical exercise 7, STK4080 2014 In the tutorial we will use the data on causes of death and mortality in three Norwegian counties described in example 1.12 in ABG. The data are also described at the "Service page for ABG (data sets etc.)" which is found under Computing in the present course webpages. The data are used for illustration in examples 3.3, 3.15, 4.1, and 4.4 of ABG. The data may be read into R by the command: norw.death=read.table("https://www.mn.uio.no/math/english/people/aca/borgan/data/causes_death.txt", header=T) We disregard the 71 individuals who smoke pipe or cigar, so we give the command: norw.death=norw.death[norw.death$smkgr!=6,] The first six lines of the data matrix are as follows: > head(norw.death) agestart agestop dead dead1 dead2 dead3 dead4 sex county sbp bmi smkstart smkgr 1 40.00 60.80 0 0 0 0 0 2 14 110 21.8 NA 1 2 44.43 57.65 1 0 0 1 0 2 14 120 30.4 NA 1 3 40.00 60.38 0 0 0 0 0 2 5 156 28.1 NA 1 4 41.11 66.29 0 0 0 0 0 2 14 130 24.9 26 2 5 47.06 65.97 1 0 1 0 0 2 14 148 30.1 32 4 6 48.51 70.00 0 0 0 0 0 2 14 154 23.5 NA 1 In the tutorial we will focus on total mortality and not on the specific death causes. Example 4.1 in ABGgives results of a Cox regression for total mortality using sex and smoking habit as covariates. You obtain the results of Example 4.1 by the commands: fit0=coxph(Surv(agestart,agestop,dead)~factor(sex)+factor(smkgr),data=norw.death) summary(fit0) Note that when using age as the time parameter, we have left truncated data. Thus, in the coxph-command, we need to include agestart and agestop, since the individuals are not all followed from the same age. (See also the R-tutorial on left truncation). ---- We next want to study the effect of systolic blood pressure and body mass index. Perform a Cox regression model with the covariates sex, smoking habit, systolic blood pressure, and body mass index. Interpret the estimated regression coefficients. fit.b=coxph(Surv(agestart,agestop,dead)~factor(sex)+factor(smkgr)+sbp+bmi, data=norw.death) summary(fit.b) You will find a clear effect of sbp, but not of bmi. Further the effect of sex is slightly reduced, while the effects of smoking are slightly increased when including sbp and bmi. ------ We will now check if there are interactions between some of the covariates. For example, the following command can be used to check for interaction between sex and smoking habits: fit.ss=coxph(Surv(agestart,agestop,dead)~factor(sex)+factor(smkgr)+sbp+bmi+factor(sex):factor(smkgr), data=norw.death) summary(fit.s) The above summary indicates that the interactions are not significant. This may even better be seen by a likelihood ratio test which compares the models behind, respectively, fit.b and fit.ss > anova(fit.b,fit.ss) Analysis of Deviance Table Cox model: response is Surv(agestart, agestop, dead) Model 1: ~ factor(sex) + factor(smkgr) + sbp + bmi Model 2: ~ factor(sex) + factor(smkgr) + sbp + bmi + factor(sex):factor(smkgr) loglik Chisq Df P(>|Chi|) 1 -4221.5 2 -4219.1 4.8586 4 0.3021 The increase in 2*log likelihood is hence 4.8586 with 4 new parameters which gives a p-value 0.3021. Similarly, we may check for interaction between, for example, - sex and sbp - sex and bmi - smoking and sbp - smoking and bmi - sbp and bmi It turns out than none of these interactions are significant at the 5% level (but the one for sbp and bmi is close with a p-value of 5.1%) ----- Example 4.4 in ABG considers the model fit0 above with sex and smkgr as the covariates. They then present estimated survival functions for this model for men and women who have never smoked, and men and women who smoke 20 or more cigarettes per day. We can obtain the plot in Figure 4.3 by the commands: new.cov=data.frame(sex=c(1,1,2,2),smkgr=c(1,5,1,5)) surv0=survfit(fit0,newdata=new.cov) plot(surv0,mark.time=F,xlim=c(40,70),xlab="Age",ylab="Survival",lty=c(1,1,2,2)) ----- We may produce similar plots for other combinations of covariates and covariate values. The R-code below does this separately for men and women, for each sex considering six combinations of covariates, and fixing bmi at 25: 1) smkgr=1 (never smoked), sbp=120 2) smkgr=1 (never smoked), sbp=150 3) smkgr=3 (1-9 cigarettes per day), sbp=120 4) smkgr=3 (1-9 cigarettes per day), sbp=150 5) smkgr=5 (20+ cigarettes per day), sbp=120 6) smkgr=5 (20+ cigarettes per day), sbp=150 par(mfrow=c(1,2)) new.cov.males=data.frame(sex=rep(1,6),smkgr=rep(c(1,3,5),each=2), sbp=rep(c(120,150),3),bmi=rep(25,6)) surv.males=survfit(fit.b,newdata=new.cov.males) plot(surv.males,mark.time=F,xlim=c(40,70),xlab="Age", ylab="Survival", lty=rep(1:2,3),col=rep(c("green","blue","red"),each=2), main="Males") legend("bottomleft",as.character(1:6),lty=rep(1:2,3), col=rep(c("green","blue","red"),each=2)) new.cov.females=data.frame(sex=rep(2,6),smkgr=rep(c(1,3,5),each=2), sbp=rep(c(120,150),3),bmi=rep(25,6)) surv.females=survfit(fit.b,newdata=new.cov.females) plot(surv.females,mark.time=F,xlim=c(40,70),xlab="Age", ylab="Survival", lty=rep(1:2,3),col=rep(c("green","blue","red"),each=2), main="Females") legend("bottomleft",as.character(1:6),lty=rep(1:2,3), col=rep(c("green","blue","red"),each=2))