---
title: "Coloc: under a single causal variant assumption"
author: "Chris Wallace"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Coloc: under a single causal variant assumption}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
# Fine mapping under a single causal variant assumption
The Approximate Bayes Factor colocalisation analysis described in the next section essentially works by fine mapping each trait under a single causal variant assumption and then integrating over those two posterior distributions to calculate the probability that the causal variant is shared. This means we can also fine map each trait on its own, using the function `finemap.abf`.
First we load some simulated data. See [the data vignette](a02_data.html) to understand how to format your datasets.
```{r }
library(coloc)
data(coloc_test_data)
attach(coloc_test_data)
```
Then we analyse the statistics from a single study, asking about the evidence that each SNP in turn is solely causal for any association signal we see. As we might expect, that evidence is maximised at the SNP with the smallest p-value.
```{r }
plot_dataset(D1)
my.res <- finemap.abf(dataset=D1)
my.res[21:30,]
```
The `SNP.PP` column shows the posterior probability that exactly that SNP is causal. Note that the last row of this data.frame does not correspond to a SNP, but to the null model, in which no SNP is causal.
```{r }
tail(my.res,3)
```
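
To make the role of the null row concrete, here is a minimal base-R sketch of how per-SNP log approximate Bayes factors can be turned into posterior probabilities under a single causal variant assumption. The log Bayes factors and the helper `logsum` are made up for illustration, and `p1 = 1e-4` is an assumed prior probability that any one SNP is causal. This is a sketch of the general idea, not the code inside `finemap.abf`.

```r
# Toy log approximate Bayes factors for five hypothetical SNPs
# (illustrative values only, not taken from coloc_test_data)
lABF <- c(s1 = 0.5, s2 = 2.1, s3 = 9.3, s4 = 8.7, s5 = -0.2)

# Assumed prior probability that any one SNP is causal
p1 <- 1e-4

# Numerically stable log(sum(exp(x)))
logsum <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Add the null model (no causal SNP): its Bayes factor is 1, so log BF = 0,
# and it takes whatever prior mass is not assigned to the SNPs
log.prior <- c(rep(log(p1), length(lABF)), log(1 - length(lABF) * p1))
log.post  <- c(lABF, null = 0) + log.prior

# Normalise so the posterior probabilities sum to 1
SNP.PP <- exp(log.post - logsum(log.post))
round(SNP.PP, 3)
```

Because the null model competes with the SNPs for posterior mass, it is worth checking the last row before interpreting the per-SNP values.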
Finally, although this is a fast method for fine mapping, if you have full genotype data, as here, it can be sensible to consider multiple causal variant models too. One package that allows you to do this is [GUESSFM](https://github.com/chr1swallace/GUESSFM), described in
# (Approximate) Bayes Factor colocalisation analyses
## Introduction
The idea behind the ABF analysis is that the association of
each trait with SNPs in a region may be summarised by a vector of 0s
and at most a single 1, with the 1 indicating the causal SNP (so,
assuming a single causal SNP for each trait). The posterior
probability of each possible configuration can be calculated and so,
crucially, can the posterior probabilities that the traits share
their configurations. This allows us to estimate the support for the
following cases:
- $H_0$: neither trait has a genetic association in the region
- $H_1$: only trait 1 has a genetic association in the region
- $H_2$: only trait 2 has a genetic association in the region
- $H_3$: both traits are associated, but with different causal variants
- $H_4$: both traits are associated and share a single causal variant
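
As a rough illustration of how support for these five hypotheses can be derived from per-SNP Bayes factors, the sketch below combines two vectors of log ABFs with prior probabilities `p1`, `p2` and `p12`. All numbers are invented, the helpers `logsum` and `logdiff` are hypothetical names, and the prior values are assumptions chosen for the example. It follows the general form of the calculation rather than reproducing `coloc.abf` itself.

```r
# Toy log ABFs for the same four hypothetical SNPs in two traits
l1 <- c(0.3, 7.9, 1.2, -0.4)   # trait 1
l2 <- c(0.1, 8.4, 0.8,  2.0)   # trait 2

# Assumed priors: SNP causal for trait 1 only, trait 2 only, or both
p1 <- 1e-4; p2 <- 1e-4; p12 <- 1e-5

# Numerically stable log(sum(exp(x)))
logsum <- function(x) { m <- max(x); m + log(sum(exp(x - m))) }
# log(exp(a) - exp(b)) for a > b
logdiff <- function(a, b) a + log1p(-exp(b - a))

lH0 <- 0                                  # no association
lH1 <- log(p1) + logsum(l1)               # trait 1 only
lH2 <- log(p2) + logsum(l2)               # trait 2 only
lH4 <- log(p12) + logsum(l1 + l2)         # same SNP causal for both
# H3: all pairs of SNPs minus the pairs where the SNP is the same
lH3 <- log(p1) + log(p2) +
  logdiff(logsum(l1) + logsum(l2), logsum(l1 + l2))

# Normalise across the five hypotheses
lH <- c(H0 = lH0, H1 = lH1, H2 = lH2, H3 = lH3, H4 = lH4)
PP <- exp(lH - logsum(lH))
round(PP, 3)
```

In this toy example the second SNP has the largest Bayes factor for both traits, so most of the posterior mass falls on H4.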
## The basic `coloc.abf` function
The function `coloc.abf` is ideally suited to the case when only
summary data are available.
```{r }
my.res <- coloc.abf(dataset1=D1,
                    dataset2=D2)
print(my.res)
```
Note that if you do find strong evidence for H4, you can extract the posterior probability that each SNP is causal *conditional on H4 being true*. This is part of the calculation required by coloc, and is contained in the column `SNP.PP.H4` of the `results` element of the returned list. So we can extract the most likely causal variants by
```{r}
subset(my.res$results,SNP.PP.H4>0.01)
```
or the 95% credible set by
```{r}
o <- order(my.res$results$SNP.PP.H4,decreasing=TRUE)
cs <- cumsum(my.res$results$SNP.PP.H4[o])
w <- which(cs > 0.95)[1]
my.res$results[o,][1:w,]$snp
```
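
The same credible-set logic can be wrapped in a small helper. The function below, `credible_set`, is a hypothetical name introduced for illustration and is not part of coloc. It assumes only a named vector of posterior probabilities that sums to approximately 1, and the toy values are invented.

```r
# Return the smallest set of SNPs whose cumulative posterior
# probability reaches the requested coverage
credible_set <- function(pp, coverage = 0.95) {
  o  <- order(pp, decreasing = TRUE)   # most probable SNPs first
  cs <- cumsum(pp[o])
  w  <- which(cs >= coverage)[1]       # first position reaching coverage
  if (is.na(w)) w <- length(pp)        # coverage not reachable: return all
  names(pp)[o][seq_len(w)]
}

# Toy posterior probabilities for five hypothetical SNPs
pp <- c(s1 = 0.02, s2 = 0.61, s3 = 0.30, s4 = 0.05, s5 = 0.02)
credible_set(pp)          # default 95% credible set
credible_set(pp, 0.5)     # a narrower 50% set
```

With real output, something like `credible_set(setNames(my.res$results$SNP.PP.H4, my.res$results$snp))` should give the same SNPs as the code above, assuming the `snp` column holds the SNP identifiers as it does there.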