R/qtl Frequently Asked Questions

[ Home | Download | Readme | Status | Bugs | Sample graphics | Sample data | Tutorial | Help pages: ( html | pdf ) | Citation ]


If you have questions, suggestions, corrections, etc., please email Karl Broman (kbroman at biostat.wisc.edu).


  1. I'm having trouble reading my data into R.

    First, take a look at the help file for the read.cross function. Next, look at some of the sample data files.

    If you are still having trouble, send an email to Karl Broman (kbroman at biostat.wisc.edu), attaching a copy of your data. He's had little trouble, up to now, providing assistance with such problems, and will keep your data confidential.

  2. Are you planning to implement ______?

    The list of things we hope to implement in R/qtl is available here. Many items on the list will not be tackled for quite some time.

    If you have extensions to R/qtl that may be of general use, send an email to Karl Broman (kbroman at biostat.wisc.edu); he will be happy to discuss incorporating your code into the package.

  3. I'm getting the following error/warning; what does it mean?

    We apologize that some warnings and error messages are not very easy to understand. For the same reason, they are seldom simple to diagnose without more information.

    Send an email to Karl Broman (kbroman at biostat.wisc.edu), including the code that led to the problem, and ideally also the primary data. It will also be useful to include information on your operating system and the versions of R and R/qtl that you are using. Your versions of R and R/qtl may be determined by typing the following.

    > version
    > qtlversion()
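
    As a convenience, the same details can be gathered in one step for pasting into an email. The following is a minimal sketch in base R; the variable name info is ours, and packageVersion is used here in place of qtlversion() only so that the code still runs when R/qtl is not installed.

```r
# Collect the details that are useful in a problem report.
info <- c(R  = R.version.string,
          OS = paste(Sys.info()[c("sysname", "release")], collapse = " "))

# Add the R/qtl version only if the package is installed.
if (requireNamespace("qtl", quietly = TRUE)) {
  info["qtl"] <- as.character(utils::packageVersion("qtl"))
}

print(info)
```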

  4. I'm running out of memory; what should I do?

    In Windows, by default you get 1 Gb memory (or the amount of RAM you have on your computer, if that is less than 1 Gb). If you have 2 Gb RAM, you need to use the command-line flag --max-mem-size to have access to the additional memory.

    Right-click on the R icon that you use to start R and select "Properties". Then select the tab "Shortcut" and modify the "Target" to include something like --max-mem-size=2G.

    Alternatively, you can change the memory limit within R using the memory.limit function, giving a new limit in Mb. (For example, type memory.limit(2048) to change the memory limit to 2 Gb.)

    See also the R for Windows FAQ and, within R, type ?Memory and ?memory.size.

  5. I'm still running out of memory; what should I do?

    Of course, one is limited by the memory available on one's computer, and so there are not many options.

    First, clean up your workspace, removing objects that aren't important to you. You can save objects to disk with the save command.
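
    The clean-up might look something like the following sketch. The object big and the file name are made up for illustration, and tempdir() stands in for whatever directory you would actually save to.

```r
# A large object that we want to keep but do not need right now.
big <- matrix(rnorm(1e4), ncol = 100)
print(object.size(big), units = "Kb")

# Save it to disk, then remove it from the workspace.
outfile <- file.path(tempdir(), "big.RData")
save(big, file = outfile)
rm(big)
invisible(gc())   # ask R to release the freed memory

# Later, restore the object when it is needed again.
load(outfile)
dim(big)
```

    Use ls() to see what is currently in the workspace before deciding what to remove.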

    The multiple imputation method, as implemented, uses a particularly large amount of memory. Consider using a small number of imputations (n.draws) or a coarser grid (step) in sim.geno, or focusing on a subset of the chromosomes.
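
    As a rough sketch of these suggestions, using the hyper data that comes with R/qtl: the choice of chromosomes 1 and 4, the 16 imputations, and the 5 cM step are arbitrary values for illustration, not recommendations.

```r
library(qtl)
data(hyper)

# Keep only the chromosomes of current interest.
sub <- subset(hyper, chr = c(1, 4))

# Use few imputations and a coarse grid to limit memory use.
sub <- sim.geno(sub, n.draws = 16, step = 5, error.prob = 0.001)

# The imputations are stored as an individuals x positions x draws array.
dim(sub$geno[["1"]]$draws)
```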

  6. I'm considering buying a computer for QTL analyses; do you have any suggestions?

    We recommend purchasing a computer with as much memory (RAM) as possible: preferably at least 2 Gb. And of course, the faster the processor, the better.

  7. Does R/qtl support multiple computer processors?

    R currently can deal with just one processor at a time. However, if you have a computer with multiple processors, you can speed up permutation tests and simulations by spawning multiple instances of R at once. We routinely make use of the multiple processors on a Linux cluster for more rapid permutation tests.

    If a permutation test is to be split across multiple processors, it is important to ensure that the random number seeds are set to be different for the different jobs, using the function set.seed. Otherwise, the multiple jobs may give precisely the same results.
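
    One simple way to do this is to derive each job's seed from a job index, as in the sketch below. The names job_id, base_seed, and shuffle_with_seed are hypothetical, and sample(20) is only a stand-in for permuting the phenotypes of 20 individuals.

```r
# Hypothetical job index; in practice it might come from commandArgs()
# or an environment variable set by the cluster's job scheduler.
job_id <- 3
base_seed <- 92107               # arbitrary starting value

set.seed(base_seed + job_id)     # a different seed for each job
perm <- sample(20)               # stand-in for one permutation replicate

# Demonstration: equal seeds reproduce the same permutation.
shuffle_with_seed <- function(seed) {
  set.seed(seed)
  sample(20)
}
identical(shuffle_with_seed(1), shuffle_with_seed(1))   # TRUE
```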

  8. How do I change R's working directory?

    Within R, use the functions getwd to determine the current working directory, setwd to change the current working directory, and dir to list the files in the current working directory.
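
    For example (tempdir() is used here only so that the code runs anywhere; substitute your own directory):

```r
old <- getwd()      # remember the current working directory
setwd(tempdir())    # change to another directory
getwd()             # confirm the change
dir()               # list the files found there
setwd(old)          # return to the original directory
```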

    To change R's default working directory in Windows, create a shortcut to the R GUI (there may already be one on your desktop) and then do the following:

    1. Right-click on the shortcut.
    2. Select the tab "Shortcut".
    3. Change "Start in" to the desired working directory.

    To change R's default working directory on a Mac, start R and then select (on the menu bar) R -> Preferences -> Startup, and then change the "Initial working directory".

  9. Can one analyze recombinant inbred line (RIL) data?

    It is possible, but it is not yet documented.

    Read in your data as if it were a backcross, and then type one of the following, according to whether your RIL were generated by selfing or sibling mating. (Here we assume that your data are in the object myx.)

    > class(myx)[1] <- "riself"
    > class(myx)[1] <- "risib"

    The data are treated essentially like a backcross, but the map is expanded before calculating QTL genotype probabilities and so forth. Note that we currently can deal only with strain averages as phenotypes.

  10. Can one analyze outcross data?

    Generally, no. R/qtl does include facilities for analysis of a phase-known four-way cross, generally derived from a cross between four inbred strains, with all progeny from a cross of the form (A × B) × (C × D), with females listed first. See the help file for the read.cross function for details about the coding of the genotype data.

  11. Can one analyze data on half-sib families?

    No.

  12. Can one analyze advanced intercross lines (AIL)?

    R/qtl has no special facilities for dealing with advanced intercross lines. One might analyze such data as if they were from an intercross, though with an expanded genetic map, but it is important to take account of the relationships among individuals (for example, the sibships in the final generation), and R/qtl is not currently able to do that.

  13. Can one do a genome scan with a dominant, recessive, or additive allele model?

    No. In the analysis of intercross data, we always consider the full model (allowing the three genotypes to have different phenotype averages). One may inspect the results of effectplot to assess whether a locus appears to be dominant or additive.

  14. Can one test if an allele is associated with an increase in phenotype?

    No, though one may inspect the results of effectplot, which may suggest such an effect. We see little value in a formal significance test.

  15. How can I estimate the heritability due to a QTL?

    One may use fitqtl to fit a multiple-QTL model and estimate the percent phenotypic variance explained by each QTL.

    In the context of a single-QTL model, the heritability due to a QTL may be estimated by 1 – 10^(–2 LOD / n), where n is the sample size and LOD is the LOD score (from scanone).
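
    The single-QTL formula is easy to evaluate directly. In the sketch below, the function name h2_from_lod and the example values (a LOD score of 5 with 200 individuals) are ours, chosen only for illustration.

```r
# Proportion of phenotypic variance explained under a single-QTL model:
# 1 - 10^(-2 LOD / n)
h2_from_lod <- function(lod, n) 1 - 10^(-2 * lod / n)

h2_from_lod(lod = 5, n = 200)   # about 0.109
```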

  16. How many permutations should I run?

    We generally use 1000 permutation replicates, though we may use 10,000 or 100,000, if we want more precise results.

    In general, we view the permutation test as a method for estimating a p-value. Suppose that the true p-value (if one performed all possible permutations) is p, we use n permutation replicates, and x is the number of replicates giving a LOD score greater than or equal to that observed. Then x follows a binomial(n, p) distribution. Our estimate of the p-value is x/n, and this has standard error (SE) = √[p(1–p)/n].

    If one wishes the SE of the estimated p-value to be ∼0.001 in the case that p ≈ 0.05, one would need 0.05 × 0.95 / 0.001² = 47,500 permutation replicates.
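
    These calculations can be sketched in a few lines of R. The function names perm_se and perm_n are ours, and the permutation LOD scores below are made-up values used only to show how x/n is computed.

```r
# Standard error of the estimated p-value with n permutation replicates.
perm_se <- function(p, n) sqrt(p * (1 - p) / n)

# Number of replicates needed to achieve a target standard error.
perm_n <- function(p, se) p * (1 - p) / se^2

perm_se(0.05, 1000)   # about 0.0069
perm_n(0.05, 0.001)   # 47,500

# Estimated p-value: the proportion of replicates with a LOD score
# greater than or equal to the observed one (made-up numbers).
perm_lod <- c(1.2, 2.8, 0.9, 3.4, 1.7)
obs_lod <- 2.8
pval <- mean(perm_lod >= obs_lod)
pval                  # 2/5 = 0.4
```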

  17. Can one calculate the individual contributions to the LOD score?

    No.

  18. Can I analyze a monogenic (Mendelian) trait?

    Yes. Use model="binary" in scanone or scantwo. Alternatively, create a dummy marker with the genotypes encoding the phenotypes, and use est.rf to calculate LOD scores for linkage between each typed marker and the phenotype.

  19. I have genotype data only on affected individuals. Can these data be analyzed with R/qtl?

    Currently, the analysis of a binary phenotype in R/qtl requires genotype data on both affected and unaffected individuals. In the case that genotype data are available only on affected individuals, one may use geno.table to identify loci that exhibit segregation distortion and so are indicated to be potentially linked to a disease susceptibility locus. Such evidence should be confirmed by further genotyping unaffected individuals.
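
    A minimal sketch of such a check follows. The hyper data are used only as a stand-in (they are not affected-only data), the 0.01 cutoff is arbitrary and makes no adjustment for multiple testing, and we assume that the p-value column in the geno.table output is named P.value.

```r
library(qtl)
data(hyper)

# Genotype counts at each marker, with a test for segregation distortion.
gt <- geno.table(hyper)

# Markers whose genotype frequencies depart most from expectation.
gt[gt$P.value < 0.01, ]
```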

  20. How can I pick out multiple peaks on a chromosome from the output of scanone?

    It is best not to rely on the results of scanone to infer the presence of multiple linked QTL. Instead, one should consider the results of a two-dimensional, two-QTL scan (with scantwo) or multiple QTL analysis (with fitqtl and/or scanqtl).

    Nevertheless, if there are a couple of peaks on a chromosome, and one wishes to identify the location of the second peak, one can subset the results from scanone to find the location of the second peak. For example, if out contains the output from scanone, and one wishes to find the location for the peak on chromosome 1 that is distal to 50 cM on the genetic map, one may use code like the following.

    > max(out[out$chr==1 & out$pos > 50,])

  21. How can one remove partially informative genotypes (C or D) from a data set?

    Use the function strip.partials.

  22. How can one investigate possible interactions between a specific locus and the rest of the genome?

    The simplest approach is to consider a marker (preferably one with complete genotype data) near the position of interest, and perform a genome scan with that marker as first an additive and then an interactive covariate. The difference between the two sets of LOD scores concerns evidence for interaction with the marker position.
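
    The covariate approach might be sketched as follows for the hyper data, taking the marker nearest to 18 cM on chromosome 15. The object names are ours. Note that individuals with a missing genotype at the chosen marker are dropped from both scans, which is why a marker with complete data is preferable.

```r
library(qtl)
data(hyper)
hyper <- calc.genoprob(hyper, step = 2.5, error.prob = 0.001)

# Genotypes at the marker nearest to the position of interest.
mar <- find.marker(hyper, chr = 15, pos = 18)
g <- pull.geno(hyper)[, mar]

# Genome scans with the marker as an additive covariate,
# and then as an interactive covariate.
out.add <- scanone(hyper, addcovar = g)
out.int <- scanone(hyper, addcovar = g, intcovar = g)

# The difference in LOD scores concerns the interaction.
lod.int <- out.int - out.add
plot(lod.int)
```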

    Alternatively, one can inspect the results of a two-dimensional, two-QTL genome scan, obtained by scantwo, though it is quite tricky to pull out the interaction LOD scores relative to a specific position. See the following code, for interactions with the locus at 18 cM on chromosome 15 in the hyper data.

    > data(hyper)
    > hyper <- calc.genoprob(hyper, step=2.5, err=0.001)
    > out2 <- scantwo(hyper, incl.markers=TRUE)

    > thepos <- which(out2$map$chr==15 & out2$map$map==18)
    > add <- c(out2$lod[1:thepos,thepos],
    +          out2$lod[thepos,-(1:thepos)])
    > full <- c(out2$lod[thepos,1:thepos],
    +           out2$lod[-(1:thepos),thepos])

    > results <- cbind(out2$map[,1:2], lod=full-add)
    > class(results) <- c("scanone", "data.frame")

    > plot(results)

    We use scantwo to perform the genome scan; incl.markers=TRUE is used to ensure that calculations are also done at the genetic markers and not just on an evenly-spaced grid. In the results, out2$lod is a matrix of LOD scores and out2$map contains information on the positions at which the LOD scores were calculated. We determine the index for the position at 18 cM on chromosome 15, and then pull out the relevant LOD scores. We then create an object of the form produced by scanone, but containing the interaction LOD scores.

  23. Can one apply scantwo restricted to an interval?

    No, but one may use scanqtl to perform a two-dimensional, two-QTL scan in a given interval.

  24. How can multiple crosses be combined?

    One may use the function c.cross to combine multiple backcrosses and/or intercrosses, provided that they have the same genetic maps. This should be done after running calc.genoprob or sim.geno. The combined analysis of multiple crosses requires care and is beyond the scope of this FAQ.

  25. Can one apply the "False discovery rate" (FDR) idea to QTL mapping with R/qtl?

    In the context of a single phenotype, one cannot fruitfully apply the false discovery rate idea to QTL mapping. If one views as the set of null hypotheses that individual loci are not linked to any QTL, one really has just one null hypothesis per chromosome, and so a total of 20 null hypotheses for the mouse genome.

  26. Can R/qtl be used to perform association mapping (aka in silico mapping)?

    No.

  27. Can one use physical locations of markers in place of a genetic map?

    The results of QTL analysis depend critically on the order of the genetic markers, and so knowledge of the physical locations of markers will be useful. However, calculations of conditional QTL genotype probabilities, given the available marker data, must rely on estimates of the recombination fractions between markers, which may only be obtained from a genetic map. Physical distances between markers are not a good substitute for genetic distances.

  28. What map function should I use?

    In general, one should use a map function that best reflects the level of crossover interference. However, QTL mapping calculations still generally rely on an assumption of no crossover interference; a map function is used only to convert genetic distances into recombination fractions.

    The choice of map function seldom has much effect on the QTL mapping results, particularly in the case that the genetic markers are relatively dense and the genotype data are relatively complete. If one uses, for the analysis, a genetic map that was estimated from the same data, we recommend use of the same map function for both the estimation of the genetic map and the QTL mapping analysis; the choice of map function will have little impact on the results.

  29. Are QTL mapping results much affected by segregation distortion?

    QTL analyses are generally conditional on the observed marker genotype data, and so results are little affected by the presence of segregation distortion. The reconstruction of genotypes at putative QTL relies on an assumption of no segregation distortion, but with reasonably dense markers and reasonably complete genotype data, this will not be a concern. Segregation distortion may result in reduced power to identify QTL, but it should not lead to spurious evidence for QTL. And so, while one should investigate the possibility of segregation distortion (for example, with geno.table), as it may indicate genotyping problems, one need not be concerned about the influence of true segregation distortion on the QTL mapping results.

  30. The organism I'm studying doesn't have a linkage map. Can I construct one from scratch with R/qtl?

    The de novo construction of a genetic map is not yet available in R/qtl. However, one may import data as if all markers are on one chromosome and use the estimated recombination fractions and LOD scores for all pairs of markers (calculated via est.rf) to partition markers into linkage groups. Then import the data with markers assigned to the inferred linkage groups, and use ripple to establish the order of markers within each linkage group.

  31. Are there courses or workshops on R/qtl?

    The Jackson Laboratory, in Bar Harbor, Maine, has held a short course on complex trait analysis (generally in September or October) that has included a tutorial on R/qtl. See http://www.jax.org/courses.

    The Advanced QTL Mapping module in the Summer Institute in Statistical Genetics (formerly held at North Carolina State University; now at the University of Washington, Seattle, in June) has included a tutorial on R/qtl. See http://www.biostat.washington.edu/sisg.




Last modified: Fri Sep 21 22:59:18 CDT 2007