The GenomeSim program simulates genome-wide data (chromosomes 1-22) between 1 and 10000 times and analyzes the simulated data with the ANALYZE package or GeneHunter.
The purpose of genome-wide simulation is to calculate an empirical p-value for a given LOD score from the original analysis. For example, if the highest LOD score in the original scan was 2.9 and the same or a higher LOD score was seen in 6 of 1000 genome-wide simulation replicates, the empirical p-value is 6/1000 = 0.006.
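The counting rule above can be written out in a few lines of Python. This is only an illustration of the arithmetic, not the code used by GenomeSim or gsempirical.sh; the function name empirical_p_value and its arguments are made up for this example, and the input is assumed to be the highest LOD score from each genome-wide replicate.

```python
def empirical_p_value(replicate_max_lods, observed_lod):
    """Fraction of replicates whose highest LOD score is at least observed_lod."""
    if not replicate_max_lods:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for lod in replicate_max_lods if lod >= observed_lod)
    return hits / len(replicate_max_lods)


# Illustrative data: 1000 replicates, 6 of which reach the observed LOD score of 2.9
lods = [3.1, 2.9, 3.4, 2.95, 4.0, 2.9] + [1.2] * 994
print(empirical_p_value(lods, 2.9))
```

Note that a replicate counts as a hit when its highest LOD score is the same as or higher than the observed one, matching the wording above.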
Command is:
genomesim.sh [OPTIONS]...
Options are:
******************************************************************************
Genome-wide simulation software. Version 0.1 (C) Tero Hiekkalinna 11.6.2004
Usage: genomesim.sh [PROGRAM OPTION] [replicate number]
Program options:
-a simulation using ANALYZE (2-point)
-gh simulation using GeneHunter (singe and multipoint)
Replicate number means how many times each chromosome will be simulated,
number can be between 1 and 10000
Example: genomesim.sh -a 10000
******************************************************************************
It is useful to test the input files and the simulation with one replicate before running a full simulation with 10000 replicates:
genomesim.sh -a 1
It is mandatory to run the simulation scan in the background using the bgrun program, for example:
bgrun genomesim.sh -gh 1000
GenomeSim.sh uses the Haldane map function to convert map distances in centimorgans (cM) to recombination fractions.
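For reference, the Haldane map function is theta = 0.5 * (1 - exp(-2d)), where d is the map distance in Morgans (cM divided by 100). The short Python sketch below shows the conversion; the function name haldane_theta is chosen for this example and is not part of GenomeSim.

```python
import math


def haldane_theta(distance_cm):
    """Convert a map distance in cM to a recombination fraction (Haldane)."""
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Print the recombination fraction for a few example distances
for cm in (1, 10, 50):
    print(cm, round(haldane_theta(cm), 4))
```

For small distances the recombination fraction is close to the distance in Morgans, and for large distances it approaches 0.5.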
To cancel a simulation, create a file called stop.sim in the folder where the analysis was started. The current replicate will finish first, and the analysis will stop before the next simulation begins.
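The cancellation behaviour described above follows a common stop-file pattern: the file is checked before each new replicate, never in the middle of one. The Python sketch below illustrates the pattern only; it is not GenomeSim's own code, and the names run_replicates and run_one are hypothetical.

```python
from pathlib import Path


def run_replicates(n_replicates, workdir, run_one):
    """Run up to n_replicates; stop early once stop.sim exists in workdir.

    run_one is a caller-supplied function that performs a single replicate.
    Returns the number of replicates that were completed.
    """
    stop_file = Path(workdir) / "stop.sim"
    completed = 0
    for replicate in range(1, n_replicates + 1):
        # The stop file is checked only between replicates, so a replicate
        # that has already started always runs to completion.
        if stop_file.exists():
            break
        run_one(replicate)
        completed += 1
    return completed
```

Because the check happens only between replicates, creating stop.sim never leaves a half-finished replicate behind.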
The required input files are: pre-makeped formatted LINKAGE files (chr1.raw, chr2.raw, ... , chr22.raw) with one disease locus as the first locus; MEGA2 formatted map files for each chromosome, named map1.dat, map2.dat, ... , map22.dat; and a model file describing the inheritance model to be used in the parametric analysis.
The model file is identical to the AUTOSCAN model file, with a few exceptions. Autoscan website: http://www.helsinki.fi/~tsjuntun/autoscan/README.use. The line that gives the number of liability classes in the pedigree file must also contain the identifier liab, and the last line of the file must always be 0.
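Before starting a long run, it can help to confirm that all 44 per-chromosome files named above are present. The Python sketch below is a hypothetical helper, not part of GenomeSim; the function names are made up for this example, and the model file is not checked because its file name is not given here.

```python
from pathlib import Path


def expected_input_files():
    """Names of the pedigree and map files for chromosomes 1-22."""
    names = []
    for chrom in range(1, 23):
        names.append(f"chr{chrom}.raw")   # pre-makeped LINKAGE pedigree file
        names.append(f"map{chrom}.dat")   # MEGA2 formatted map file
    return names


def missing_input_files(directory="."):
    """Return the expected file names that are not present in directory."""
    folder = Path(directory)
    return [name for name in expected_input_files()
            if not (folder / name).is_file()]


if __name__ == "__main__":
    # Report anything missing from the current directory
    for name in missing_input_files():
        print("missing:", name)
```

An empty result means every chromosome has both its pedigree file and its map file in the checked directory.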
Example of a model file with one liability class:
1 0.1 1 liab 0.9 0.9 0.0 30 0
Example of a model file with four liability classes:
1 0.1 4 liab 0.9 0.9 0.0 0.8 0.8 0.0 0.5 0.5 0.0 0.1 0.1 0.0 30 0
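The two GenomeSim-specific rules for the model file (a liability class count followed by the identifier liab, and a final line containing only 0) can be checked mechanically. The Python sketch below is a hypothetical helper and does not validate the rest of the AUTOSCAN format. The line breaks in the sample text are an assumption made for illustration, since the examples above are shown on a single line.

```python
def check_model_file(text):
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Rule 1: the last line of the file must always be 0.
    if not lines or lines[-1] != "0":
        problems.append("last line must be 0")

    # Rule 2: the liability class count must be on a line that also has liab.
    # Assumption: the count is the token immediately before liab.
    liab_ok = False
    for line in lines:
        tokens = line.split()
        if "liab" in tokens:
            position = tokens.index("liab")
            if position > 0 and tokens[position - 1].isdigit():
                liab_ok = int(tokens[position - 1]) >= 1
    if not liab_ok:
        problems.append("no liability class count followed by liab")
    return problems


# Assumed line layout for the one-liability-class example (illustrative only)
sample = "1 0.1\n1 liab\n0.9 0.9 0.0\n30\n0\n"
print(check_model_file(sample))
```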
The following options from the ANALYZE package are used:
The ANALYZE output files database.out and summary.out for each chromosome will be compressed, 'tarred', and placed in the subfolder all_results with the naming scheme:
Each chromosome tar file will be named as:
The GENEHUNTER singlepoint and multipoint output files for each chromosome will be compressed, 'tarred', and placed in the subfolder all_results with the naming scheme:
Each chromosome tar file will be named as:
After the simulation is completed, use gsempirical.sh to calculate the empirical p-value.
Command is:
gsempirical.sh [OPTIONS]...
Options are:
################################################################################
# Program : GSEmpirical.sh - Calculate empirical p-value from GenomeSim.sh #
# simulation results #
# Date : 23.6.2004 #
# Author : Tero Hiekkalinna #
# Email : Tero Hiekkalinna@Helsinki.FI (tero hiekkalinna@ktl.fi) #
# #
# COPYRIGHT (c) Tero Hiekkalinna 2003-2004 #
################################################################################
################################################################################
PROGRAM=0 (1=analyze, 2=gh)
THRESHOLD_LOD=3
################################################################################
Usage: gsempirical.sh [options]
Options:
--prog [program] Specify program (analyze,gh)
--threshold-lod [number] Specify threshold LOD Score
--replicates [number] Specify number of replicates (default is 100)
--help or -h This help
Example of calculating empirical p-value for ANALYZE results, LOD Score
threshold 3.0
gsempirical.sh --prog analyze --threshold-lod 3.0
Run gsempirical.sh in the directory that contains all of your *.tar.gz simulation output files. The threshold LOD score is the maximum LOD score from your original scan, for which you want to calculate the p-value. The output file is gsempirical.out. Specify --replicates only if the number of replicates differs from the default of 100.
GENOMESIM:
Unpublished work, will be published by 2004.
SIMULATE:
Terwilliger JD, Speer M, Ott J (1993) Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet Epidemiol 10, 217-224
Terwilliger JD, Ott J (1994) Handbook of Human Genetic Linkage. Johns Hopkins University Press, Baltimore
Other software:
See also the ANALYZE and GeneHunter documentation for references.
Tero Hiekkalinna