################################################################################
date()
# Below R cmd clears out your prior workspace.
rm(list = ls(all = TRUE))
################################################################################

# R/qtl Starting Note: R/qtl works only F2, BC, & RIL mapping populations. 
# For other population types, you will need to use Carthagene software. 

# NOTE: IF USING NOTEPAD FOR THIS CMD BATCH FILE, TURN OFF THE WORD WRAP OPTION. 

# IF YOU HAVE RIL DATA, you will need to read in your RIL data as if it were 
# backcross data (see below read.cross command specifics), and then convert 
# the read-in "BC" data to RIL data (using the below command convert2riself). 

# BEFORE USING RIL DATA WITH R/qtl, you MUST covert AB (or H) genotype codes 
# to NA (or N) codes or dashes (-), before using the below read.cross command. 
# Nearly all linkage mapping software (including the old Mapmaker software and 
# R/qtl) cannot handle heterozygote genotypes for the construction of an RIL- 
# genetic linkage map, so make this coding change before proceeding further. 
# Also keep in mind that making SNP heterozygote H calls with the GenomeStudio 
# software is more error-prone compared to making SNP homozygote A or B calls! 

# Note that in R/qtl, upper and lower case text are not treated the same, so it 
# is best to stick with lower case (but check the R/qtl manual for cmd info). 

################################################################################

# NOTE: You can EDIT COPY lines from this batch file EDIT PASTE those lines into 
# the R/qtl console. However, you need to check the line sections before doing 
# so to ensure that the cmds you want executed apply to your specific situation. 

# NOTE: R/qtl treats input text that has a starting pound(#)-sign as a comment. 
# You can also #-sign cmds you do not to run now but want to keep for later use. 

# Do an edit-find below to find batch file cmds you should review before using! 
# Edit find #? to search for pre-run items of interest to you as a user! 
# Edit find #+ to search for batch file lines to check on drive directory info! 
# Edit find #= to search for cmds that differ between F2 & RIL populations! 
# Edit find #$ to search for items of interest on marker seg Chi-square tests! 
# Edit find #| to find sections that deal with marker allele switch correction! 
# Edit find #\ to find sections that deal with genotyping error correction! 
# Edit find #> for sections needing settings on paired ind genotypic identity!
# Edit find #< for sections needing settings on chr nos & chr/qtl map pos & Qn!
# Edit find #& for sections needing the Chromosome Number changed (post-ID QTL)!

################################################################################

# Start R by double-clicking on the blue cap R icon on your computer's desktop. 

# In the Broman & Sen book that describes the use of R/qtl (a good book to get), 
# the authors advise users (see appendix A-3) to TURN OFF the R Windows Option 
# called "buffered output". Click on the R menu bar item Misc to deselect the 
# buffered output option at the start of each R/qtl session. 

################# DO THE ABOVE NOW BEFORE PROCEEDING FURTHER! ##################

#? You should try to use the most recent version of R if at all possible! 
# To determine the version of R that you are using, use the next command. 

version

# Within R, use the below function to determine the current R working directory. 

getwd()

#? In most cases, this will usually be "My Documents" or just plain "Documents". 
# On Jimbo's 303 KH Lab Computer, default is "drive:/Users/haasoyuser/Documents". 

# You should use a different working directory for each R/qtl project. I use a 
# drive:/Workxxxx directory, where xxxx is code for intuitive recall of pop-no. 
# To ensure your .Rdata file will be placed into your Workxxx directory, use 
# the setwd cmd (assuming you first created a desired work directory Workxxxx). 

#+ BE SURE TO FIRST CREATE YOUR WORK DIRECTORY BELOW - CHECK TO SEE IF DRIVE  
# NAME & FOLDER NAME ARE CORRECT! THE *.CSV DATA FILE MUST BE IN THAT FOLDER! 

setwd("C:/WorkAllP")

# You also can go to the R menu bar, click on File, then on Change Directory, 
# but you would then have to do this each time you run R (which is a hassle). 

# If you changed the working directory, and/or you simply want to know what the 
# working directory is right now, then use the below cmd to determine that. 

getwd()

################################################################################

# There is a printed R/qtl manual in the 303KH lab on the shelf above computer. 
# A PDF version of the manual is on the desktop screen of 303KH lab computer. 

#? You will want to download the pdf version to your laptop desktop! 

# If you want quick access to both R and R/qtl command help in an HTML format, 
# the next command will place that HTML help on your desktop monitor. However, 
# this can be a nuisance, so #-sign it out if you don't want it. I did so here. 

#help.start()

# After the HTML opens, click on base to get R help, or click on packages, then 
# on qtl to help on R/qtl commands. However, the R/qtl pdf manual is much more 
# detailed, so it is best to download the pdf manual to your desktop screen. On 
# the other hand, R help can be useful if you are unfamiliar with R commands. 

################################################################################

# The next command prints the date and time. Its periodic use in a batch file 
# like this one allows you to assess the time needed to execute R/qtl commands. 

date()

# The next command loads the R/qtl package into R (required each new R session). 

library(qtl)

# Next command reveals the current version of R/qtl you have on your computer. 
#? Visit the R/qtl web site to see if there has been a newer version released. 

qtlversion()

# In this and other batch files, I have used some useful commands from other R 
# packages for data manipulation, graphical layout, or specific statistics, etc. 
# The below R packages are auto-loaded here with the next set of commands. 
#? NOTE: You MUST have downloaded these packages before running this batch file, 
# and you did, IF you followed my R/qtl download instructions (other text file). 

library("MASS")
library("calibrate")
library("moments")
library("psych")
library("qtlDesign")
library("stats")

# By default, R/qtl uses up to 1 Gb computer memory. Use next cmd to check this. 

memory.limit()

# On some laptops/desktops, computer memory available for R can be increased to 
# 2Gb or to 3Gb, using below cmds (if such memory is installed on the computer). 
# Note that with the 32-bit version of R for Windows, R can ONLY USE UP TO 3Gb 
# of RAM, regardless of how much RAM you may have installed on your computer. 
#? Use one of the next cmds to let R use an amount available on your computer. 

#memory.limit(2048)
#memory.limit(3583)
#memory.limit(4027)

# HOWEVER, with the 64-bit version of R for Windows, R can use all installed 
# RAM on your computer that is available at the time (e.g., 4096, 6144, etc.). 
# If you have the 64-bit version of R for Windows on your laptop/desktop, 
# you should then #-sign out all of the above memory.limit cmds. 

#? Periodic issuing of the save.image() command will preserve your work in case 
# the R program crashes. However, doing so will also slow down your work 
# (i.e., saving your work takes time and the more work you have to save .....). 
# Note that save.image() is just a short-cut for save my current workspace, 
# and is equivalent to the command  save(list = ls(all=TRUE), file = ".RData"). 
# It is also what happens when you end your R session with q("yes"). 

#? I typically don't use the save.image cmd, but instead capture the text output 
# in the R console (periodically) by clicking on the R menu select all button, 
# then the copy button, then edit-pasting the copied R text into a notepad file 
# to more easily review that text later. Inversely, you can edit-copy R/qtl 
# commands from some source text file (e.g., a cmd batch file like this one, or 
# from the R/qtl pdf manual, or from whatever), and then paste the text into 
# the R console box by using the R menu paste (clipboard image) button. 

################################################################################

#? The par command sets up some desirable R graphics margin and font parameters 
# that Jimbo likes for the graphics part of his R console. Some graphs in this 
# R/qtl batch file require unique parameters, so a particular par cmd may be put 
# in front of a plot cmd. Just revert back to the below parameters when done. 
# Or do a par(oldpar) cmd below at any time to revert to the startup parameters. 

par(mar=c(5.1,5.1,5.1,3.1),mfrow=c(1,1),las=0,cex=1,cex.axis=1,cex.lab=1,cex.main=1,cex.sub=1,font=1,font.main=2,font.sub=2,font.axis=2,font.lab=2)
oldpar <- par(mar=c(5.1,5.1,5.1,3.1),mfrow=c(1,1),las=0,cex=1,cex.axis=1,cex.lab=1,cex.main=1,cex.sub=1,font=2,font.main=2,font.sub=2,font.axis=2,font.lab=2)

# By the way, when you see the R-console graphic element pop up to the right of  
# the text R-console, immediately go to the R menu bar to click on History, then  
# Recording, to invoke a slick Windows means of retaining all graphics produced 
# in a given R/qtl run. You can page up/down through these graphics at any time! 

# THE R-CONSOLE WINDOW JUST OPENED, SO STOP YOUR EDIT-COPYING HERE AND CLICK ON 
# R WINDOW, THEN CLICK ON THE R MENU BAR HISTORY BUTTON, THEN RECORDING BUTTON! 

#-$$$$$$$$$$$$$$$$$$$$$$$$ DO THE ABOVE CLICKING NOW! $$$$$$$$$$$$$$$$$$$$$$$$-#

################################################################################
################################################################################

# If you are working with a species that does not have a genomic sequence or a 
# well-documented linkage map of several thousands of markers, you would have 
# to follow the directions that Karl Broman outlines in his Tutorial entitled 
# "Genetic map construction with R/qtl". A pdf of that tutorial is available 
# at the # R/qtl web site:  http://www.rqtl.org  (click on the tutorials tab). 

#? We in soybeans have a genomic sequence (Williams 82) and a well-documented 
# Version 4.0 genetic linkage map (Hyten et al. 2010). We also have available 
# a 1536-SNP chip that can be used to genotype soybean mapping populations. 
# One can thus genotype 96 individuals at a time with this chip, so you could 
# assay multiple sets of 92 F2s or RILs, 2 parents (A & B), 1 F1 (AB), and 1 
# other line (perhaps the genome sequenced Williams 82, if desired). 

#? From this point on in this batch file, it is assumed that you have arranged 
# the SNP (and/or other) markers in your *.csv file in Vers. 4.0 marker order. 

# My below cmd examples are for a selectively genotyped F2 mapping population 
# in which all F2 plants were phenotyped for F2.3 seed pro/oil content, BUT 
# only the F2 plant progenitors of the F2.3 progenies occupying the HI & LO 
# decile "tails" of F2.3 phenotypic seed protein distribution were genotyped. 
# (i.e., F2 progenitors of the 22 HI Pro & 22 LO Pro F3 seed progenies. 

################################################################################
################################################################################

# NOTE: THE NEXT CMDS WILL INPUT YOUR EXCEL.CSV FORMATTED DATA FILES INTO R/QTL. 

#? Note that in R/qtl, upper and lower case text are not treated the same, so it 
# is best to stick with lower case (but review cmds in R/qtl manual for sure). 

#? NOTE: Please take a look at the R/qtl home page:  http://www.rqtl.org/ 
# There you can click on sample data sets to see the *.csv file formats that can 
# be used to input data into R/qtl using the read.cross command shown below. 
# Genotypic and Phenotypic data can be in the same (or separate) *.csv file(s), 
# with marker names/genotypes / F2 names/phenotypes arrayed in either a CSV 
# format (marker names across top) or a CSVR format (marker names down). 

#? Use the below read.cross command to input (into R/qtl) an Excel.csv formatted 
# file(s) containing combined (or separate) phenotypic and genotypic data sets. 
# See Chapter 2 in the Broman and Sen book, or the R/qtl web site for guidance. 

#? If you entered a chr number & map position for each marker in your *.csv file, 
# identify the map function (Haldane or Kosambi) in the read.cross cmd. You can 
# specify the function even if you did not enter the Chr# & cM map positions. 

#? See page 176 of the R/qtl manual (Nov 2011 edition) for guidance on using the 
# read.cross cmd. Note that the na.strings option specifies the codes NOT to be 
# read in as genotypes (i.e., these will be treated by R/qtl as missing data). 

# Choose an intuitively unique but cryptic text name for your R/qtl cross object 
# name. The setup of this batch file is such that you can use it for another 
# *.csv data file by using edit-find-replace that short text name for another. 

#+ BE SURE TO AGAIN CHECK YOUR WORK DIRECTORY - ARE DRIVE & FOLDER NAME CORRECT? 
# You set the work directory above with this COMMAND BATCHFILE, but check again! 

getwd()

#+ BELOW CMD IS USED WITH A SINGLE F2 POPULATION - DID YOU CHECK DIR & *.CSV NAMES? 
#= (NOTE: You will need to modify this section for RIL pop data input! See Jimbo).

OBJNAME <- read.cross(format=c("csvr"), dir="C:/WorkAllP", file="P1022GenPhe.csv", na.strings=c("-","NA"), genotypes=c("A","H","B"), alleles=c("A","B"), error.prob=0.001, map.function=c("kosambi"))

# Ignore the warning msgs about "Some markers at the same position". This occurs
# if tightly linked markers in your population exhibited no recombination. 

# Check output of below summary cmd to see if *.csv data was correctly inputted. 

summary(OBJNAME)

# The below jittermap cmd is used to slightly offset marker map positions by 
# 0.000001 units, which slightly separates markers that have identical map 
# positions in this population. Markers that are very closely linked may have 
# identical map positions because of the lack of recombination between such 
# markers in the given population resulted in a haplotype condition. This, of 
# course, assumes that the markers are not inadvertent human error duplicates. 

jittermap(OBJNAME, amount=1e-6)

# NOTE NOTE NOTE !!!
# If above read.cross cmd did not work, check spelling in your *.csv file name. 
# Also, remember that R/qtl is text.case sensitive so check that issue too.
# Otherwise, review your *.csv file - it may have "glitchy" typo errors in it. 

# First "phenotype" in the summary output is actually F2 ID # (the word ID
# was put into the top left cell of the *.csv file). This phenotypic "ID" is 
# used in getid() cmd to identify F2s (ID#s need not be consecutive numbers). 
# NOTE: R/qtl assigns a (different) consecutive index number for markers and 
# individuals as a means of tracking the number of each in the current object. 

# If above read.cross cmd did not work, check spelling in your *.csv file name. 
# Also, remember that R/qtl is text.case sensitive so check that issue too.
# Otherwise, review your *.csv file - it may have "glitchy" typo errors in it. 

################################################################################
################################################################################

# NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - #

# NOTE: TITLES in all of the below graphs are population-specific, but you can 
# change the TITLES in those graphs using the edit-find / edit-replace cmd. 

# You MUST use the global edit-find edit-replace cmd to change from the title 
# words currently present in this batch file to the title words you want. 

# When using the edit-find edit-replace cmd, select the "match case" option. 
# You will need to execute the global cmd for EACH of the paired title words. 

# BELOW POP NAMES WERE USED IN THE GRAPH TITLES IN THIS BATCH FILE: 
#  Prefix of C denotes a combined set of F2 populations with identical parents.
 
# PI 153.297 X Jim
# P1022

# Below is a list of specific title words for the 48 F2 SG populations: 

# F2 Combined - C0008 - C0008 X OAC Vision
# F2 Population P1001 - PI 153.296 X OAC Vision
# F2 Population P1002 - PI 189.963 X OAC Vision
# F2 Population P1003 - PI 548.399 X OAC Vision
# F2 Population P1004 - PI 372.423 X OAC Vision
# F2 Population P1005 - FC 30.687 X OAC Vision
# F2 Population P1006 - PI 153.293 X OAC Vision
# F2 Population P1007 - PI 372.412 X OAC Vision
# F2 Population P1009 - PI 548.414 X OAC Vision

# F2 Combined - Cm005 - Cm005 X Jim
# F2 Population P1022 - PI 153.302 X Jim
# F2 Population P1023 - PI 159.764 X Jim
# F2 Population P1024 - PI 438.415 X Jim
# F2 Population P1025 - PI 153.301 X Jim
# F2 Population P1026 - PI 189.880 X Jim
# F2 Population P1027 - PI 153.297 X Jim
# F2 Population P2211 - HHP X Jim
# F2 Population P2212 - AC Proteus X Jim
# F2 Population P2213 - AC Proteina X Jim

# F2 Combined - Cmm04 - Cmm04 X MN 0301
# F2 Population P1039 - PI 427.138 X MN 0301
# F2 Population P1040 - PI 261.469 X MN 0301
# F2 Population P1041 - PI 181.571 X MN 0301
# F2 Population P1042 - PI 424.148 X MN 0301
# F2 Population P1043 - PI 423.954 X MN 0301
# F2 Population P1044 - PI 154.196 X MN 0301

# F2 Population P1054 - PI 437.088A X MN 1301
# F2 Population P1055 - PI 423.949 X MN 1301
# F2 Population P1056 - PI 427.141 X MN 1301
# F2 Population P1057 - PI 437.716A X MN 1301
# F2 Population P1058 - PI 423.942 X MN 1301

# F2 Population P1075 - PI 423.948A X Dwight
# F2 Population P1076 - PI 437.112A X Dwight
# F2 Population P1098 - PI 548.608 X Dwight

# F2 Population P1107 - PI 445.845 X Pana
# F2 Population P1108 - PI 398.516 X Pana
# F2 Population P1109 - PI 91725-4 X Pana
# F2 Population P1110 - PI 340.011 X Pana
# F2 Population P1111 - PI 243.532 X Pana
# F2 Population P1113 - PI 408.138C X Pana
# F2 Population P1121 - PI 398.672 X Pana
# F2 Population P1122 - PI 360.843 X Pana

# F2 Population P1138 - PI 253.666 X Rend
# F2 Population P1139 - PI 407.788A X Rend
# F2 Population P1140 - PI 424.286 X Rend
# F2 Population P1142 - PI 407.877B X Rend
# F2 Population P1143 - Rend X PI 398.704 (This is indeed a reciprocal mating)
# F2 Population P1145 - PI 398.970 X Rend
# F2 Population P1146 - PI 407.823 X Rend
# F2 Population P1152 - PI 407.773B X Rend
# F2 Population P1183 - PI 458.256 X Rend

# NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - NOTE - #

################################################################################
################################################################################

# LET'S START WITH A PRELIMINARY EXAMINATION OF THE INPUTTED PHENOTYPIC DATA. 

date()

# Next cmd plots (with black pixelation) missing genotypes in the inputted data. 

plot.missing(OBJNAME, reorder=FALSE, main="Missing Genotypes", alternate.chrid=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Individual Index Number versus Marker Index Number", cex=0.65, line=2.9)
mtext(side=3, "Chromosome Number", cex=0.65, line=1.3)

# Next cmd generates a color plot of the A:H:B - genotype pattern in your data. 
#= NOTE: For F2 data, the graph title colors will be A=red  H=blue  B=green. 
#= NOTE: For RIL A:B data (no H allowed in *.csv), chg title to A=red  B=blue. 
# The below graph depicts the patterns of missing and present genotypes for the 
# markers and individuals in this population. 

geno.image(OBJNAME, reorder=FALSE, main="Marker Genotypes: A=red  H=blue  B=green  Missing=white", alternate.chrid=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Individual Index Number versus Marker Index Number", cex=0.65, line=2.9)
mtext(side=3, "Chromosome Number", cex=0.65, line=1.3)

################################################################################

# LET'S TAKE AN ALTERNATIVE LOOK AT THE COMPLETENESS OF THE GENOTYPIC DATA. 

date()

# The next cmds generate two plots on one page showing the completeness of 
# genotyping in this data set, first by SNPs within each F2 (left), then by 
# F2s within each SNP (right). 

par(mfrow=c(1,2), las=1)
plot(ntyped(OBJNAME), ylab="Number of Typed SNPs in Each Indexed F2", main="Left>Right F2 Index #" )
mtext(side=3, "P1022 X Jim", line=3.6)
plot(ntyped(OBJNAME, "mar"), ylab="Number of Typed F2s for Each Indexed SNP", main="Left>Right SNP Index #")
mtext(side=3, "P1022 X Jim", line=3.6)

# F2s or RILs with many missing genotypes can be omitted with a subset cmd which
# will RETAIN only those F2s or RILs that have specified min no. of genotypes.
# The following cmd set will do that.

# NOTE: Because of Selective Genotyping, Jimbo #-signed out the below set of cmds. 
# FOR SELECTIVE GENOTYPING, set the below threshold low enough to RETAIN ALL F2s. 

#nt.byind <- ntyped(OBJNAME, what=c("ind"))

#print(nt.byind[nt.byind <1])
#match(names(nt.byind[nt.byind <1]), getid(OBJNAME))

#OBJNAME <- subset(OBJNAME, ind=(ntyped(OBJNAME)<1))

#summary(OBJNAME)
#jittermap(OBJNAME, amount=1e-6)

#date()

# Markers with too many missing genotypes can be dropped with the drop.markers 
# cmd, which will remove any that have less than a specified no. of genotypes.
# The following cmd set will do that (however, Jimbo #-signed cmds out-of-play).

#nt.bymar <- ntyped(OBJNAME, what=c("mar"))
#print(nt.bymar[nt.bymar <1])
#find.markerindex(OBJNAME, names(nt.bymar[nt.bymar <1]))

# FOR NOW, Jimbo set the below threshold low enough (<1) to RETAIN ALL SNPs. 

#martodrop <- names(nt.bymar[nt.bymar <1])
#totmar(OBJNAME)
#OBJNAME <- drop.markers(OBJNAME, martodrop)
#totmar(OBJNAME)

#date()

#summary(OBJNAME)
#jittermap(OBJNAME, amount=1e-6)

# Need to reset the graphics parameters back to their original values. 

par(mfrow=c(1,1), las=0)

################################################################################

# THE BELOW CMDS SETUP AND THEN CREATE AN INITIAL GENETIC MAP OF INPUTTED DATA. 

date()

# NOTE: Your *.csv file must contain chromosome numbers and marker map positions 
# that fit within the framework of the Version 4.0 soybean linkage map published 
# by Hyten et al. (2010). See the supplemental tables in that publication. 

# Let's graph your markers based on the Hyten et al. (2010) Ver. 4.0 map for the 
# Chr and Marker positions you inputted into R/qtl with your above *.csv file. 
# Note that the below "setup" information is specific (Version 4.0 Genetic Map)! 
#*******************************************************************************
# Below are setup x,y,z variables for chr # (x) and start (y) & end (z) lengths, 
# where z = Hyten et al. Chromosome 1-20 terminal lengths (in Kosambi cM). 
x <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
y <- c(0,0,0,0,0,0,0,0,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
z <- c(98.41,140.63,99.51,112.32,86.75,136.51,135.15,146.67,99.60,132.89,124.24,120.50,120.03,108.18,99.88,92.27,119.19,107.09,101.14,112.77)
# Next variable is used for text length labeling at Hyten et al. chromosome ends. 
# NOTE: The quotes were used to put a space in front of shorter 2-digit lengths. 
chrlenglab <- c(" 98",141,100,112," 87",137,135,147,100,133,124,121,120,108,100," 92",119,107,101,113)
# Next variable centers the 3-digit length value at the bottom of the chr ends). 
x1 <- c(0.5,1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5,9.5,10.5,11.5,12.5,13.5,14.5,15.5,16.5,17.5,18.5,19.5)
# Jimbo added 6 digits to each z var to print chr leng values below the chr ends. 
z1 <- c(104.41,146.63,105.51,118.32,92.75,142.51,141.15,152.67,105.60,138.89,130.24,126.50,126.03,114.18,105.88,98.27,125.19,113.09,107.14,118.77)
#*******************************************************************************

# Lets now use the next cmd for a "summarized" overview of the inputted data. 

# NOTE: The jittermap cmd separates (near- or completely) coincident marker 
# positions ever so slightly to avoid later QTL analytic regression failures. 

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# NOTE: The Hyten et al. (2010) (not F2) SNP order/distance data is used here! 

# Next cmds plot/print data set's SNP marker Kosambi map positions for each chr. 

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, alternate.chrid=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[ALL Inputted Markers - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPS, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
# NOTE: Jimbo #-signed out below cmd - don't need the map at this point. 
#pull.map(OBJNAME, as.table=FALSE)

################################################################################

# LET'S FIRST CHECK FOR MARKER ALLELE SEGREGATION DISTORTION. 

date()

# Lets take a look at the segregation of each of the markers in this data set. 
# NOTE: Chi-square test of observed data fit to a 1A:2H:1B in F2; 1A:1B in RIL. 

# The outputted segregation data will be ordered by Chromosome # & Marker name. 

gt <- geno.table(OBJNAME, scanone.output=FALSE)
gt

# An error message will display for a marker with too few genotypes for a 
# conventional Chi-square test - a Yates correction is required (not done here). 

# Let's sort the above outputted segregation data by X2 P value (worst to best). 

sortgt <- gt[order(gt$P.value), ]
sortgt

# Let's next ID marker loci whose seg distortion is significant at X2 P>0.01. 
# Keep in mind, however, that with hundreds of markers, you should really make 
# a Bonferroni correction of a single-test X2 Prob to a multi-test X2 Prob. 

#$ Let's look at the output with an X2 P criterion of P<0.01 (or your choice). 

gt[ gt$P.value < 0.01, ]

# Lets sort the above output.

gt01 <- gt[ gt$P.value < 0.01, ]
sortgt01 <- gt01[order(gt01$P.value), ]
sortgt01

# The above identified bad markers could be dropped, but that may be too severe.

#$ The below cmd adjusts a SINGLE COMPARISONWISE X2 test P=0.05 to a MULTIPLE 
# EXPERIMENTWISE X2 test P=0.05 by simply dividing 0.05 by the number of markers
# in your data set. There is a precise formula for making the adjustment (known
# as Bonferroni P correction), but the below short-cut division method provides
# an experimentwise alpha that has sufficient precision for our purpose here of
# removing the "baddest" markers of those with significant genotype distortion. 

# Below is the experimentwise P value calculated for this set of markers:

0.05/(totmar(OBJNAME))

#$ Now, let's identify markers with this multi-test adjusted Chi-square P value. 

gt[ gt$P.value < 0.05/(totmar(OBJNAME)), ]

gtBon <- gt[ gt$P.value < 0.05/(totmar(OBJNAME)), ]
sortgtBon <- gtBon[order(gtBon$P.value), ]
sortgtBon

# Let's get a "suspect" marker name list (w/seg distortion at adjusted X2 Prob). 

suspect.markers <- rownames(sortgt[ sortgt$P.value < 0.05/(totmar(OBJNAME)), ])
suspect.markers

#$ Next cmds remove suspect markers exceeding a X2 P < multi-test X2 P criterion, 
# and show marker numbers in data set before & after dropping the "bad" markers. 

totmar(OBJNAME)
OBJNAME <- drop.markers(OBJNAME, suspect.markers)
totmar(OBJNAME)

# Note the reduction in markers in the cross object after the drop.markers cmd. 
# It is always a good idea to review cross object details after marker drops. 

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# As you can see, the markers dropped in the above cmds are now gone. 

# Lets plot the genetic map again after dropping the seg-distorted markers. 

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, alternate.chrid=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[Seg-Distorted Markers GONE - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPS, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
# NOTE: Jimbo #-signed out below cmd - don't need the map at this point. 
#pull.map(OBJNAME, as.table=FALSE)

################################################################################

# WHAT ABOUT GENOTYPE A:H:B (or A:B) FREQUENCY VARIATION ON A F2 (or RIL) BASIS?  

date()

#+ Just as we expect the markers to segregate 1A:2H:1B in the F2 Pop, we expect  
# F2s to have 1A:2H:1B genotype frequencies in approximately the same proportion.
# The next cmds generate graphs of where the F2s fall in three graphs - the left
# graph has a vertical axis of A genotypic frequency, the middle graph has a 
# vertical axis of H genotypic frequency, and finally the right graph has a 
# a vertical axis of B genotypic frequency. (NOTE: See Broman's tutorial for
#= an F2 pop, where he shows three graphs for A:H:B. For an RIL, only two occur)
#= (NOTE: For RIL pops, change 1:3 to 1:2 (3 places) and remove ,"AB", in main).

g <- pull.geno(OBJNAME)
gfreq <- apply(g, 1, function(a) table(factor(a, levels=1:3)))
gfreq <- t(t(gfreq) / colSums(gfreq))
par(mfrow=c(1,3), las=1)
for(i in 1:3) plot(gfreq[i,], ylab="Genotype Frequency", main=c("AA", "AB", "BB")[i], sub=c("P1022", "P1022", "P1022")[i], ylim=c(0,1))

# Need to reset the graphics parameters back to their original values. 

par(mfrow=c(1,1), las=0)

################################################################################

# WHAT ABOUT DUPLICATE MARKERS (i.e., markers with identical chr map positions)? 

date()

# These would be markers that have identical genotypes for every individual, and 
# could arise from human errors committed when inputting marker names / data. 
# Alternatively, depending on the population, spacing of markers in terms of 
# genome positions relative to "hot" and "cold" recombination sites in those 
# physical parts of the genome, and/or lack of parental marker bimorphism in 
# other genomic regions, some adjacent markers may not have experienced a 
# recombination event because there are simply too few individuals in this pop. 
# Such "nonrecombined" markers constitute a haplotype set in the given data set 
# and are not really "duplicate markers" per se, and thus do not necessarily 
# need to be dropped from the data set. However, all but one member of each 
# haplotype marker set can be temporarily removed to speed up the genetic map 
# analysis when using some R/qtl cmds that examine marker-to-marker rf values 
# during the construction and finalization of a genetic map. After finalization 
# marker order/distance, haplotype markers can be added back to the genetic map. 

# Let's use findDupMarkers cmd to get a listing of those markers that are
# duplicate (or triplicate, quadruplicate, or higher order 'plicate), based 
# on identical marker genotype coding for every individual in the data set. 

# There are TWO methods of identifying duplicate markers. 

# If the exact.only=FALSE option is used, then the "observed" genotype of a 
# marker is compared to the "observed" genotype of every other marker, and 
# those marker loci that match in every F2 or RIL are reported (a "match" in the 
# FALSE option case includes "matches" of A vs. - and B vs. - (or vice versa). 

# However, if the exact.only=TRUE option is used, then matching is ABSOLUTE 
# (i.e., genotypes: A to A, B to B, and NA to NA (NOTE: NA was assigned to 
# missing genotypes (i.e., dashes) in the *.csv data set by the read.cross cmd). 

# Let's examine both options here to see the difference (if any) in the outputs. 

print(dupfalse <- findDupMarkers(OBJNAME, exact.only=FALSE, adjacent.only=FALSE))

print(duptrue  <- findDupMarkers(OBJNAME, exact.only=TRUE, adjacent.only=FALSE))

# The two outputs usually differ, but not much - usually fewer matches with TRUE. 

# If duplicate markers are NOT dropped, one needs to use the R/qtl jittermap cmd 
# to ever-so-slightly offset their map positions by a 0.000001 map distance to 
# avoid regression failure in some subsequently used R/qtl QTL analysis cmds. 

#? Jimbo decided to NOT drop TRUE duplicate markers using the below set of cmds,
# because in the combined set of SG populations wherein on only one or the other 
# marker member duplicate (or triplicate, etc.) segregates.  MOREOVER, if two or 
# more F2 populations can be merged, such duplicate markers may be useful. 

#totmar(OBJNAME) 
#dupmar.exact <- findDupMarkers(OBJNAME, exact.only=TRUE, adjacent.only=FALSE) 
#OBJNAME <- drop.markers(OBJNAME, unlist(dupmar.exact))
#totmar(OBJNAME)

# Lets again review cross object details after dropping any duplicate markers. 

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# Again, as you can see, any markers dropped in the above cmds are now gone. 

# Lets plot the genetic map again after dropping any duplicate markers. 

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, alternate.chrid=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[DUP Markers GONE - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPs, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
# NOTE: Jimbo #-signed out below cmd - don't need the map at this point. 
#pull.map(OBJNAME, as.table=FALSE)

################################################################################

# LET'S NOW CHECK FOR UNUSUAL OR ANOMALOUS INDIVIDUALS (F2 PLANTs OR RILs). 

date()

# Let's first check (for outliers in) the crossover number in each individual. 

# Use the below cmds to see the observed number of crossovers in each F2 or RIL. 

plot(countXO(OBJNAME, bychr=FALSE), ylab="Number of Crossovers")
mtext(side=3, "F2 Population P1022 X Jim", line=2.6)
mtext(side=3, "Number of XOs detected in each individual (see index #) of this population", cex=0.8, line=1.0)
abline(h=15, lty="dotted")
abline(h=(seq(0,300,25)), lty="dotted")

# In general, the range of F2 or RIL XO counts will vary by population, but if 
# one or more outlying individuals have much greater XO counts, those individuals 
# may not be true descendants of the parental mating of this population, or 
# possibly one of the parents was not "pure" and if heterogeneous plants of that 
# used to generate the F2 or RIL population  ). 

# Now, let's look at the index number - F2 ID number matches for the population.

getid(OBJNAME)

# Lets first identify F2 ID#s with the HIGHEST XO NUMBERS with the next cmds.
# Ignore any error message that says "Must retain at least one individual."

XO251up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 251))
getid(XO251up)

XO226up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 226))
getid(XO226up)

XO201up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 201))
getid(XO201up)

XO176up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 176))
getid(XO176up)

XO151up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 151))
getid(XO151up)

XO126up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 126))
getid(XO126up)

XO101up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 101))
getid(XO101up)

# Lets next identify F2 ID#s with the LOWEST XO NUMBERS with the next cmds.
# REMEMBER WE MUST NOT DROP non-genotyped F2 individuals with XO=zero values!

XO46dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 46))
getid(XO46dn)

XO41dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 41))
getid(XO41dn)

XO36dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 36))
getid(XO36dn)

XO31dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 31))
getid(XO31dn)

XO26dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 26))
getid(XO26dn)

XO21dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 21))
getid(XO21dn)

XO16dn <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 16))
getid(XO16dn)

# Below cmd outputs all but the non-genotyped individuals in this SG pop.

XO01up <- subset(OBJNAME, ind=(countXO(OBJNAME) >= 1))
getid(XO01up)

# Lets create the variable nxo that contains all F2 ID# and their XO counts.

nxo <- countXO(OBJNAME)
nxo

# The next cmd removes HIGH XO outliers - you need to view the above XO graph
# SINCE YOU WILL NEED TO SET A SUITABLE THRESHOLD NUMBER IN THE BELOW COMMAND!

nind(OBJNAME)
OBJNAME <- subset(OBJNAME, ind=nxo < 176)
nind(OBJNAME)

# The next set of cmds removes LOW XO outliers - you need to view the XO
# graph, and choose suitable threshold number that will exclude LOW XO plants. 

# Remember that with selective genotyping we must keep XO=zero non-SG plants. 
# ALSO: You must RERUN the nxo cmd if you removed individuals with above cmd.

nxo <- countXO(OBJNAME) 

nind(OBJNAME)
OBJNAME <- subset(OBJNAME, ind=nxo < 1 | nxo > 15)
nind(OBJNAME)

# Lets re-plot the XO graph with the F2 plants with outlier XOs now gone.

plot(countXO(OBJNAME, bychr=FALSE), ylab="Number of Crossovers")
mtext(side=3, "F2 Population P1022 X Jim", line=2.6)
mtext(side=3, "Hi/Lo XO outliers (if any) have now been removed from this population", cex=0.8, line=1.0)
abline(h=15, lty="dotted")
abline(h=(seq(0,300,25)), lty="dotted")

# Lets again review cross object details after dropping the above individuals. 

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# Lets plot the genetic map again after dropping the "outlier" XO individuals. 

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, alternate.chrid=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[Hi/Lo XO Individuals GONE - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPs, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
# NOTE: Jimbo #-signed out below cmd - don't need the map at this point. 
#pull.map(OBJNAME, as.table=FALSE)

################################################################################

# ANY DUPLICATE INDIVIDUALS? (i.e., IDENTICAL WITH EACH OTHER AT EVERY MARKER!). 

date()

# Duplicates arise from human sampling ERRORS in the generation advance phases 
# of population development (two seeds advanced from say an F2 plant in an F2 
# population), OR the DNA extraction phase (e.g., one individual inadvertently 
# sampled twice for DNA, at the expense of another individual not so sampled, 
# giving rise to two ID# DNA tubes that duplicate the former individual's DNA). 

#? Duplicate individuals must be removed from the data set (see page 6 of the 
# genetic map construction tutorial authored by Broman at the R/qtl web site). 

# Let us first identify duplicate (and possibly triplicate, quadruplicate, 
# quintuplicate, & higher-plicate) sets of marker-identical individuals. 

# The below comparegeno cmd examines all possible PAIRS of individuals in the 
# population and calculates the degree of the paired marker genotype identity. 

# If you have, for example, 200 individuals, there are 200*199/2=19,900 possible 
# pairings of the 200 individuals. The paired identity values for all 19,900 
# pairs will be displayed in histogram according to their fractional degree of 
# identity from ZERO identity (0.0) to COMPLETE (1.0) identity.  A 0.0 value 
# indicates that the pair had a ZERO match of an A or B (plus H in F2) genotype 
# codes at ALL marker loci so compared, whereas a 1.0 value indicates that the 
# pair had a 100% match of those genotype codes at EACH and EVERY marker locus.

# Lets examine an ALLpairs plot of marker pair fractional genotypic identity.
# The histogram peak value should (theoretically) be 0.375 (F2) or 0.5 (RIL). 
#= NOTE: For RIL pops, replace the seven F2 pop "red" & "red" abline v values 
#= 100 150 200 375 550 600 650 with RIL pop values 275 325 375 500 625 675 725) 
#> ADJUST len resolution parameter to 1001 if you have a very large population. 

ALLpairs <- comparegeno(OBJNAME, what="proportion")
par(mfrow=c(1,1),las=0)
hist(ALLpairs, breaks=seq(0, 1, len=101), xlab="Fractional Degree of Paired Genotypic Match")
rug(ALLpairs)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Total Number of Pairs Displayed here = n(n-1)/2 where n = No of Individuals in Pop", cex=0.65, line=1.3)
abline(v=c(0.0,0.0), lty=1)
abline(v=c(0.100,0.100), lty=2, lwd=2, col="red")
abline(v=c(0.150,0.150), lty=2, lwd=2, col="red")
abline(v=c(0.200,0.200), lty=2, lwd=2, col="red")
abline(v=c(0.375,0.375), lty=1, lwd=2, col="blue")
abline(v=c(0.550,0.550), lty=2, lwd=2, col="red")
abline(v=c(0.600,0.600), lty=2, lwd=2, col="red")
abline(v=c(0.650,0.650), lty=2, lwd=2, col="red")
abline(v=c(0.7,0.7), lty=3)
abline(v=c(0.8,0.8), lty=3)
abline(v=c(0.9,0.9), lty=3)
abline(v=c(1.0,1.0), lty=1)

# The histogram vertical rug lines help clarify the fractional identity outliers, 
# which represent pairs falling BEYOND the expected "distributional tail ends". 
# You will need to decide what constitutes right-side "outliers" in the histogram.

date()

# Below cmds identify those pairs exceeding the below USER-SPECIFIED IDENTITY 
# FRACTION THRESHOLDs, and display an R/qtl index number for members of the pair. 
# NOTE: The R/qtl index # is NOT the same as the inputted F2 or RIL ID#. 

#> USER SHOULD SPECIFY THE BELOW FRACTIONS after viewing the histogram and make
# a judgment call as to what is "BEYOND" the distributional (right) tail end. 

#> Jimbo to choose a suitable threshold value after looking at ALLPairs histogram
# and output from the various thresholds used in each below cmd set.

wh99 <- which(ALLpairs > 0.99, arr=TRUE)
wh99 <- wh99[wh99[,1] < wh99[,2],]
wh99

wh95 <- which(ALLpairs > 0.95, arr=TRUE)
wh95 <- wh95[wh95[,1] < wh95[,2],]
wh95

wh90 <- which(ALLpairs > 0.90, arr=TRUE)
wh90 <- wh90[wh90[,1] < wh90[,2],]
wh90

wh89 <- which(ALLpairs > 0.89, arr=TRUE)
wh89 <- wh89[wh89[,1] < wh89[,2],]
wh89

wh88 <- which(ALLpairs > 0.88, arr=TRUE)
wh88 <- wh88[wh88[,1] < wh88[,2],]
wh88

wh87 <- which(ALLpairs > 0.87, arr=TRUE)
wh87 <- wh87[wh87[,1] < wh87[,2],]
wh87

wh86 <- which(ALLpairs > 0.86, arr=TRUE)
wh86 <- wh86[wh86[,1] < wh86[,2],]
wh86

wh85 <- which(ALLpairs > 0.85, arr=TRUE)
wh85 <- wh85[wh85[,1] < wh85[,2],]
wh85

wh84 <- which(ALLpairs > 0.84, arr=TRUE)
wh84 <- wh84[wh84[,1] < wh84[,2],]
wh84

wh83 <- which(ALLpairs > 0.83, arr=TRUE)
wh83 <- wh83[wh83[,1] < wh83[,2],]
wh83

wh82 <- which(ALLpairs > 0.82, arr=TRUE)
wh82 <- wh82[wh82[,1] < wh82[,2],]
wh82

wh81 <- which(ALLpairs > 0.81, arr=TRUE)
wh81 <- wh81[wh81[,1] < wh81[,2],]
wh81

wh80 <- which(ALLpairs > 0.80, arr=TRUE)
wh80 <- wh80[wh80[,1] < wh80[,2],]
wh80

wh79 <- which(ALLpairs > 0.79, arr=TRUE)
wh79 <- wh79[wh79[,1] < wh79[,2],]
wh79

wh78 <- which(ALLpairs > 0.78, arr=TRUE)
wh78 <- wh78[wh78[,1] < wh78[,2],]
wh78

wh77 <- which(ALLpairs > 0.77, arr=TRUE)
wh77 <- wh77[wh77[,1] < wh77[,2],]
wh77

wh76 <- which(ALLpairs > 0.76, arr=TRUE)
wh76 <- wh76[wh76[,1] < wh76[,2],]
wh76

wh75 <- which(ALLpairs > 0.75, arr=TRUE)
wh75 <- wh75[wh75[,1] < wh75[,2],]
wh75

wh74 <- which(ALLpairs > 0.74, arr=TRUE)
wh74 <- wh74[wh74[,1] < wh74[,2],]
wh74

wh73 <- which(ALLpairs > 0.73, arr=TRUE)
wh73 <- wh73[wh73[,1] < wh73[,2],]
wh73

wh72 <- which(ALLpairs > 0.72, arr=TRUE)
wh72 <- wh72[wh72[,1] < wh72[,2],]
wh72

wh71 <- which(ALLpairs > 0.71, arr=TRUE)
wh71 <- wh71[wh71[,1] < wh71[,2],]
wh71

wh70 <- which(ALLpairs > 0.70, arr=TRUE)
wh70 <- wh70[wh70[,1] < wh70[,2],]
wh70

wh65 <- which(ALLpairs > 0.65, arr=TRUE)
wh65 <- wh65[wh65[,1] < wh65[,2],]
wh65

wh60 <- which(ALLpairs > 0.60, arr=TRUE)
wh60 <- wh60[wh60[,1] < wh60[,2],]
wh60

# NOTE: If there are no duplicate pairs for some thresholds, the above wh cmds 
# display just "row col", which means no pairs exceeded the given threshold. 

# In a high identity pair, you will want to omit the pair member with a greater 
# number of missing genotypes. You can do this with the below cmds to "grab" 
# that individual and move it to the 2nd position in the each tabular row of the 
# above 'wh' object (and do so for each such pair) before creating a new object 
# with the name "todrop". This code was provided to Jimbo by Karl Broman.

#> USER MUST SPECIFY THE BELOW FINAL FRACTION after user views above histogram 
#  & makes a judgment as to what is "BEYOND" the distributional (right) tail end. 
#> Jimbo selected a 0.80 threshold for each of the 48 F2 SG map pops of n~250. 
#= NOTE: For RIL pops, this choice may need to be higher - 0.80 or 0.85.

# The numbers outputted by the below cmds are Rqtl INDEX #s not csv file ID#s!

wh <- which(ALLpairs > 0.80, arr=TRUE)
wh

# Let's retain/save the wh output, so that the wh object can be altered below.

whall <- wh

# Below cmd re-arranges the wh object list of two members of each high identity
# pair by ordering the member INDEX # in each pair so that the lower INDEX # is
# listed first (under row) and the higher INDEX # is listed second (under col).
# Note: If there are NO pairs, no index nos appear row and col headings.     

wh <- wh[wh[,1] < wh[,2],]
wh

# Below cmds output a marker genotype count for each individual, but note that
# this output displays the (not necessarily consecutive) individual ID # above
# the marker genotype counts. The additional cmds will generate the (always 
# consecutive) Rqtl-assigned INDEX #s that match the ID #s.  
# Although we phenotyped ALL individuals, only the decile phenotypic tails were
# selectively genotyped (SG), so the NON-SG ID# ones will have zero geno counts. 

nt <- ntyped(OBJNAME)
nt
nt.byind <- ntyped(OBJNAME, what=c("ind"))
match(names(nt.byind[nt.byind >=0]), getid(OBJNAME))

# Below cmds show the member pair INDEX #s and the matching marker geno counts.
# You will want to drop the least genotyped member of the 2-member pairs.

wh
matrix(nt[wh],ncol=2)

# Below cmd set moves the least genotyped F2 of each pair to the right col, 
# so that this F2 member of the pair can be dropped from the population.
# Note:  Below cmd set uses an IF-ELSE R statment to deal with the 
# problem in some pops of having just one pair in the nrow(wh) output.  
#-------------------------------------------------------------------------------

paircount <- matrix(wh)
paircount
nrow(paircount)

if(nrow(paircount) > 2) 
{
 todrop <- rep(NA, nrow(wh))
 for(i in seq(along=todrop)) 
{
 if(nt[1] <= nt[2]) todrop[i] <- wh[i,1]
 else todrop[i] <- wh[i,2]
}
} else
{
 if(nt[1] <= nt[2]) todrop <- wh[[1]] else todrop <- wh[[2]]
}
todrop

# Note: If todrop contains duplicate INDEX #s, they will be treated as just one.
# If there are no high identity pairs, todrop is empty, so ignore the error msgs.
#-------------------------------------------------------------------------------

# Now let's retain all individuals but those index #s in the todrop object.

nind(OBJNAME)
OBJNAME <- subset(OBJNAME,ind=-(todrop))
nind(OBJNAME)

date()

# Lets examine the ALLpairs plot again after removal of the todrop individuals.
# The histogram peak value should (theoretically) be 0.375 (F2) or 0.5 (RIL). 
#= NOTE: For RIL pops, replace the seven F2 pop "red" & "red" abline v values 
#= 100 150 200 375 550 600 650 with RIL pop values 275 325 375 500 625 675 725) 
#> ADJUST len resolution parameter to 1001 if you have a very large population. 

ALLpairs <- comparegeno(OBJNAME, what="proportion")
par(mfrow=c(1,1),las=0)
hist(ALLpairs, breaks=seq(0, 1, len=101), xlab="Fractional Degree of Paired Genotypic Match")
rug(ALLpairs)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Histogram after removal (if any) of one member of each outlier high-identity genotypic pair", cex=0.65, line=1.3)
abline(v=c(0.0,0.0), lty=1)
abline(v=c(0.100,0.100), lty=2, lwd=2, col="red")
abline(v=c(0.150,0.150), lty=2, lwd=2, col="red")
abline(v=c(0.200,0.200), lty=2, lwd=2, col="red")
abline(v=c(0.375,0.375), lty=1, lwd=2, col="blue")
abline(v=c(0.550,0.550), lty=2, lwd=2, col="red")
abline(v=c(0.600,0.600), lty=2, lwd=2, col="red")
abline(v=c(0.650,0.650), lty=2, lwd=2, col="red")
abline(v=c(0.7,0.7), lty=3)
abline(v=c(0.8,0.8), lty=3)
abline(v=c(0.9,0.9), lty=3)
abline(v=c(1.0,1.0), lty=1)

# Lets now review cross object details after dropping the above individuals. 

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# Lets plot the genetic map again after dropping the duplicate individuals.

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[DUP Individuals GONE - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPs, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
# NOTE: Jimbo #-signed out below cmd - don't need the map at this point. 
#pull.map(OBJNAME, as.table=FALSE)

# Need to reset the graphics parameters back to their original values. 

par(mfrow=c(1,1), las=0)

################################################################################
################################################################################

# NEED TO CHECK FOR MARKER LOCI WITH ERRONEOUSLY SWITCHED A<>B ALLELE CODES. 

# Lets now evaluate the linkage between all possible pairs of markers. The 
# recombination fraction is an estimate the recombination frequency between each
# possible pair of markers and the LOD score for that rf is essentially a 
# statistical test of the NULL hypothesis that rf = 0.5. Keep in mind that as 
# the sample-based rf value drops from 0.5 to near 0.0, the LOD score increases. 
# That said, we actually want to identify marker pairs that have rf values that 
# are substantively GREATER than rf = 0.5 (I say substantively because sample
# estimates of rf can exceed rf = 0.5 simply because of sampling error!). It is
# possible that one member of a marker pair might have had its A/B alleles coded
# incorrectly (i.e., in inverse manner). Such marker pairs will exhibit rf values
# much larger than rf > 0.5 - how much larger depends on the closeness of the  
# marker pair members map positions. This error can be discerned with an R/qtl
# checkAlleles cmd and corrected with the switchAlleles cmd, but you have to be
# quite careful if one or more markers with switched alleles are located close 
# to each other. If you have plenty of markers, you might want to keep it simple
# by dropping one member of the marker pair, particularly if that marker shows
# up in a number of pairs with high rf values.  This is Jimbo's preference.  
# NOTE:  Some high rf values may simply be spurious estimates because a marker 
# or two (or a few) were only genotyped for a few individuals. 

# Let's graph the LOD score - rf value and determine whether any marker pairs 
# have rf values in excess of rf=0.5 and also have a high LOD score. 

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

rf <- pull.rf(OBJNAME, what="rf")
lod <- pull.rf(OBJNAME, what="lod")

# Ignore the warning message " -Running est.rf". Rqtl is notifying you that it
# is recalculating to ensure up-to-date marker-to-marker recombination values.
# Also ignore for now warning of "Alleles potentially switched at markers".
# We run a checkAlleles cmd below to formally deal with this warning.
 
date()

# NOTE: Above cmds may output a message "Alleles potentially switched at markers" 
# and if so, those flagged markers will subsequently be examined in below cmds. 

plot(as.numeric(rf), as.numeric(lod), xlab="Recombination fraction", ylab="LOD score")
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Graph of LOD score vs. rf value for all possible marker pairings in this population", cex=0.85, line=2.6)
mtext(side=3, "The horizontal red dotted lines denote LOD score values of 3.0 to 6.0", cex=0.85, line=1.6)
mtext(side=3, "Check for switched alleles in marker pairs with very high rf values & LOD Scores", cex=0.85, line=0.6)
abline(h=c(0,0), lty=1)
abline(h=c(3,3), lty=2, lwd=2, col="red")
abline(h=c(4,4), lty=2, lwd=2, col="red")
abline(h=c(5,5), lty=2, lwd=2, col="red")
abline(h=c(6,6), lty=2, lwd=2, col="red")
abline(v=0.50, lty=1, lwd=2, col="blue")
abline(v=0.65, lty=3)
abline(v=0.70, lty=3)
abline(v=0.75, lty=3)
abline(v=0.80, lty=3)
abline(v=0.85, lty=3)
abline(v=0.90, lty=3)
abline(v=0.95, lty=3)

# In a perfect world, all estimated rf values in a data set would be 0.5 or less
# but in a sample of a true 0.5 rf value, rf values above 0.5 will be obtained. 

# It is evident in the graph that some marker pairs have rf values >> 0.5, and 
# those pairs also have higher LOD scores than those at 0.5 - a diagnostic for 
# switched alleles. The rf values in the range of 0.5 < rf < 0.6 or greater may 
# just be sampling error deviates arising from imperfect allele segregation in 
# each marker member of an unlinked marker pair, if there is a large std dev for 
# the rf estimate (i.e., rf = 0.5 +/- 0.2). Keep in mind that the std dev around 
# the rf value will be larger the fewer the number of population individuals. 
# An exceptionally large rf value is quite likely due to switched A<>B alleles 
# at one member of the marker pair (it also could be just a "bad" marker per se). 

# NOTE: Scatter in the curve will arise if marker genotypes are missing, because
# rf values of 0.5 for such marker pairs can deviate from their true 0.5 value. 
# Two curves might display in the graph - one for the case where ALL individuals 
# were genotyped for chromosomal segment of closely linked markers, and another 
# for the case where only individuals selected for hi/low decile phenotype were 
# genotyped for markers (linked and unlinked) located all over the whole genome. 

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#

# Lets formally check for switched alleles with the checkAlleles cmd. 

date()

checkAlleles(OBJNAME, threshold=3, verbose=TRUE)

# Ignore the warning message "First running est.rf". It is just a notification.

date()

#| Let's switch alleles for any above-identified marker for which switching works.
# If there are some markers identified, insert their names into the below cmd.
# If there are no markers to switch, then #-sign the below cmd out of play. 
# Sometimes you have to drop other member of an adjacently linked pair of markers.
#| If the switchAlleles cmd not needed or does not work, go to the drop.markers cmd. 

# Jimbo says that if you have a lot of markers, it is not worth the extra effort
# needed to deal with just a few switched allele markers. Jimbo just drops 'em! 

# Example:  switchAlleles(OBJNAME, c("Sxxxxx","Sxxxxx","Etc."))
#switchAlleles(OBJNAME, c("

# Below cmds will output marker pairs with high rf values. You will see these
# as points at the far right in the prior graph of the LOD score versus rf value
# In small populations, there are likely to be rf>0.75 arising from limited 
# genotyping per marker pair, and the below cmds help you identify those pairs. 
#| Jimbo created below cmd list to ID individuals in rf value > threshold nos. 

print(rf99 <- which(rf[,1:totmar(OBJNAME)]>0.99, arr.in=TRUE))
print(rf95 <- which(rf[,1:totmar(OBJNAME)]>0.95, arr.in=TRUE))
print(rf90 <- which(rf[,1:totmar(OBJNAME)]>0.90, arr.in=TRUE))
print(rf85 <- which(rf[,1:totmar(OBJNAME)]>0.85, arr.in=TRUE))
print(rf80 <- which(rf[,1:totmar(OBJNAME)]>0.80, arr.in=TRUE))
#print(rf75 <- which(rf[,1:totmar(OBJNAME)]>0.75, arr.in=TRUE))

#| A marker that repeatedly appears in many marker pairs with high rf values does 
# not have to be dropped if the markers it pairs up with are located on different
# chromosomes. HOWEVER, you should drop markers on the SAME chromosome that pair 
# up and preferably the marker member that occurs repeatedly. Such markers can be
# dropped by inserting them into drop.markers cmd (see example). If no markers 
# will be dropped, #-sign out of play all cmds in the below cmd set.  

totmar(OBJNAME)
# Example: OBJNAME <- drop.markers(OBJNAME, c("Sxxxxx","Sxxxxx","Sxxxxx"))
# Dropped a marker member of each marker rf pair that had an rf>0.85:
OBJNAME <- drop.markers(OBJNAME, c("S16923","S20774","S12895"))
# Dropped a marker member of each marker rf pair that had a lod>4.0:
# There were none.
totmar(OBJNAME)

# Confirm any (switchAlleles or drop.markers) cmd work with a new LOD vs. rf graph. 

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

rf <- pull.rf(OBJNAME, what="rf")
lod <- pull.rf(OBJNAME, what="lod")

# Ignore the warning messages about "Running est.rf". It is just a notification.

date()

plot(as.numeric(rf), as.numeric(lod), xlab="Recombination fraction", ylab="LOD score")
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Graph of LOD score vs. rf value for all possible marker pairs in this population", cex=0.85, line=2.6)
mtext(side=3, "The horizontal red dotted lines denote LOD score values of 3.0 to 6.0", cex=0.85, line=1.6)
mtext(side=3, "Graph reflects switchAlleles & drop.markers work (if any) done on very high rf values", cex=0.85, line=0.6)
abline(h=c(0,0), lty=1)
abline(h=c(3,3), lty=2, lwd=2, col="red")
abline(h=c(4,4), lty=2, lwd=2, col="red")
abline(h=c(5,5), lty=2, lwd=2, col="red")
abline(h=c(6,6), lty=2, lwd=2, col="red")
abline(v=0.50, lty=1, lwd=2, col="blue")
abline(v=0.65, lty=3)
abline(v=0.70, lty=3)
abline(v=0.75, lty=3)
abline(v=0.80, lty=3)
abline(v=0.85, lty=3)
abline(v=0.90, lty=3)
abline(v=0.95, lty=3)

# You should see in the above graph the results of your corrective work if you
# used the switchAlleles and/or drop.markers cmds. 

# With only a few total genotypes in this selectively genotyped F2 population, 
# sample estimates of true rf=0.5 values are expected to exceed the true values 
# by a wider margin than in a population with many, many total genotypes! 

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#

################################################################################
################################################################################

# LETS NOW USE SOME R/qtl CMDS TO CHECK FOR "POTENTIAL" GENOTYPING ERRORS. 

# The below error LOD scores are calculated using the calc.errorlod() cmd. 
# One must assume a basal genotyping error rate (e.g., an error.prob of 0.001 
# tells R/qtl that our genotyping error rate is ONE per 1,000 genotypings). 

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

OBJNAME <- calc.errorlod(OBJNAME, error.prob=0.001, map.function=c("kosambi"))

date()

# The top.errorlod() cmd identifies "suspicious" marker genotypes that have a  
# large error LOD score. Our interest is primarily in those "suspects" that  
# have an error LOD score of 6 or more (as noted in below cutoff parameter). 

print(toperr <- top.errorlod(OBJNAME, cutoff=6, msg=TRUE))

#\ If there are many toperr candidates, edit-copy the list & edit-paste into an
# Excel Worksheet for convenience in identifying the markers to drop. See below.

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#

#\ The toperr flagged marker genotypes are likely potential "genotyping errors". 
# Examine these with below plot.geno cmds (with an include.xo=FALSE argument to
# suppress some typical plot.geno XO output) to graph these potential "errors". 

date()

# Below R graph parameter cmd reduces axis number size for the plot.geno graphs. 

par(mfrow=c(1,1), las=0, cex.axis=0.75)

plot.geno(OBJNAME, chr=1, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=2, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=3, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=4, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=5, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=6, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=7, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=8, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=9, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=10, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=11, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=12, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=13, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=14, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=15, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=16, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=17, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=18, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=19, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)
plot.geno(OBJNAME, chr=20, ind=(countXO(OBJNAME) >= 1), cutoff=6, include.xo=FALSE)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Marker Genotype Colors:  A=white;  H=grey;  B-black", cex=0.85, line=1)
mtext(side=3, "Red-Boxed Marker Genotypes - Potential Genotyping Errors", cex=0.85, line=0.1)

# Need to reset the graphics parameters back to their original values.  

par(mfrow=c(1,1), las=0, cex.axis=1)

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#

#\ Jimbo next dropped markers that had THREE or more appearances in toperr list. 
# Created a sorted list of markers (to be dropped) in the P1022toperr.xls file.
# See that file for details as to the markers selected for below drop.markers cmd. 

date()

totmar(OBJNAME)

# Below R code was provided by Karl Broman.
# The below table cmd outputs the SNPs identified in the above toperr cmd. 
table(toperr[,3])
# Jimbo set a threshold of 3 or more appearances in toperr list as too many.
tab <-table(toperr[,3])
# The toperr multi-appearing SNPs are next put into a todrop object & printed.
todrop <- names(tab[tab>2])
todrop
# Manually found/dropped these SNPs before; now it is automatic w/ Karl's R code!
# Example: # There were none.
OBJNAME <- drop.markers(OBJNAME, c(todrop))

totmar(OBJNAME) 

# The calc.errorlod cmd must be rerun whenever markers are dropped (see above). 
# Jimbo to #-sign out the below two cmds if not needed (i.e., no marker drops).

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

OBJNAME <- calc.errorlod(OBJNAME, error.prob=0.001, map.function=c("kosambi"))
print(toperr <- top.errorlod(OBJNAME, cutoff=6, msg=TRUE))

date()

# The above 2nd calc.error cmd reveals markers you did not drop after 1st cmd. 
# One can CHANGE the above (erroneous?) A, B, or H genotype codes to NA codes 
# (R/qtl treats NA codes as missing genotypes) by using the below set of cmds. 
# Note that Jimbo used a new cross object name OBJNAME.clean for this purpose. 

#\ Jimbo to #-sign out the below cmds when not needed (i.e., no marker drops).

OBJNAME.clean <- OBJNAME
for(i in 1:nrow(toperr)) {
  chr <- toperr$chr[i]
  id <- toperr$id[i]
  mar <- toperr$marker[i]
  OBJNAME.clean$geno[[chr]]$data[OBJNAME$pheno$ID==id, mar] <- NA
}

# Note the capitalized ID in OBJNAME$phenot$ID (> a cap ID was used in *.csv). 
# Now let's re-calculate & print the error lod for the "cleaned" cross object. 

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

OBJNAME.clean <- calc.errorlod(OBJNAME.clean, error.prob=0.001, map.function=c("kosambi"))
print(toperr <- top.errorlod(OBJNAME.clean, cutoff=6, msg=TRUE))
# There should be no output from this cmd if the above "clean" cmd set worked. 

date()

# Now that suspicious "genotyping errors" are gone, let's do a data summary. 

OBJNAME <- OBJNAME.clean
summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%#

################################################################################
################################################################################

# THE HYTEN ET AL. (2010) MARKER ORDER IS USED HERE BECAUSE THAT ORDER IS LIKELY 
# THE "BEST" ORDER GENERATED TO DATE (MORE MARKERS AND INDIVIDUALS WERE USED TO 
# CREATE VER. 4.0 MAP). NOTE THAT THE BELOW GRAPHS INDICATE THAT THE F2 MARKER 
# ORDER DID NOT ALWAYS MATCH THE VER. 4.0 MAP ORDER FOR CLOSELY LINKED MARKERS, 
# BECAUSE OF SMALL-POPULATION NUMBER SAMPLING ERROR IN THE GENOTYPING! 

date()

# NOTE: Ignore warning messages "Running est.fr" generated by the below cmds.

# Lets look at the marker pairwise rfs and LOD scores on a 2-chr/graph basis.
# If a marker in your .csv file was assigned to an incorrect chromosome, you
# will be able to see that error in this graph.

plot.rf(OBJNAME)
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)

# Lets look at the marker pairwise rfs and LOD scores on a 2-chr/graph basis.
# Ideally, one should see red pixelation tightly clustered along diagonal, but
# with selective genotyping, the small sample of genotyped individuals can lead
# to sampling error in a sample-estimated order of closely linked markers.
 
plot.rf(OBJNAME,chr=c(1:2))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(3:4))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(5:6))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(7:8))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(9:10))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(11:12))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(13:14))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(15:16))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(17:18))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
plot.rf(OBJNAME,chr=c(19:20))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)

################################################################################

# LET'S NOW REVISIT THE MARKER GENOTYPE SEGREGATION ISSUE ON AN INDIVIDUAL BASIS. 

date()

#= Just as we expect the markers to segregate 1A:2H:1B in the F2 Pop, we expect  
# F2s to have 1A:2H:1B genotype frequencies in approximately the same proportion.
# The next cmds generate graphs of where the F2s fall in three graphs - the left
# graph has a vertical axis of A genotypic frequency, the middle graph has a 
# vertical axis of H genotypic frequency, and finally the right graph has a 
# a vertical axis of B genotypic frequency. (NOTE: See Broman's tutorial for
# an F2 pop where he shows three graphs for A:H:B. For an RIL, only two occur)
#= (NOTE: For RIL pops, change 1:3 to 1:2 (3 places) and remove ,"AB", in main).

gfreq <- apply(g, 1, function(a) table(factor(a, levels=1:3)))
gfreq <- t(t(gfreq) / colSums(gfreq))
# Need to set the graphics parameters allow three adjacent graphs on the page. 
par(mfrow=c(1,3), las=1)
for(i in 1:3) plot(gfreq[i,], ylab="Genotype Frequency", main=c("AA", "AB", "BB")[i], sub=c("P1022", "P1022", "P1022")[i], ylim=c(0,1))

# OK, can now see in two graphs that mean frequencies are:  1A : 2H : 1B. 

# Need to reset the graphics parameters back to their original values. 

par(mfrow=c(1,1), las=0)

################################################################################

# WHAT ABOUT MARKER GENOTYPE FREQUENCY VARIATION ENTIRE GENOME (CHR 1 > 20)? 

date()

# Lets display F2 AA:AB:BB seg by marker across the 20 soybean chromosomes. 

gtAll <- geno.table(OBJNAME, scanone.output=TRUE)
par(mfrow=c(1,1), las=0) 
# Use the above par cmd to get two graphs on separate pages (better readability). 
#par(mfrow=c(2,1)) 
# OR Use this above par cmd to put the two graphs on the same page. 
plot(gtAll, ylab=expression(paste(-log[10], " P-value")))
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "-log10 P-values for tests of 1A:2H:1B segregation for Chrs 1>20", line=2.0)
mtext(side=3, "See next graph for 1A:2H:1B segregational distribution across Chrs 1>20", line=0.6)
plot(gtAll, lod=3:5, col=c("red","black","blue"), ylab="Genotype frequency")
mtext(side=3, "F2 Population P1022 X Jim", line=3.6)
mtext(side=3, "Chr 1>20 Marker Genotype AA(red):AB(black):BB(blue) Frequencies", line=2.0)
mtext(side=3, "See prior graph for marker 1A:2H:1B segregational dev across Chrs 1>20", line=0.6)
abline(h=c(0.25, 0.50), lty=1, lwd=2, col="black")

# The first graph displays -log10 P-values for tests of 1:2:1 segregation at 
# each marker located on Chrs 1 to 20. The second graph displays AA:AB:BB 
# genotype frequencies for each marker in the genome (black=AA blue=AB red=BB). 
# Segregation variance (deviation from 0.5) is greater with smaller F2 pops. 

# NOTE: WHAT CAN BE DONE WITH MARKERS WITH LOD SCORE SPIKES IN FIRST GRAPH?  
# These may be markers retained (not exceeding Bonferroni corrected X2 Prob) 
# that have marker genotype seg NOT IN SYNCH with adjacent markers, SO CHECK  
# THE MARKER SEGREGATION data output for these markers (maybe drop them?) 
# Use below cmd for that purpose, but Jimbo chose to #-sign it out of play.

#totmar(OBJNAME)
#drop.markers(OBJNAME, markers=c("Sxxxxx","Sxxxxx","Etc."))
#totmar(OBJNAME)

################################################################################
################################################################################

# NOW THAT ALL ERROR CHECKING/CORRECTION IS DONE, LET'S CREATE A FINALIZED MAP. 

date()

plot.map(OBJNAME, horizontal=FALSE, shift=FALSE, show.marker.names=FALSE, alternate.chrid=FALSE, ylab="Map Distance (cM)", ylim=c(160,0))
mtext(side=3, "F2 Population P1022 X Jim", line=1.0)
mtext(side=3, "[FINALIZED MAP - Using Hyten et al. (2010) RIL-based Marker Orders/Distances(Kosambi)]", cex=0.65, line=0.2)
# Due to far fewer SNPS, the F2 genetic map lacks the coverage of Hyten et al. 
# so we put into this genetic map the Hyten et al start & end chr positions. 
# These were assigned to the variables y and z above. 
points(x,y, cex=0.75, pch=1)
points(x,z, cex=0.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0, 0))
# The pull.map cmd prints out current chromosomal SNP marker map positions. 
# These SNP marker positions are those specified by Hyten et al. (2010). 
pull.map(OBJNAME, as.table=FALSE)

################################################################################

# LET'S ALSO CREATE A FINALIZED MAP WITH MARKER NAME LABELS (20 CHRs, 10/graph). 
# We need to make the marker name font size small but still (barely) readable. 

date()

# Below par cmd will decrease the text size of all text (i.e., cex=0.5) but add 
# parameters to bring the text size of axis labels and titles back to normal. 

par(cex=0.5, cex.axis=1.5, cex.lab=2, cex.main=2, mar=c(6.1, 6.1, 6.1, 1.1))

#*******************************************************************************

# Below we generate a graph with marker names for chromosomes 01-10.

# Below are setup x,y,z variables for chr # (x) and start (y) & end (z) lengths, 
# where z = Hyten et al. Chr 1-10 terminal lengths (Kosambi cM). 
x <- c(1,3,5,7,9,11,13,15,17,19) # use these numbers due to 2-unit chr spacing.
y <- c(0,0,0,0,0,0,0,0,0,0) # use these number as begin circle points.
z <- c(98.41,140.63,99.51,112.32,86.75,136.51,135.15,146.67,99.60,132.89) 
# Next variable is used for text length labeling at Hyten et al. chromosome ends. 
# NOTE: The quotes were used to put a space in front of shorter 2-digit lengths. 
chrlenglab <- c(" 98",141,100,112," 87",137,135,147,100,133)
# Next variable centers the 3-digit length value at the bottom of the chr ends). 
x1 <- c(0.5,2.5,4.5,6.5,8.5,10.5,12.5,14.5,16.5,18.5) # again, 2-unit spacing.
# Jimbo added 6 digits to each z var to print chr leng values below the chr ends. 
z1 <- c(104.41,146.63,105.51,118.32,92.75,142.51,141.15,152.67,105.60,138.89)

plot.map(OBJNAME, chr=c(1:10), horizontal=FALSE, shift=FALSE, show.marker.names=TRUE, alternate.chrid=FALSE, ylim=c(150,0))
mtext(side=3, "F2 Population P1022 X Jim - Chromosomes  1 to 10", line=0.25)
#mtext(side=3, "[Using Hyten et al. (2010) Marker Orders & Kosambi Map Distances]", cex=0.65, line=0.2)
# This genetic map displays the Hyten et al start & end chr 01-10 positions. 
# These were assigned to the variables x, y, z (see above section). 
points(x,y, cex=1.75, pch=1)
points(x,z, cex=1.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 1.25, m = c(0, 0))

#*******************************************************************************

# Below we generate a graph with marker names for chromosomes 11-10.

# Below are setup x,y,z variables for chr # (x) and start (y) & end (z) lengths, 
# where z = Hyten et al. Chr 1-10 terminal lengths (Kosambi cM). 
x <- c(1,3,5,7,9,11,13,15,17,19) # use these numbers due to 2-unit chr spacing.
y <- c(0,0,0,0,0,0,0,0,0,0) # use these number as begin circle points.
z <- c(124.24,120.50,120.03,108.18, 99.88, 92.27,119.19,107.09,101.14,112.77) 
# Next variable is used for text length labeling at Hyten et al. chromosome ends. 
# NOTE: The quotes were used to put a space in front of shorter 2-digit lengths. 
chrlenglab <- c(124,121,120,108,100," 92",119,107,101,113)
# Next variable centers the 3-digit length value at the bottom of the chr ends). 
x1 <- c(0.5,2.5,4.5,6.5,8.5,10.5,12.5,14.5,16.5,18.5) # again, 2-unit spacing.
# Jimbo added 6 digits to each z var to print chr leng values below the chr ends. 
z1 <- c(130.24,126.50,126.03,114.18,105.88,98.27,125.19,113.09,107.14,118.77)

plot.map(OBJNAME, chr=c(11:20), horizontal=FALSE, shift=FALSE, show.marker.names=TRUE, alternate.chrid=FALSE, ylim=c(150,0))
mtext(side=3, "F2 Population P1022 X Jim - Chromosomes 11 to 20", line=0.25)
#mtext(side=3, "[Using Hyten et al. (2010) Marker Orders & Kosambi Map Distances]", cex=0.65, line=0.2)
# This genetic map displays the Hyten et al start & end chr 01-10 positions. 
# These were assigned to the variables x, y, z (see above section). 
points(x,y, cex=1.75, pch=1)
points(x,z, cex=1.75, pch=1)
# The textxy cmd from library(calibrate) allows data labels on the graph. 
textxy(x1, z1, chrlenglab, cx = 1.25, dcol = "black", m = c(0, 0))

# Need to reset the graphics parameters back to their original values. 

par(mar=c(5.1,5.1,5.1,3.1),mfrow=c(1,1),las=0,cex=1,cex.axis=1,cex.lab=1,cex.main=1,cex.sub=1,font=1,font.main=2,font.sub=2,font.axis=2,font.lab=2)

################################################################################

# LET'S OUTPUT A CORRECTED/REVISED *.CSV FILE OF THIS DATA SET FOR FUTURE USE. 

# Let's write the now "cleaned-up" data set to a new "nameCHKD.csv" file so that 
# we can retain (in that new *.csv) all of the corrections/changes/etc. that were 
# accomplished above and use that file for a subsequent QTL detection analysis. 

#= Below write.cross cmd is used for either an F2 data set or an RIL data set. 
# Ignore the warning messages (if any) notifying you about the backup process.

date()

write.cross(OBJNAME, format=c("csvr"), filestem="C:/WorkAllP/P1022GenPheCHKD", digits=NULL)

date()

# NOTE: The outputted file has genotypic codes in an F2 format AA:BB:BB:-, or in 
# an RIL format AA:BB:-. So if the outputted *.csv file is to be accessed later
# with the read.cross cmds (now #-signed out below), be sure to use a correct 
# genotypes option (as noted below) in your later F2 or the RIL read.cross cmd. 

#= for a F2 pop: 
#P1001CHKD <- read.cross(format=c("csvr"), dir="C:/WorkAllP", file="P1022genpheCHKD.csv", na.strings=c("-","NA"), genotypes=c("A","H","B"), alleles=c("A","B"), error.prob=0.001, map.function=c("kosambi"))
#= for a RIL pop:
# Same as F2 but drop the "H" in the read.cross cmd.
################################################################################

# LET'S ALSO NOW OUTPUT TWO FILES THAT CAN BE USED WITH WinQTL Cartographer. 

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

write.cross(OBJNAME, format=c("qtlcart"), filestem="P1022", chr=c(1:20), digits=NULL)

date()

# QTL Cartographer format: Data will be written to two files with below names. 
# If filestem="P1022", then "P1022.cro" contains the genotype and phenotype data, 
# and P1022.map" contains the genetic map information (markers distances).
# Note that cross types are converted to QTL Cartographer cross types as follows: 
# riself to RF1, risib to RF2, bc to B1 and f2 to RF2.

# NOTE: The *.cro and *.map files are files you input into WinQTL Cartographer!

################################################################################
################################################################################
################################################################################

# LET'S NOW PROCEEED TO THE PHENOTYPE ANALYSES AND THENCE THE R/qtl QTL ANALYSES. 

date()

# The below cmd will list all of the phenotype names in the cross object.

phenames(OBJNAME)

################################################################################

#= LIST OF PHENOTYPES MEASURED IN THE 48 SG F2 POPULATIONS (Bee Phansak PhD data)

# The F2.3 progeny phenotypes listed below have a parenthetic no. corresponding 
# to the rank order of the phenotypes in the Mapmaker *.raw data files for all 
# populations. The parenthetic number is also an R/qtl pheno.col=number as well:
# (1) ID - provides F2 plant ID# (some missing F2 plants, so not consecutive).
# (2) Rno - initial consecutive numbering system (F2 plant number - from 1 > X).
# (3) Pro1 - a one-replicate NIT measurement of F2.3 progeny seed protein. 
# (4) Oil1 - same as (2), but for seed oil. 
# (5) H2O1 - same as (2), but for seed moisture (H2O) (informational only). 
# (6) Pro2Q - a replicate two measurement of seed protein, but performed ONLY  
#     on F2.3 progeny in the extreme quintiles of the 1-replicate distribution. 
#     (i.e., only 2/5 of the total F2.3 progeny will have a rep two phenotype).
# (7) Oil2Q - same as (6), but for seed oil. 
# (8) H2O2Q - same as (6), but for seed moisture (H2O) (informational only). 
# (9) ProMQ - a two-replicate MEAN seed protein value for each F2.3 progeny 
#     in the extreme quintiles. Only 2/5 F2 of will have a 2-rep mean phenotype. 
# (10) OilMQ - same as (9), but for seed oil. 
# (11) H2OMQ - same as (9), but for seed moisture (H2O) (informational only). 
# (12) Pro - a dataset mix of 2-rep MEAN seed protein phenotyping (extreme
#      quintile progeny only), and one-rep phenotyping (all other progeny).
# (13) Oil - same as (12), but for seed oil. 
# (14) H2O - same as (12), but for seed moisture (H2O) (informational only).
# (15) SWgt - Weight of harvested F2.3 progeny seed - NOT measured in some 
#      populations, so in their .csv files, all plants have missing (-) values. 

################################################################################
################################################################################

# MISSING GENOTYPES - A GRAPH-BASED VISUALIZATION OF SELECTIVE GENOTYPING IMPACT

date()

# Let's take a look at individuals with & without missing marker genotype data. 

# The next cmd uses black pixelation to denote missing marker genotypes for 
# the F2 Plant ID Numbers 1, 2, 3, .... last (arrayed upward on the left axis). 
# These ID numbers are not consecutive (due to F2 plant losses). 
 
plot.missing(OBJNAME, reorder=FALSE, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Black Pixelation Denotes Missing Genotypes", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2 ID Number)", cex=0.7, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

# The next cmd uses colored pixelation to indicate the genotypes of A (red), 
# H (blue) and B (green) for each F2 ID. White pixelation is indicative of a 
# marker in an individual for which there is no genotype code (i.e., dash). 

geno.image(OBJNAME, ylab="", reorder=FALSE, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Pixelation:   Red=AA-LoPro Blue=AB Green=BB-HiPro White=Missing", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2 ID Number)", cex=0.65, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

# Let's repeat above cmds but this time for the RANKED F2.3 Seed Pro Rep 1 
# values (arrayed from low values to high values on the left axis). Except 
# for missing plant or missing DNA anomalies, the top and bottom sections of 
# graphs should be pixelated, as only the hi/lo phenotypes were genotyped. 

plot.missing(OBJNAME, reorder=3, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Black Pixelation Denotes Missing Genotypes", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2.3 Progeny Rep 1 Seed Pro Value)", cex=0.7, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

geno.image(OBJNAME, ylab="", reorder=3, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Pixelation:   Red=AA-LoPro Blue=AB Green=BB-HiPro White=Missing", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2.3 Progeny Rep 1 Seed Pro Value)", cex=0.65, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

# Let's repeat above cmds but this time for the ranked F2.3 Seed Pro Rep 1 
# values (arrayed from low values to high values on the left axis). Except 
# for missing plant or missing DNA anomalies, the top and bottom sections of 
# graphs should be pixelated, as only the hi/lo phenotypes were genotyped. 

plot.missing(OBJNAME, reorder=12, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Black Pixelation Denotes Missing Genotypes", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2.3 Progeny Combined Seed Pro Value)", cex=0.7, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

geno.image(OBJNAME, ylab="", reorder=12, alternate.chrid=TRUE, main=NULL)
title(main="Chromosome", font.main=1, line=2, sub="Pixelation:   Red=AA-LoPro Blue=AB Green=BB-HiPro White=Missing", font.sub=2, cex.sub=0.85)
mtext("(Ranked by F2.3 Progeny Combined Seed Pro Value)", cex=0.65, side=2, line=2.4, las=3)
mtext("F2 Population P1022 X Jim", side=3, line=3, las=1)

################################################################################

# UNIVARIATE STATISTICS AND NORMALITY OF THE DISTRIBUTIONS 
# TABULAR DATA BELOW AND A GRAPH-BASED VISUALIZATION IN R GRAPHICS WINDOW> 

# We will be conducting QTL analysis on the protein & oil values, so we need to 
# take a look at the normality of these pro oil distributions. We will also 
# calculate the mean, stdev, max, min, skewness, and kurtosis, where 
# Skewness (normal distributions are symmetric) can be right or left: 
#   If negative > left tail is longer; bulk of values fall to right of the mean. 
#   If positive > right tail is longer; bulk of values fall to left of the mean. 
# Kurtosis: (normal distributions are mesokurtic, i.e., normal peak and tails): 
#   If positive > leptokurtic (an acutely higher & sharper peak with fat tails). 
#   If negative > platykurtic (a lower & wider/rounded peak with thin tails). 

# To get the statistics, cmds will be needed from the several R packages 
# that were installed/loaded, such as ("stats"), ("moments"), and ("psych").

# Below is sample R code for superimposing a normal curve on actual data that 
# Jimbo modified to make it work on the F2 pop seed protein and oil distributions. 

# x<-rnorm(150)
# h<-hist(x,breaks=15)
# xhist<-c(min(h$breaks),h$breaks)
# yhist<-c(0,h$density,0)
# xfit<-seq(min(x),max(x),length=40)
# yfit<-dnorm(xfit,mean=mean(x),sd=sd(x))
# plot(xhist,yhist,type="s",ylim=c(0,max(yhist,yfit)))
# lines(xfit,yfit)

################################################################################

# BELOW ARE THE MAIN STATISTICS AND NORMALITY OF THE SEED PRO/OIL DISTRIBUTIONS. 

date()

# Here we present, for each phenotype, univariate statistics, plus skewness & 
# kurtosis, and two histogram distributions - one with a normality curve/test). 


# First, let's look at the F2.3 Seed Pro Distribution (All Progeny - Rep 1 Data). 

OBJNAME$pheno[,3]
describe(OBJNAME$pheno[,3], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Pro1 <- OBJNAME$pheno[,3]
mean(F23Pro1, na.rm=TRUE)
var(F23Pro1, na.rm=TRUE)
sd(F23Pro1, na.rm=TRUE)
min(F23Pro1, na.rm=TRUE)
max(F23Pro1, na.rm=TRUE)
skewness(F23Pro1, na.rm=TRUE)
kurtosis(F23Pro1, na.rm=TRUE)
shapiro.test(F23Pro1)
SWpro <- (shapiro.test(F23Pro1))
SWprop <- round(SWpro$p.value, digits=5)
h <- hist(F23Pro1, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Pro Distrib (All Progeny - Rep 1 Data)")
title(sub="%", line=2)

Protein <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Pro1,na.rm=TRUE), max(F23Pro1, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Pro1, na.rm=TRUE), sd=sd(F23Pro1, na.rm=TRUE))
plot(Protein, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Pro Distrib (All Progeny - Rep 1 Data)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Pro1, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Pro1, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWprop, col="red")
title(sub="%", line=2)


# Second, let's look at the F2.3 Seed Oil Distribution (All Progeny - Rep 1 Data). 

OBJNAME$pheno[,4]
describe(OBJNAME$pheno[,4], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Oil1 <- OBJNAME$pheno[,4]
mean(F23Oil1, na.rm=TRUE)
var(F23Oil1, na.rm=TRUE)
sd(F23Oil1, na.rm=TRUE)
min(F23Oil1, na.rm=TRUE)
max(F23Oil1, na.rm=TRUE)
skewness(F23Oil1, na.rm=TRUE)
kurtosis(F23Oil1, na.rm=TRUE)
shapiro.test(F23Oil1)
SWoil <- (shapiro.test(F23Oil1))
SWoilo <- round(SWoil$p.value, digits=5)
h <- hist(F23Oil1, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Oil Distrib (All Progeny - Rep 1 Data)")
title(sub="%", line=2)

Oil <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Oil1,na.rm=TRUE), max(F23Oil1, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Oil1, na.rm=TRUE), sd=sd(F23Oil1, na.rm=TRUE))
plot(Oil, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Oil Distrib (All Progeny - Rep 1 Data)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Oil1, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Oil1, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWoilo, col="red")
title(sub="%", line=2)


# Third, let's look at the F2.3 Seed Pro Distribution (Extreme Quintile Progeny Only - Rep 2 Data). 

OBJNAME$pheno[,6]
describe(OBJNAME$pheno[,6], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Pro2 <- OBJNAME$pheno[,6]
mean(F23Pro2, na.rm=TRUE)
var(F23Pro2, na.rm=TRUE)
sd(F23Pro2, na.rm=TRUE)
min(F23Pro2, na.rm=TRUE)
max(F23Pro2, na.rm=TRUE)
skewness(F23Pro2, na.rm=TRUE)
kurtosis(F23Pro2, na.rm=TRUE)
shapiro.test(F23Pro2)
SWpro <- (shapiro.test(F23Pro2))
SWprop <- round(SWpro$p.value, digits=5)
h <- hist(F23Pro2, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Pro Distrib (Ext Quint Prog Only - Rep 2 Data)")
title(sub="%", line=2)

Protein <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Pro2,na.rm=TRUE), max(F23Pro2, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Pro2, na.rm=TRUE), sd=sd(F23Pro2, na.rm=TRUE))
plot(Protein, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Pro Distrib (Ext Quint Prog Only - Rep 2 Data)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Pro2, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Pro2, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWprop, col="red")
title(sub="%", line=2)


# Fourth, let's look at the F2.3 Seed Oil Distribution (Extreme Quintile Progeny Only - Rep 2 Data). 

OBJNAME$pheno[,7]
describe(OBJNAME$pheno[,7], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Oil2 <- OBJNAME$pheno[,7]
mean(F23Oil2, na.rm=TRUE)
var(F23Oil2, na.rm=TRUE)
sd(F23Oil2, na.rm=TRUE)
min(F23Oil2, na.rm=TRUE)
max(F23Oil2, na.rm=TRUE)
skewness(F23Oil2, na.rm=TRUE)
kurtosis(F23Oil2, na.rm=TRUE)
shapiro.test(F23Oil2)
SWoil <- (shapiro.test(F23Oil2))
SWoilo <- round(SWoil$p.value, digits=5)
h <- hist(F23Oil2, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Oil Distrib (Ext Quint Prog Only - Rep 2 Data)")
title(sub="%", line=2)

Oil <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Oil2,na.rm=TRUE), max(F23Oil2, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Oil2, na.rm=TRUE), sd=sd(F23Oil2, na.rm=TRUE))
plot(Oil, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Oil Distrib (Ext Quint Prog Only - Rep 2 Data)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Oil2, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Oil2, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWoilo, col="red")
title(sub="%", line=2)


# Fifth, let's look at the F2.3 Seed Pro Distribution (Extreme Quintile Progeny Only - 2-Rep Means). 

OBJNAME$pheno[,9]
describe(OBJNAME$pheno[,9], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23ProM <- OBJNAME$pheno[,9]
mean(F23ProM, na.rm=TRUE)
var(F23ProM, na.rm=TRUE)
sd(F23ProM, na.rm=TRUE)
min(F23ProM, na.rm=TRUE)
max(F23ProM, na.rm=TRUE)
skewness(F23ProM, na.rm=TRUE)
kurtosis(F23ProM, na.rm=TRUE)
shapiro.test(F23ProM)
SWpro <- (shapiro.test(F23ProM))
SWprop <- round(SWpro$p.value, digits=5)
h <- hist(F23ProM, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Pro Distrib (Ext Quint Prog Only - 2-Rep Means)")
title(sub="%", line=2)

Protein <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23ProM,na.rm=TRUE), max(F23ProM, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23ProM, na.rm=TRUE), sd=sd(F23ProM, na.rm=TRUE))
plot(Protein, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Pro Distrib (Ext Quint Prog Only - 2-Rep Means)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23ProM, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23ProM, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWprop, col="red")
title(sub="%", line=2)


# Sixth, let's look at the F2.3 Seed Oil Distribution (Extreme Quintile Progeny Only - 2-Rep Means). 

OBJNAME$pheno[,10]
describe(OBJNAME$pheno[,10], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23OilM <- OBJNAME$pheno[,10]
mean(F23OilM, na.rm=TRUE)
var(F23OilM, na.rm=TRUE)
sd(F23OilM, na.rm=TRUE)
min(F23OilM, na.rm=TRUE)
max(F23OilM, na.rm=TRUE)
skewness(F23OilM, na.rm=TRUE)
kurtosis(F23OilM, na.rm=TRUE)
shapiro.test(F23OilM)
SWoil <- (shapiro.test(F23OilM))
SWoilo <- round(SWoil$p.value, digits=5)
h <- hist(F23OilM, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Oil Distrib (Ext Quint Prog Only - 2-Rep Means)")
title(sub="%", line=2)

Oil <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23OilM,na.rm=TRUE), max(F23OilM, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23OilM, na.rm=TRUE), sd=sd(F23OilM, na.rm=TRUE))
plot(Oil, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Oil Distrib (Ext Quint Prog Only - 2-Rep Means)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23OilM, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23OilM, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWoilo, col="red")
title(sub="%", line=2)


# Seventh, let's look at the F2.3 Ext-Q Means / Mid-Q Sgl Rep Seed Pro Distribution. 

OBJNAME$pheno[,12]
describe(OBJNAME$pheno[,12], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Pro1r <- OBJNAME$pheno[,12]
mean(F23Pro1r, na.rm=TRUE)
var(F23Pro1r, na.rm=TRUE)
sd(F23Pro1r, na.rm=TRUE)
min(F23Pro1r, na.rm=TRUE)
max(F23Pro1r, na.rm=TRUE)
skewness(F23Pro1r, na.rm=TRUE)
kurtosis(F23Pro1r, na.rm=TRUE)
shapiro.test(F23Pro1r)
SWpro <- (shapiro.test(F23Pro1r))
SWprop <- round(SWpro$p.value, digits=5)
h <- hist(F23Pro1r, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Pro Distrib (All Prog - Ext-Q Mean & Mid-Q Rep 1)")
title(sub="%", line=2)

Protein <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Pro1r,na.rm=TRUE), max(F23Pro1r, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Pro1r, na.rm=TRUE), sd=sd(F23Pro1r, na.rm=TRUE))
plot(Protein, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Pro Distrib (All Prog - Ext-Q Mean & Mid-Q Rep 1)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Pro1r, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Pro1r, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWprop, col="red")
title(sub="%", line=2)


# Eighth, let's look at the F2.3 Seed Oil Distribution (All Progeny - Ext-Q Mean & Mid-Q Rep 1 Data). 

OBJNAME$pheno[,13]
describe(OBJNAME$pheno[,13], na.rm = TRUE, interp=FALSE, skew = TRUE, ranges = TRUE)
F23Oil1r <- OBJNAME$pheno[,13]
mean(F23Oil1r, na.rm=TRUE)
var(F23Oil1r, na.rm=TRUE)
sd(F23Oil1r, na.rm=TRUE)
min(F23Oil1r, na.rm=TRUE)
max(F23Oil1r, na.rm=TRUE)
skewness(F23Oil1r, na.rm=TRUE)
kurtosis(F23Oil1r, na.rm=TRUE)
shapiro.test(F23Oil1r)
SWoil <- (shapiro.test(F23Oil1r))
SWoilo <- round(SWoil$p.value, digits=5)
h <- hist(F23Oil1r, breaks=15, main=NULL, las=1)
title(main="P1022 F2.3 Seed Oil Distrib (All Prog - Ext-Q Mean & Mid-Q Rep 1)")
title(sub="%", line=2)

Oil <- c(min(h$breaks), h$breaks)
Frequency <-c (0,h$density,0)
xfit <- seq(min(F23Oil1r, na.rm=TRUE), max(F23Oil1r, na.rm=TRUE), length=40)
yfit <- dnorm(xfit, mean=mean(F23Oil1r, na.rm=TRUE), sd=sd(F23Oil1r, na.rm=TRUE))
plot(Oil, Frequency, type="s", ylim=c(0,max(Frequency, yfit)), main="P1022 F2.3 Seed Oil Distrib (All Prog - Ext-Q Mean & Mid-Q Rep 1)", las=1)
lines(xfit, yfit, col="red")
text((mean(F23Oil1r, na.rm=TRUE)),0, pos=3, adj=c(0.5,0.5), labels="Shapiro-Wilk Normality Test P-value", col="red")
text((mean(F23Oil1r, na.rm=TRUE)),0, adj=c(0.5,0.5), labels=SWoilo, col="red")
title(sub="%", line=2)


################################################################################

# PAIRED PHENOTYPES - A GRAPH-BASED VIEW OF THE PHENOTYPIC CORRELATIONS.

date()

# The next cmds produce paired graphs of one phenotype versus another, 
# showing all ten on one page, or relevant groups of four or three on one page. 
# These graphs are useful for visional evaluations of numerical correlations. 

pairs(OBJNAME$pheno[3:14])
mtext("P1022 F2.3 Progeny - All Progeny - All Phenotypes", line=3.5)

pairs(OBJNAME$pheno[3:5])
mtext("P1022 F2.3 Progeny - All Progeny - Replicate One Phenotypes", line=3.5)

pairs(OBJNAME$pheno[6:8])
mtext("P1022 F2.3 Progeny - Extreme Quintile Progeny Only - Rep 2 Data", line=3.5)

pairs(OBJNAME$pheno[9:11])
mtext("P1022 F2.3 Progeny - Extreme Quintile Progeny Only - 2-Rep Means", line=3.5)

pairs(OBJNAME$pheno[12:14])
mtext("P1022 F2.3 Progeny - All Progeny - Ext-Q Mean & Mid-Q Rep 1 Data", line=3.5)

# Let's take a look at the phenotypic correlation values between all pairs of 
# phenotypes, especially those between the 1st rep and 2nd rep values measured 
# for seed protein (ditto for seed oil), and between protein and oil, as well. 

cor(OBJNAME$pheno[3:14], use = "pairwise.complete.obs", method="pearson")

################################################################################
################################################################################

## LOD SCAN - LOD SCAN - LOD SCAN - LOD SCAN - LOD SCAN - LOD SCAN - LOD SCAN ##

# BELOW ARE LOD SCAN COMMANDS THAT DETECT QTLs IN F2.3 PROGENY PHENOTYPE DATA. 

# WE DO A SIMPLE (mr) ANALYSIS FIRST, THEN DO TWO INTERVAL ANALYSES (em & imp). 

# Let's summarize what we have so far for this cross object (i.e., OBJNAME)

summary(OBJNAME, threshold=3, df=TRUE)
jittermap(OBJNAME, amount=1e-6)

# To ensure a good lod scan graph layout, use the below graph parameter cmd. 

par(mar=c(5.1,5.1,5.1,3.1),font=2,font.main=2,font.sub=2,font.axis=2,font.lab=2)

date()

################################################################################
# SINGLE MARKER REGRESSION ANALYSIS 
# Requires about 1 second to run the cmds here on Jimbo's 4GB RAM laptop.
# MR useful for initial look, but Interval Analysis is best for finalizing QTLs. 
################################################################################

# First, do SEED PROTEIN (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmd for a GENOME SCAN and PLOT:
outpro.mr <- scanone(OBJNAME, pheno.col=3, method="mr")
summary(outpro.mr)
summary(outpro.mr, threshold=3)
max(outpro.mr)
plot(outpro.mr, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein", ylab="LOD Score", ylim=c(0,5))
mtext("Single Marker Analysis (Regression Method)", side=3, line=1, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(4.43,4.43), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# Second, do SEED OIL (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmd for a GENOME SCAN and PLOT:
outoil.mr <- scanone(OBJNAME, pheno.col=4, method="mr")
summary(outoil.mr)
summary(outoil.mr, threshold=3)
max(outoil.mr)
plot(outoil.mr, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil", ylab="LOD Score", ylim=c(0,5))
mtext("Single Marker Analysis (Regression Method)", side=3, line=1, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(4.36,4.36), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

################################################################################
# INTERVAL ANALYSIS EM (Maximum Likelihood Method using the EM Algorithm) 
# Requires about 13 seconds to run the cmds here on Jimbo's 4GB RAM laptop.
# MLE is preferred over Haley-Knott regression if many genotypes are missing. 
################################################################################

# Each F2 population was selectively genotyped. F2 parents of progenies in the 
# two extreme decile tails of a population's 1st rep seed pro/oil distribution 
# were genotyped. In SG pops, the decile tails were selected from the extreme 
# 50% in each two-rep mean extreme top&bot quintile and thus were not truly the 
# 1-rep seed protein decile tails (because we wanted two-rep mean data - not 1- 
# rep data - for the purpose of choosing individuals for selective genotyping. 
# However, because ML-based interval analysis requires a phenotype for every 
# individual in the population, the below interval analyses had to be conducted 
# on the 1st Rep protein phenotypes, instead of the mean (last) pheno.col. 

# For interval analysis, the following cmd must be run first. For the step=n 
# option, n=2 (instead of a preferred n=1) was chosen to lessen computer memory 
# requirements relative to the R/qtl QTL analysis cmds that were used later. 
# Note that you must use n=2 throughout. Soybean SNP genotyping is quite error- 
# free, thus justifying (we think) our choice of a low error.prob value of 0.001. 

# NOTE: With a step=2, you may get pseudomarkers (i.e., cx.locxx) showing up in 
# place of makers as nearest to QTL peaks, since the genome is scanned for QTLs
# on a 2-cM grid basis and some marker-to-marker distances may be >4 cM. 
# If you do not want pseudomarkers in your scanone output, use step=0 instead.

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMANDS TO EXECUTE!

OBJNAME <- calc.genoprob(OBJNAME, map.function=c("kosambi"), step=2, error.prob=0.001)
summary(OBJNAME, threshold=3, df=TRUE)
jittermap(OBJNAME, amount=1e-6)

################################################################################

# First, do SEED PROTEIN (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmds for a GENOME SCAN and PLOT:
outpro.em <- scanone(OBJNAME, pheno.col=3, method="em")
summary(outpro.em)
summary(outpro.em, threshold=3) 
max(outpro.em)
plot(outpro.em, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein", ylab="LOD Score", ylim=c(0,5))
mtext("Interval Analysis (Maximum Likelihood Estimation Method)", side=3, line=1, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(3.98,3.98), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# Second, do SEED OIL (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmds for a GENOME SCAN and PLOT:
outoil.em <- scanone(OBJNAME, pheno.col=4, method="em")
summary(outoil.em)
summary(outoil.em, threshold=3)
max(outoil.em)
plot(outoil.em, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil", ylab="LOD Score", ylim=c(0,5))
mtext("Interval Analysis (Maximum Likelihood Estimation Method)", side=3, line=1, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(3.80,3.80), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

################################################################################
# INTERVAL ANALYSIS IMP (Using Multiple Imputation for Missing Genotype Data) 
# Requires about 30 seconds to run the cmds here on Jimbo's 4GB RAM laptop.
# Technically, multiple imputation is to be used when there are RANDOM missing 
# genotypes; however, selective genotyping has PURPOSEFUL missing genotypes! 
# Therefore, we conduct IM-IMP QTL analysis here ONLY for information purposes. 
################################################################################

# Note that in our populations, missing data arise in two ways:  (1) obviously, 
# we did not genotype the progenies occupying the central three quintile 
# fractions in F2 seed pro distributions, and (2) though our SNP genotyping 
# method is quite good, it is not perfect in terms of generating a genotype for 
# every genotyped individual - so there may be a few missing genotypes in one 
# or both extreme quintile fractions. On page 91 of the Broman & Sen book, it is 
# noted that multiple imputation "fills" in ALL of the missing data. With no 
# missing data, the fit of a single (or multiple) QTL model(s) to the phenotypic 
# data can be performed using standard ANOVA (or multiple regression) approaches. 
# However, as Broman & Sen note in their book, use of the multiple imputation 
# method requires extensive computation time and HUGE amounts of memory. That 
# said, the method is still of great value in terms of (later) searching for 
# multiple-QTL models that offer the best fits to the observed phenotypic data. 

# The following cmd is necessary before proceeding into multiple imputation. 
# In general, the larger the draw number the better (as is a smaller step size), 
# but your step size choice needs to be consistent throughout earlier and later 
# R/qtl cmds. Here, Jimbo used step=0 and n.draws=128 because Jimbo's laptop 
# did not have enough memory to do the cmd with n.draws=256. NOTE: 
# If the below cmd does not run, subsequent cmds will NOT run either. 
# Your computer's memory may necessitate fewer draws (try 64 if 128 fails). 

# NOTE: With a step=2, a pseudomarker name (i.e., cx.locxx) may appear in place 
# of a marker name as the "name" nearest to the QTL peak; Because the genome is 
# scanned for QTLs on a 2-cM grid basis, and some marker-to-marker distances of 
# >4 cM may be encountered, wherein the best QTL position may be between them. 
# If you do not want pseudomarkers in your scanone output, use step=0 instead,
# but note that you would not then be using stepped grid interval mapping!

date()

# BE PATIENT!  WAIT FOR THE BELOW COMMAND TO EXECUTE!

OBJNAME <- sim.geno(OBJNAME, step=2, n.draws=128, error.prob=0.001)

################################################################################

# First, do SEED PROTEIN (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmds for a GENOME SCAN and PLOT:
outpro.imp <- scanone(OBJNAME, pheno.col=3, method="imp")
summary(outpro.imp)
summary(outpro.imp, threshold=3)
max(outpro.imp)
plot(outpro.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein", ylab="LOD Score", ylim=c(0,3))
mtext("Interval Analysis (Multiple Imputation Method)", side=3, line=1, las=1)
mtext("(Method Imputes a Genotype (for a Missing One) Based on Observed Flanking Genotypes)", cex=0.65, side=3, line=0.25, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(2.70,2.70), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# Second, do SEED OIL (LOD Score of 3.0 used until permutation can be done). 

date()

# Use below cmds for a GENOME SCAN and PLOT:
outoil.imp <- scanone(OBJNAME, pheno.col=4, method="imp")
summary(outoil.imp)
summary(outoil.imp, threshold=3)
max(outoil.imp)
plot(outoil.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil", ylab="LOD Score", ylim=c(0,3))
mtext("Interval Analysis (Multiple Imputation Method)", side=3, line=1, las=1)
mtext("(Method Imputes a Genotype (for a Missing One) Based on Observed Flanking Genotypes)", cex=0.65, side=3, line=0.25, las=1)
abline(h=seq(0,10,0.5),lty=2)
abline(h=c(2.27,2.27), lty=1, lwd=2, col="blue")
abline(h=c(3,3), lty=1, lwd=2, col="red")

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

################################################################################

# NOW LET'S COMPARE THE MR, IM-em, & IM-imp METHODS RELATIVE TO QTL RESULTS 

date()

# Here we compare scanone results obtained with the mr, em, & imp methods, for 
# which an arbitrary LOD score of 3.0 was used as the significance threshold. 
# Permutation-derived LOD score thresholds will be derived in the next section. 

# SEED PROTEIN - All Chromosomes
  
plot(outpro.mr, outpro.em, outpro.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein", col=c("RED","BLACK","BLUE"), lty=c(2,1,3), ylab="LOD Score")
mtext("MR = Red   IM(em) = Black   IM(imp) = Blue)", side=3, line=1, las=1)
abline(h=seq(0.5,10.0,0.5),lty=2)

# SEED OIL - All Chromosomes

plot(outoil.mr, outoil.em, outoil.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil", col=c("RED","BLACK","BLUE"), lty=c(2,1,3), ylab="LOD Score")
mtext("MR = Red   IM(em) = Black   IM(imp) = Blue)", side=3, line=1, las=1)
abline(h=seq(0.5,10.0,0.5),lty=2)

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

################################################################################
################################################################################
# NOW THAT SEED PRO/OIL LOD SCANS ARE DONE, LET'S NEXT GET TO THE QTL EFFECTS. 
# STOP HERE TEMPORARILY - DATA GENERATED ABOVE IS NEEDED FOR THE BELOW COMMANDS. 
################################################################################
################################################################################

# USE THIS SECTION AND NEXT PRO/OIL ONES TO GET DATA NEEDED FOR THE MANUSCRIPT:

# P1022 FINAL SUMMARY of the population numbers, genotypes, chromosomes, etc:

summary(OBJNAME)
jittermap(OBJNAME, amount=1e-6)

# Next cmd indicates number of selectively genotyped individuals still in pop
# after all error checking, so that this info can be used for a MANUSCRIPT.

nxo <- countXO(OBJNAME) 
nxoFIN <- subset(OBJNAME, ind=nxo > 0)
nxoFIN

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# P1022 SEED PROTEIN QTLS - FINAL SUMMARY INFORMATION

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# P1022 SEED PROTEIN DATA- Used the scanone em (NOT mr or imp) OUTPUT data here. 

date()

# PRO Phenotype statistics were summarized above BEFORE error checking (go there) 
# but we RERUN those statistics below AFTER error checking (& Ind & Marker drops)

F23Pro1 <- OBJNAME$pheno[,3]
mean(F23Pro1, na.rm=TRUE)
var(F23Pro1, na.rm=TRUE)
sd(F23Pro1, na.rm=TRUE)
min(F23Pro1, na.rm=TRUE)
max(F23Pro1, na.rm=TRUE)
 
# P1022 FINAL summary of the PRO scanone-em results step=2 results:

summary(outpro.em)

# NOW let's create a summary table of PRO QTL pos lod, rsq, add dom eff, etc.  
# But first we need to use some Rqtl cmds to get what we need for that table:

prostep2out <- summary(outpro.em)

# Extract QTL map positions:
prostep2qpos <- prostep2out$pos
#prostep2qpos
prostep2qpos <- round(prostep2qpos,2)
#prostep2qpos

# Extract QTL marker or pseudomarker positions:
prostep2qpsm <- find.pseudomarker(OBJNAME, chr=c(1:20), pos=prostep2qpos)
#prostep2qpsm

# Extract QTL peak LOD scores: 
prostep2qlod <- prostep2out$lod
#prostep2qlod
prostep2qlod <- round(prostep2qlod,2)
#prostep2qlod

# Calculate a R2 (%) value (i.e., QTL heritability) for each QTL peak:
prostep2qrsq <- 1.0 - 10^(-(2/nind(OBJNAME)*prostep2qlod))
#prostep2qrsq
prostep2qrsq <- round(100*prostep2qrsq,2)
#prostep2qrsq

# Calculate add & dom effects for the QTL peak:
prostep2ADSE <- effectscan(OBJNAME, pheno.col=3, chr=c(1:20), get.se=TRUE, draw=FALSE)
prostep2EFF <- prostep2ADSE[c(prostep2qpsm), ]
#prostep2EFF

# Extract QTL peak Additive effects: 
prostep2qadd <- prostep2EFF$a
#prostep2qadd
prostep2qadd <- round(10*prostep2qadd,2)
#prostep2qadd

# Extract QTL peak Dominance effects: 
prostep2qdom <- prostep2EFF$d
#prostep2qdom
prostep2qdom <- round(10*prostep2qdom,2)
#prostep2qdom

# Identify markers on the left & right of, and nearest to, the QTL peak:
prostep2qlrc <- find.flanking(OBJNAME, chr=c(1:20), pos=prostep2qpos)
#prostep2qlrc

# Identify the nearest left flanking marker for each QTL Peak:
prostep2mlft <- prostep2qlrc$left
#prostep2mlft
prostep2lpos <- find.markerpos(OBJNAME, marker=prostep2mlft)
#prostep2lpos
prostep2lmar <- find.marker(OBJNAME, chr=c(1:20), pos=prostep2lpos$pos)
#prostep2lmar

# Identify the nearest right flanking marker for each QTL Peak:
prostep2mrgt <- prostep2qlrc$right
#prostep2mrgt
prostep2rpos <- find.markerpos(OBJNAME, marker=prostep2mrgt)
#prostep2rpos
prostep2rmar <- find.marker(OBJNAME, chr=c(1:20), pos=prostep2rpos$pos)
#prostep2rmar

# Identify the closest marker for each QTL Peak:
prostep2mclo <- prostep2qlrc$close
#prostep2mclo
prostep2cpos <- find.markerpos(OBJNAME, marker=prostep2mclo)
#prostep2cpos
prostep2cmar <- find.marker(OBJNAME, chr=c(1:20), pos=prostep2cpos$pos)
#prostep2cmar

#---------------

prostep2beg <- subset(prostep2out, select = -c(pos,lod))
#prostep2beg

# NOW CREATE A MANUSCRIPT TABULAR SUMMARY OF P1022 PRO QTL PEAK DATA.

prostep2tbl <- cbind(prostep2beg, cmar=prostep2cmar, qpos=prostep2qpos, qlod=prostep2qlod, qadd=prostep2qadd, qdom=prostep2qdom, qrsq=prostep2qrsq, lmar=prostep2lmar, lpos=prostep2lpos$pos, rpos=prostep2rpos$pos, rmar=prostep2rmar)
prostep2tbl

#---------------

# P1022 Chromosomal PRO QTLs - BAYES 95% CREDIBLE INTERVALS (BCI)

bayesint(outpro.em,  1, 0.95)
bayesint(outpro.em,  2, 0.95)
bayesint(outpro.em,  3, 0.95)
bayesint(outpro.em,  4, 0.95)
bayesint(outpro.em,  5, 0.95)
bayesint(outpro.em,  6, 0.95)
bayesint(outpro.em,  7, 0.95)
bayesint(outpro.em,  8, 0.95)
bayesint(outpro.em,  9, 0.95)
bayesint(outpro.em, 10, 0.95)
bayesint(outpro.em, 11, 0.95)
bayesint(outpro.em, 12, 0.95)
bayesint(outpro.em, 13, 0.95)
bayesint(outpro.em, 14, 0.95)
bayesint(outpro.em, 15, 0.95)
bayesint(outpro.em, 16, 0.95)
bayesint(outpro.em, 17, 0.95)
bayesint(outpro.em, 18, 0.95)
bayesint(outpro.em, 19, 0.95)
bayesint(outpro.em, 20, 0.95)
bayesint(outpro.em,  1, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  2, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  3, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  4, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  5, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  6, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  7, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  8, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em,  9, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 10, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 11, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 12, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 13, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 14, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 15, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 16, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 17, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 18, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 19, 0.95, expandtomarkers=TRUE)
bayesint(outpro.em, 20, 0.95, expandtomarkers=TRUE)

#-------------------------------------------------------------------------------

#< Jimbo: No matter if just one or the other Pro or Oil QTL exceeded a LOD > 3, 
# input the in-common chromosome numbers in the parentheses of the below cmds.  

# P1022 PRO QTL EFFECT SCANS

#< Jimbo:  Match chromosome numbers below to above numbers before running.
# Below, you will need to input chromosome numbers [ c() ] with significant QTLs.
effectscan(OBJNAME, pheno.col=3, chr=c(15,16,20), get.se=TRUE, d=TRUE, ylab="", mtick=c("line"))
mtext("P1022 F2.3 Seed Protein QTLs", side=3, line=2, las=1)
mtext("Additive & Dominance Effects", side=3, line=1, las=1)
mtext("[Estimated by Linear Regression of Phenotypes on Genotypes A,H,B (coded a = -1,0,+1 or d = 0,1,0)]", side=3, line=0.25, cex=0.65, las=1)
mtext("Seed Protein (%)", side=2, line=3.4, las=3)
mtext("(Only The Relevant Ones Displayed Here)", side=1, line=4, las=1)
abline(h=+0.5, lty="dotted")
abline(h=+0.25,lty="dotted")
abline(h=0.00, lty=1)
abline(h=-0.25,lty="dotted")
abline(h=-0.5, lty="dotted")

date()

#< Jimbo: Use BELOW (same above) chr nos but USE PRO (NOT OIL) QTL MAP POSITIONS. 
#  Also, change the default Q1*Q2 model parameter to have Q nos match chr nos!

#< Below cmd generates a list of chromosome no. and map position of PRO QTLs.

proqtl <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,57.70,33.21), what=c("prob"))
proqtl

#< Below cmds generate ANOVA and add/dom effect estimates with the hk method.
# NOTE: The hk method will overestimate the effects in SG populations.

proqtlprob <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,57.70,33.21), what=c("prob"))
anovaprohk <- fitqtl(OBJNAME, pheno.col=3, proqtlprob, covar=NULL, formula=y~Q1+Q2+Q3, method=c("hk"), model=c("normal"), dropone=TRUE, get.ests=TRUE, run.checks=TRUE, tol=1e-4, maxit=1000, forceXcovar=FALSE)
anovaprohk

#< Below cmds generate ANOVA and add/dom effect estimates with the imp method.
# NOTE: The imp method designed for random, not purposeful (SG), missing data.

proqtldraw <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,57.70,33.21), what=c("draws"))
anovaproimp <- fitqtl(OBJNAME, pheno.col=3, proqtldraw, covar=NULL, formula=y~Q1+Q2+Q3, method=c("imp"), model=c("normal"), dropone=TRUE, get.ests=TRUE, run.checks=TRUE, tol=1e-4, maxit=1000, forceXcovar=FALSE)
anovaproimp

date()

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# P1022 SEED OIL QTLS - FINAL SUMMARY INFORMATION

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# P1022 SEED OIL DATA- Used the scanone em (NOT mr or imp) OUTPUT data here. 

date()

# OIL Phenotype statistics were summarized above BEFORE error checking (go there) 
# but we RERUN those statistics below AFTER error checking (& Ind & Marker drops)

F23Oil1 <- OBJNAME$pheno[,4]
mean(F23Oil1, na.rm=TRUE)
var(F23Oil1, na.rm=TRUE)
sd(F23Oil1, na.rm=TRUE)
min(F23Oil1, na.rm=TRUE)
max(F23Oil1, na.rm=TRUE)
 
# P1022 FINAL summary of the oil scanone-em results step=2 results:

summary(outoil.em)

# NOW let's create a summary table of OIL QTL pos lod, rsq, add dom eff, etc.  
# But first we need to use some Rqtl cmds to get what we need for that table:

oilstep2out <- summary(outoil.em)

# Extract QTL map positions:
oilstep2qpos <- oilstep2out$pos
#oilstep2qpos
oilstep2qpos <- round(oilstep2qpos,2)
#oilstep2qpos

# Extract QTL marker or pseudomarker positions:
oilstep2qpsm <- find.pseudomarker(OBJNAME, chr=c(1:20), pos=oilstep2qpos)
#oilstep2qpsm

# Extract QTL peak LOD scores: 
oilstep2qlod <- oilstep2out$lod
#oilstep2qlod
oilstep2qlod <- round(oilstep2qlod,2)
#oilstep2qlod

# Calculate a R2 (%) value (i.e., QTL heritability) for each QTL peak:
oilstep2qrsq <- 1.0 - 10^(-(2/nind(OBJNAME)*oilstep2qlod))
#oilstep2qrsq
oilstep2qrsq <- round(100*oilstep2qrsq,2)
#oilstep2qrsq

# Calculate add & dom effects for the QTL peak:
oilstep2ADSE <- effectscan(OBJNAME, pheno.col=4, chr=c(1:20), get.se=TRUE, draw=FALSE)
oilstep2EFF <- oilstep2ADSE[c(oilstep2qpsm), ]
#oilstep2EFF

# Extract QTL peak Additive effects: 
oilstep2qadd <- oilstep2EFF$a
#oilstep2qadd
oilstep2qadd <- round(10*oilstep2qadd,2)
#oilstep2qadd

# Extract QTL peak Dominance effects: 
oilstep2qdom <- oilstep2EFF$d
#oilstep2qdom
oilstep2qdom <- round(10*oilstep2qdom,2)
#oilstep2qdom

# Identify markers on the left & right of, and nearest to, each QTL peak:
oilstep2qlrc <- find.flanking(OBJNAME, chr=c(1:20), pos=oilstep2qpos)
#oilstep2qlrc

# Identify the nearest left flanking marker for each QTL Peak:
oilstep2mlft <- oilstep2qlrc$left
#oilstep2mlft
oilstep2lpos <- find.markerpos(OBJNAME, marker=oilstep2mlft)
#oilstep2lpos
oilstep2lmar <- find.marker(OBJNAME, chr=c(1:20), pos=oilstep2lpos$pos)
#oilstep2lmar

# identify the nearest right flanking marker for each QTL Peak:
oilstep2mrgt <- oilstep2qlrc$right
#oilstep2mrgt
oilstep2rpos <- find.markerpos(OBJNAME, marker=oilstep2mrgt)
#oilstep2rpos
oilstep2rmar <- find.marker(OBJNAME, chr=c(1:20), pos=oilstep2rpos$pos)
#oilstep2rmar

# identify the closest marker for each QTL Peak:
oilstep2mclo <- oilstep2qlrc$close
#oilstep2mclo
oilstep2cpos <- find.markerpos(OBJNAME, marker=oilstep2mclo)
#oilstep2cpos
oilstep2cmar <- find.marker(OBJNAME, chr=c(1:20), pos=oilstep2cpos$pos)
#oilstep2cmar

#---------------

oilstep2beg <- subset(oilstep2out, select = -c(pos,lod))
#oilstep2beg

# NOW CREATE A MANUSCRIPT TABULAR SUMMARY OF P1022 OIL QTL PEAK DATA.

oilstep2tbl <- cbind(oilstep2beg, cmar=oilstep2cmar, qpos=oilstep2qpos, qlod=oilstep2qlod, qadd=oilstep2qadd, qdom=oilstep2qdom, qrsq=oilstep2qrsq, lmar=oilstep2lmar, lpos=oilstep2lpos$pos, rpos=oilstep2rpos$pos, rmar=oilstep2rmar)
oilstep2tbl

#---------------

# P1022 Chromosomal OIL QTLs - BAYES 95% CREDIBLE INTERVALS (BCI)

bayesint(outoil.em,  1, 0.95)
bayesint(outoil.em,  2, 0.95)
bayesint(outoil.em,  3, 0.95)
bayesint(outoil.em,  4, 0.95)
bayesint(outoil.em,  5, 0.95)
bayesint(outoil.em,  6, 0.95)
bayesint(outoil.em,  7, 0.95)
bayesint(outoil.em,  8, 0.95)
bayesint(outoil.em,  9, 0.95)
bayesint(outoil.em, 10, 0.95)
bayesint(outoil.em, 11, 0.95)
bayesint(outoil.em, 12, 0.95)
bayesint(outoil.em, 13, 0.95)
bayesint(outoil.em, 14, 0.95)
bayesint(outoil.em, 15, 0.95)
bayesint(outoil.em, 16, 0.95)
bayesint(outoil.em, 17, 0.95)
bayesint(outoil.em, 18, 0.95)
bayesint(outoil.em, 19, 0.95)
bayesint(outoil.em, 20, 0.95)
bayesint(outoil.em,  1, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  2, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  3, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  4, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  5, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  6, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  7, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  8, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em,  9, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 10, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 11, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 12, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 13, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 14, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 15, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 16, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 17, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 18, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 19, 0.95, expandtomarkers=TRUE)
bayesint(outoil.em, 20, 0.95, expandtomarkers=TRUE)

#-------------------------------------------------------------------------------

#< Jimbo: No matter if just one or the other Pro or Oil QTL exceeded a LOD > 3, 
# input the in-common chromosome numbers in the parentheses of the below cmds.  

# P1022 OIL QTL EFFECT SCANS

effectscan(OBJNAME, pheno.col=4, chr=c(15,16,20), get.se=TRUE, draw=TRUE, ylab="", mtick=c("line"))
mtext("P1022 F2.3 Seed Oil QTLs", side=3, line=2, las=1)
mtext("Additive & Dominance Effects", side=3, line=1, las=1)
mtext("[Estimated by Linear Regression of Phenotypes on Genotypes A,H,B (coded a = -1,0,+1 or d = 0,1,0)]", side=3, line=0.25, cex=0.65, las=1)
mtext("Seed Oil (%)", side=2, line=3.4, las=3)
mtext("(Only The Relevant Ones Displayed Here)", side=1, line=4, las=1)
abline(h=+0.5, lty="dotted")
abline(h=+0.25,lty="dotted")
abline(h=0.00, lty=1)
abline(h=-0.25,lty="dotted")
abline(h=-0.5, lty="dotted")

date()

#< Jimbo: Use BELOW (same above) chr nos but USE OIL (NOT PRO) QTL MAP POSITIONS. 
#  Also, change the default Q1*Q2 model parameter to have Q nos match chr nos!

#< Below cmd generates a list of chromosome no. and map position of OIL QTLs.

oilqtl <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,66.00,33.21), what=c("prob"))
oilqtl

#< Below cmds generate ANOVA and add/dom effect estimates with the hk method.
# NOTE: The hk method will overestimate the effects in SG populations.

oilqtlprob <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,66.00,33.21), what=c("prob"))
anovaoilhk <- fitqtl(OBJNAME, pheno.col=4, oilqtlprob, covar=NULL, formula=y~Q1+Q2+Q3, method=c("hk"), model=c("normal"), dropone=TRUE, get.ests=TRUE, run.checks=TRUE, tol=1e-4, maxit=1000, forceXcovar=FALSE)
anovaoilhk

#< Below cmds generate ANOVA and add/dom effect estimates with the imp method.
# NOTE: The imp method designed for random, not purposeful (SG), missing data.

oilqtldraw <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,66.00,33.21), what=c("draws"))
anovaoilimp <- fitqtl(OBJNAME, pheno.col=4, oilqtldraw, covar=NULL, formula=y~Q1+Q2+Q3, method=c("imp"), model=c("normal"), dropone=TRUE, get.ests=TRUE, run.checks=TRUE, tol=1e-4, maxit=1000, forceXcovar=FALSE)
anovaoilimp

date()

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# END OF SUMMARY SECTION FOR THE MANUSCRIPT

################################################################################

##### STOP & READ - STOP & READ - STOP & READ - STOP & READ - STOP & READ ######

################################################################################
################################################################################

# PERMUTATION R/qtl CODE HERE - DO PERMUTATION ONLY AS THE LAST ANALYSIS STEP ##

# BELOW PERMUTATION SECTION REQUIRES YOU TO EXAMINE F2.3 LOD SCAN OUTPUT ABOVE #

## To ensure a good lod scan graph layout, use below graph parameter cmd ### 

par(mar=c(5.1,5.1,5.1,3.1),font=2,font.main=2,font.sub=2,font.axis=2,font.lab=2)

################################################################################
########## TEMPORARILY STOP HERE AND READ THE TEXT BEFORE PROCEEDING ###########
# For 1900 permutations, scanone method=mr takes ~6 minutes, scanone method=em 
# takes ~50 mins, & scanone method=imp takes ~20 min (using Specht 4GB laptop w/
# 64bit-R). SO FOR PROTEIN AND OIL, ABOUT 2.5 HRS TOTAL RUNTIME MAY BE NEEDED. 
################################################################################

# PERMUTATION TESTS 

# A permutation test generates a pop-specific genome-wise LOD score significance 
# criterion for use in determining the statistical significance of a QTL peak, 
 
#? In selectively genotyped F2 populations, stratified permutation is required 
# (i.e., only the phenotypes and genotypes of the SG F2 plants are permuted. 

# The next cmd specifies the individuals to be put into the strat group, and 
# these are identified as those that have genotypes for more than X>0 markers. 
# Jimbo set the below threshold to 21 or more for selectively genotyped F2s:

date()

strat <- (ntyped(OBJNAME)>20)
strat

#? NOTE: Selecting a genome-wise alpha selection of P=0.05 is the proper choice, 
# but the number of permutations run for that selection will determine the std 
# dev for that selection. To achieve a +/- std dev of 0.005 for a P=0.05 mean, 
# you need to run 1900 permutations. See page 106 in the Broman and Sen book. 
# NOTE: Change set.seed(1) cmd to 2 or 3 etc. if you run the below cmds more than once. 
# Doing this, lets you compare a 1st & 2nd set of permutation-based LOD scores. 
 
date()

################################################################################

# FIRST, do 1900 permutations on SEED PROTEIN for each method (mr, em, imp). 

set.seed(5)

date()

# For method mr:  

opermpromr <- scanone(OBJNAME, pheno.col=3, method="mr", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermpromr, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=mr)")
mtext("Single Marker Analysis (Regression Method)", side=3, line=1, las=1)
summary(opermpromr, alpha=c(0.20,0.10,0.05))
summary(outpro.mr, perms=opermpromr, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value. 

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermpromr, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outpro.mr, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=mr)", ylab="LOD Score", ylim=c(0,5))
mtext("Single Marker Analysis (Regression Method)", side=3, line=1, las=1)
add.threshold(outpro.mr, perms=opermpromr, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.2, pos=4, offset=0.25, labels=thresh05)

date()

# For method em:  

opermproem <- scanone(OBJNAME, pheno.col=3, method="em", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermproem, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=em)")
mtext("Interval Analysis (Maximum Likelihood Estimation Method)", side=3, line=1, las=1)
summary(opermproem, alpha=c(0.20,0.10,0.05))
summary(outpro.em, perms=opermproem, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value. 

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermproem, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outpro.em, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=em)", ylab="LOD Score", ylim=c(0,5))
mtext("Interval Analysis (Maximum Likelihood Estimation Method)", side=3, line=1, las=1)
add.threshold(outpro.em, perms=opermproem, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.2, pos=4, offset=0.25, labels=thresh05)

date()

# For method imp:  

opermproimp <- scanone(OBJNAME, pheno.col=3, method="imp", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermproimp, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=imp)")
mtext("Interval Analysis (Multiple Imputation Method)", side=3, line=1, las=1)
summary(opermproimp, alpha=c(0.20,0.10,0.05))
summary(outpro.imp, perms=opermproimp, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value. 

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermproimp, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outpro.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Protein - 1900 Permutations (method=imp)", ylab="LOD Score", ylim=c(0,3))
mtext("Interval Analysis (Multiple Imputation Method)", side=3, line=1, las=1)
mtext("(Method Imputes a Genotype (for a Missing One) Based on Observed Flanking Genotypes)", cex=0.65, side=3, line=0.25, las=1)
add.threshold(outpro.imp, perms=opermproimp, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.1, pos=4, offset=0.25, labels=thresh05)

date()

################################################################################

# SECOND, do 1900 permutations on SEED OIL for each method (mr, em, imp). 

set.seed(5)

date()

# For method mr:  

opermoilmr <- scanone(OBJNAME, pheno.col=4, method="mr", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermoilmr, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=mr)")
summary(opermoilmr, alpha=c(0.20,0.10,0.05))
summary(outoil.mr, perms=opermoilmr, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value.

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermoilmr, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outoil.mr, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=mr)", ylab="LOD Score", ylim=c(0,5))
mtext("Single Marker Analysis (Regression Method)", side=3, line=1, las=1)
add.threshold(outoil.mr, perms=opermoilmr, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.2, pos=4, offset=0.25, labels=thresh05)

date()

# For method em:  

opermoilem <- scanone(OBJNAME, pheno.col=4, method="em", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermoilem, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=em)")
summary(opermoilem, alpha=c(0.20,0.10,0.05))
summary(outoil.em, perms=opermoilem, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value. 

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermoilem, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outoil.em, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=em)", ylab="LOD Score", ylim=c(0,5))
mtext("Interval Analysis (Maximum Likelihood Estimation Method)", side=3, line=1, las=1)
add.threshold(outoil.em, perms=opermoilem, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.2, pos=4, offset=0.25, labels=thresh05)

date()

# For method imp:  

opermoiimp <- scanone(OBJNAME, pheno.col=4, method="imp", n.perm=1900, perm.strata=strat, verbose=TRUE)
plot(opermoiimp, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=imp)")
summary(opermoiimp, alpha=c(0.20,0.10,0.05))
summary(outoil.imp, perms=opermoiimp, alpha=0.05, pvalues=TRUE)

# If a pvalue is near (rounded to) zero, this cmd finds its upper limit SE value. 

binom.test(0, 1900)$conf.int

# Now show the LOD Score scan plot with a permutation-specified LOD threshold. 
thresh05 <- summary(opermoiimp, alpha=0.05)
thresh05 <- round(thresh05[1], digits=2)
plot(outoil.imp, alternate.chrid=TRUE, main="P1022 F2.3 Seed Oil - 1900 Permutations (method=imp)", ylab="LOD Score", ylim=c(0,3))
mtext("Interval Analysis (Multiple Imputation Method)", side=3, line=1, las=1)
mtext("(Method Imputes a Genotype (for a Missing One) Based on Observed Flanking Genotypes)", cex=0.65, side=3, line=0.25, las=1)
add.threshold(outoil.imp, perms=opermoiimp, alpha=0.05, lty=2, abline(h=thresh05))
text(1, thresh05+0.1, pos=4, offset=0.25, labels=thresh05)

date()

################################################################################
################################################################################

# ESTIMATION OF SINGLE QTL EFFECTS - USING SCANONE EM RESULTS (RUN THAT FIRST).#

################################################################################
################################################################################

date()

#? The function actually needed for best estimation of QTL effects is fixqtl, 
# as R/qtl scanone does not provide estimated QTL effects. Broman notes in the 
# book (p. 125) that QTL effects are best estimated with a multiple QTL model. 
# However, in SG populations we do not have enough genotyping to use scantwo, so 
# with SG, we simply estimate QTL pos & add/dom effects based on scanone results.    

# Just in case, let's rerun the calc.genoprob cmd (needed for makeqtl cmd). 

OBJNAME <- calc.genoprob(OBJNAME, map.function=c("kosambi"), step=0, error.prob=0.001)

date()

################################################################################
# SEED PROTEIN QTLs 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# Repeat here the FINAL summary of the scanone-em step=2 results:

summary(outpro.em)

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

#< Jimbo: Enter CHR NOs and PRO QTL POSITIONS (from summary) into all below cmds. 

date()

#< Below cmd will find flanking (& nearest) (pseudo-)markers for the map positions.

find.flanking(OBJNAME, chr=c(15,16,20), pos=c(65.66,57.70,33.21))

#< Use below cmds to generate a list of Pro QTL positions & their add/dom effects.
# You will need to insert the chrs, positions, and marker names in all cmds.

proADSE <- effectscan(OBJNAME, pheno.col=3, chr=c(15,16,20), pos=c(65.66,57.70,33.21), get.se=TRUE, draw=FALSE)
proADSE[c("c15.loc60","S13574","S13844"), ]

#< Below cmd gives generic names to the putative QTLs. 

proqtl <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,57.70,33.21), what=c("prob"))
proqtl

#< Below cmds show a genetic map of just the chromosomes with Pro QTLs.

plot(proqtl, chr=c(15,16,20), col="red")
mtext(side=3, "F2 Population P1022", line=1.0)
mtext(side=3, "[P1022 GENETIC MAP - Seed Protein QTL Positions]", cex=0.65, line=0.2)

# Below cmds show the entire genetic map, along with the PRO QTL positions. 

plot.map(OBJNAME, shift=FALSE, alternate.chrid=TRUE, ylab="Map Distance (cM)")
mtext(side=3, "F2 Population P1022", line=1.0)
mtext(side=3, "[P1022 GENETIC MAP - Seed Protein QTL Positions]", cex=0.65, line=0.2)

# Below cmd prints open circles at the chromosome "begin" points. 
points(x,y, cex=0.75, pch=1)
# Below cmd prints chr leng cM labels below the chr "end" points. 
textxy(x1,z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0,0))
# Next cmds print the PRO QTL map positions and QTL type on the map. 
#& Jimbo:  Enter CHR NOs and PRO QTL POSITIONS (from above) into all below cmds. 
# NOTE: 2nd textxy y coordinates were upped by 3 digits to help align QTL text. 
points(15,65.66, cex=1.25, pch=23, col="red")
textxy(15,68.66, "Pro", cx = 0.7, dcol = "red", m = c(0,0))
points(16,57.70, cex=1.25, pch=23, col="red")
textxy(16,60.70, "Pro", cx = 0.7, dcol = "red", m = c(0,0))
points(20,33.21, cex=1.25, pch=23, col="red")
textxy(20,36.21, "Pro", cx = 0.7, dcol = "red", m = c(0,0))

date()

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# MODELING JUST THE TWO MAJOR SEED PROTEIN QTLs IN THIS POPULATION:
# Below cmds assign names to TWO MAJOR Marker-QTLs, for use in other below cmds. 

#& Jimbo:  Enter CHR NOs and PRO QTL POSITIONS (from above) into all below cmds. 

mnamea <- find.marker(OBJNAME,16,57.70)
mnamea
mnameb <- find.marker(OBJNAME,20,33.21)
mnameb

#& Jimbo:  Revise title Chromsome No in all below cmds in each below section 
# to match the above mname cmds. Questions? See Jimbo. 

# Command Set for Marker a 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=3, main="Effect Plot for the QTL on Chromosome 16", ylab="P1022 F2.3 Seed Protein", mname1=mnamea)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effpro1 <- effectplot(OBJNAME, pheno.col=3, ylab="P1022 F2.3 Seed Protein", mname1=mnamea, draw=FALSE)
effpro1
plot.pxg(OBJNAME, c(mnamea), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE", infer=FALSE)
mtext("[Selectively Genotyped Hi / Lo Decile F2 Plants Shown Here]", side=1, line=4, las=1)
mtext("Chromosome 16 Marker", cex=1.2, side=3, line=1.75, las=1)
plot.pxg(OBJNAME, c(mnamea), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE")
mtext("[Black = Hi / Lo Oil Decile Sel-Gen F2 Plants  -  Red = Non-SG F2 Plants (imputed geno)]", side=1, line=4, las=1)
mtext("Chromosome 16 Marker", cex=1.2, side=3, line=1.75, las=1)

# Command Set for Marker b 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=3, main="Effect Plot for the QTL on Chromosome 20", ylab="P1022 F2.3 Seed Protein", mname1=mnameb)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effpro2 <- effectplot(OBJNAME, pheno.col=3, ylab="P1022 Seed Protein Phenotype", mname1=mnameb, draw=FALSE)
effpro2
plot.pxg(OBJNAME, c(mnameb), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE", infer=FALSE)
mtext("[Selectively Genotyped Hi / Lo Decile F2 Plants Shown Here]", side=1, line=4, las=1)
mtext("Chromosome 20 Marker", cex=1.2, side=3, line=1.75, las=1)
plot.pxg(OBJNAME, c(mnameb), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE")
mtext("[Black = Hi / Lo Oil Decile Sel-Gen F2 Plants  -  Red = Non-SG F2 Plants (imputed geno)]", side=1, line=4, las=1)
mtext("Chromosome 20 Marker", cex=1.2, side=3, line=1.75, las=1)

# Command Set for Markers a & b 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=3, main="Effect Plot for the QTLs on Chromosome 16 & Chromosome 20", ylab="P1022 F2.3 Seed Protein", mname1=mnamea, mname2=mnameb)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effpro12 <- effectplot(OBJNAME, pheno.col=3, ylab="P1022 Seed Protein Phenotype", mname1=mnamea, mname2=mnameb, draw=FALSE)
effpro12
plot.pxg(OBJNAME, c(mnamea, mnameb), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE", infer=FALSE)
mtext("(Only Extreme Phenotypic Decile Genotypes Displayed Here)", side=1, line=4, las=1)
mtext("Chromosome 16 & Chromosome 20 Markers", cex=1.0, side=3, line=2.25, las=1)
plot.pxg(OBJNAME, c(mnamea, mnameb), pheno.col=3, ylab="P1022 F2.3 Seed Protein", col="BLUE")
mtext("(Black = Extreme Phenotypic Decile Genotypes  -  Red = Imputed Genotypes)", side=1, line=4, las=1)
mtext("Chromosome 16 & Chromosome 20 Markers", cex=1.0, side=3, line=2.25, las=1)


################################################################################
# SEED OIL QTLs 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# Repeat here the FINAL summary of the scanone-em step=2 results:

summary(outoil.em)

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

#< Jimbo:  Enter CHR NOs and OIL QTL POSITIONS (from above) into all below cmds. 

date()

#< Below cmd will find flanking (& nearest) (pseudo-)markers for the map positions.

find.flanking(OBJNAME, chr=c(15,16,20), pos=c(65.66,66.00,33.21))

#< Use below cmds to generate a list of Oil QTL positions & their add/dom effects.
# You will need to insert the chrs, positions, and marker names in all cmds.

oilADSE <- effectscan(OBJNAME, pheno.col=4, chr=c(15,16,20), pos=c(65.66,66.00,33.21), get.se=TRUE, draw=FALSE)
oilADSE[c("c15.loc60","S13908","S13844"), ]

#< Below cmd gives generic names to the putative QTLs. 

oilqtl <- makeqtl(OBJNAME, chr=c(15,16,20), pos=c(65.66,66.00,33.21), what=c("prob"))
oilqtl

#< Below cmds show a genetic map of just the chromosomes with Oil QTLs.

plot(oilqtl, chr=c(15,16,20), col="blue")
mtext(side=3, "F2 Population P1022", line=1.0)
mtext(side=3, "[P1022 GENETIC MAP - Seed Oil QTL Positions]", cex=0.65, line=0.2)

# Below cmds show the entire genetic map, along with the OIL QTL positions. 

plot.map(OBJNAME, shift=FALSE, alternate.chrid=TRUE, ylab="Map Distance (cM)")
mtext(side=3, "F2 Population P1022", line=1.0)
mtext(side=3, "[P1022 GENETIC MAP - Seed Oil QTL Positions]", cex=0.65, line=0.2)

# Below cmd prints open circles at the chromosome "begin" points. 
points(x,y, cex=0.75, pch=1)
# Below cmd prints chr leng cM labels below the chr "end" points. 
textxy(x1,z1, chrlenglab, cx = 0.7, dcol = "black", m = c(0,0))
# Next cmds print the OIL QTL map positions and QTL type on the map. 
#& Jimbo:  Enter CHR NOs and OIL QTL POSITIONS (from above) into all below cmds. 
# NOTE: 2nd textxy y coordinates were upped by 3 digits to help align QTL text. 
points(15,65.66, cex=1.25, pch=23, col="blue")
textxy(15,68.66, "Oil", cx = 0.7, dcol = "blue", m = c(0,0))
points(16,66.00, cex=1.25, pch=23, col="blue")
textxy(16,69.00, "Oil", cx = 0.7, dcol = "blue", m = c(0,0))
points(20,33.21, cex=1.25, pch=23, col="blue")
textxy(20,36.21, "Oil", cx = 0.7, dcol = "blue", m = c(0,0))

date()

#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

# MODELING JUST THE TWO MAJOR SEED OIL QTLs IN THIS POPULATION:
# Below cmds assign names to TWO MAJOR Marker-QTLs, for use in other below cmds. 

#& Jimbo:  Enter CHR NOs and OIL QTL POSITIONS (from above) into all below cmds. 

mnamea <- find.marker(OBJNAME,16,66.00)
mnamea
mnameb <- find.marker(OBJNAME,20,33.21)
mnameb

#& Jimbo:  Revise title Chromsome No in all below cmds in each below section 
# to match the above mname cmds. Questions? See Jimbo. 

# Command Set for Marker a 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=4, main="Effect Plot for the QTL on Chromosome 16", ylab="P1022 F2.3 Seed Oil", mname1=mnamea)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effoil1 <- effectplot(OBJNAME, pheno.col=4, ylab="P1022 F2.3 Seed Oil", mname1=mnamea, draw=FALSE)
effoil1
plot.pxg(OBJNAME, c(mnamea), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE", infer=FALSE)
mtext("[Selectively Genotyped Hi / Lo Decile F2 Plants Shown Here]", side=1, line=4, las=1)
mtext("Chromosome 16 Marker", cex=1.2, side=3, line=1.75, las=1)
plot.pxg(OBJNAME, c(mnamea), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE")
mtext("[Black = Hi / Lo Oil Decile Sel-Gen F2 Plants  -  Red = Non-SG F2 Plants (imputed geno)]", side=1, line=4, las=1)
mtext("Chromosome 16 Marker", cex=1.2, side=3, line=1.75, las=1)

# Command Set for Marker b 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=4, main="Effect Plot for the QTL on Chromosome 20", ylab="P1022 F2.3 Seed Oil", mname1=mnameb)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effoil2 <- effectplot(OBJNAME, pheno.col=4, ylab="P1022 Seed Oil Phenotype", mname1=mnameb, draw=FALSE)
effoil2
plot.pxg(OBJNAME, c(mnameb), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE", infer=FALSE)
mtext("[Selectively Genotyped 45 Hi / 45 Lo Decile F2 Plants Shown Here]", side=1, line=4, las=1)
mtext("Chromosome 20 Marker", cex=1.2, side=3, line=1.75, las=1)
plot.pxg(OBJNAME, c(mnameb), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE")
mtext("[Black = Hi / Lo Oil Decile Sel-Gen F2 Plants  -  Red = Non-SG F2 Plants (imputed geno)]", side=1, line=4, las=1)
mtext("Chromosome 20 Marker", cex=1.2, side=3, line=1.75, las=1)

# Command Set for Markers a & b 
#@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#

date()

effectplot(OBJNAME, pheno.col=4, main="Effect Plot for the QTLs on Chromosome 16 & Chromosome 20", ylab="P1022 F2.3 Seed Oil", mname1=mnamea, mname2=mnameb)
mtext("Means and Standard Errors for the Marker Genotypes", cex=1.2, side=3, line=0.5, las=1)
effoil12 <- effectplot(OBJNAME, pheno.col=4, ylab="P1022 Seed Oil Phenotype", mname1=mnamea, mname2=mnameb, draw=FALSE)
effoil12
plot.pxg(OBJNAME, c(mnamea, mnameb), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE", infer=FALSE)
mtext("(Only Extreme Phenotypic Decile Genotypes Displayed Here)", side=1, line=4, las=1)
mtext("Chromosome 16 & Chromosome 20 Markers", cex=1.0, side=3, line=2.25, las=1)
plot.pxg(OBJNAME, c(mnamea, mnameb), pheno.col=4, ylab="P1022 F2.3 Seed Oil", col="BLUE")
mtext("(Black = Extreme Phenotypic Decile Genotypes  -  Red = Imputed Genotypes)", side=1, line=4, las=1)
mtext("Chromosome 16 & Chromosome 20 Markers", cex=1.0, side=3, line=2.25, las=1)


################################################################################
################################################################################

#NOTE: ALL NEBRASKA SOYBEANERS STOP HERE !  DO NOT GO ANY FARTHER DOWN !!!!!!!! 

date()
#save.image()

################################################################################
################################################################################

# STOP - You need to know what you are doing before entering into next section #


