Skip to content

Outgroup v.s. ingroup resolution #209

@RussellGarwood

Description

@RussellGarwood

Hi Martin - in my teaching light term, I have returned to trying to do some phylogenies, and noticed that in even v simple analyses, which I think should have strong ingroup support, when using the inapplicables search, I seem to be betting a basal polytomy between the outgroup and the earliest branching ingroup when I use maximiseParsimony(). To test this, I created the following nexus file, which should have unambiguous ingroup support:

#NEXUS	
BEGIN DATA;		
DIMENSIONS	NTAX = 5 NCHAR = 10
FORMAT DATATYPE = STANDARD GAP = - MISSING = ? SYMBOLS = "0 1 2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T U V W X Y Z a b c d e f g h j k m n p q r s";																																																																																																																																											

MATRIX																																																																																																																																											
Taxon_01 0000000000
Taxon_02 1111101111
Taxon_03 1112111111
Taxon_04 1122111111
Taxon_05 1122111111
;																																																																																																																																											
END;																																																																																																																																																																																																																																																																																					

I then simplified a script to do a search, that I wrote last summer, to do fitch and inapplicables aware search on this, assuming the above is saved as a file ccalled test.nex in the working directory:


#Install packages if required
packages <- c("parallel", "TreeTools", "TreeDist", "TreeSearch", "Rogue", "cheapr", "paleobioDB", "strap")
if(length(packages[!packages %in% installed.packages()[,"Package"]]) > 0){
  install.packages(packages[!packages %in% installed.packages()[,"Package"]])
}
# Load required libraries
invisible(lapply(packages, library, character.only = TRUE))
rm(packages)

#This is required if not running from Bash
#setwd("~/Desktop/Programming/scripts/Phylogeny/R_parsimony")
#This is used to write relevant output files, in addition to reading the data in
inputDataNexusFile<-"./test.nex"
#Set output directory for plots (all other files written to wd)
outDirectory<-getwd()

#Number of cores for parallelisation
numCores <- detectCores()

##################################################################################
#### Source functions
doSearch <- function(phyMatrix, stub, k, outDirectory) {
  if (k == Inf) {
    title <- paste("Equal weights - ", stub)
  } else {
    title <- paste("Implied weights - K", k, " ", stub, sep = "")
  }
  
  # Select starting tree
  startTree <- AdditionTree(phyMatrix, concavity = k)
  
  # Search for optimal trees under equal weights
  newTrees <- MaximizeParsimony(
    phyMatrix,
    tree = startTree,
    concavity = k,
    ratchIter = 12,
    tbrIter = 1,
    maxHits = 100,
    maxTime = 60,
    startIter = 3,
    finalIter = 1,
    verbosity = 4
  )
  cat(title, file = firstHitsFile, append = TRUE)
  cat(attr(newTrees, "firstHit"), file = firstHitsFile, append = TRUE)
  cat("\n", file = firstHitsFile, append = TRUE)
  
  # Calculate rogues
  rogues <- Rogue::QuickRogue(newTrees, fullSeq = TRUE, p = 1)
  cat("Rogues are: ")
  print(rogues) # Detailed results of rogue analysis
  
  # Calculate consensus
  conTree <- Consensus(newTrees, p = 1.0)
  # Root consensus
  rootedConTree <- root(conTree, outgroup = outgroupForRooting)
  cat("Outgroup is ",outgroupForRooting)
  # Adjust parameters here
  # nReplicates <- 200
  # jackTrees <- lapply(logical(nReplicates), function(x) {
  #   Resample(
  #     phyMatrix,
  #     newTrees,
  #     ratchIter = 12,
  #     tbrIter = 1,
  #     maxHits = 100,
  #     maxTime = 20,
  #     startIter = 3,
  #     finalIter = 1,
  #     verbosity = 4
  #   )
  # })
  
  # Save tree
  fileName <- gsub(" ", "_", title)
  
  saveDirectory <- paste(outDirectory, "/", fileName, ".pdf", sep = "")
  pdf(saveDirectory, width = 10, height = 20)
  # Set up plotting area - reduce font size
  par(cex = 0.8)
  # Plot the tree
  # JackLabels(rootedConTree, lapply(jackTrees, ape::consensus), add=TRUE, pos=1, adj=-10)
  
  plot(rootedConTree, tip.color = ColByStability(newTrees))
  #nodelabels(JackLabels(rootedConTree, lapply(jackTrees, ape::consensus), plot = FALSE), frame = "none", pos = 4, offset = 0.1, cex = 0.6, col = "darkgray")
  # Trying to colour nodes by support:
  # nodelabels(thermo=as.numeric(JackLabels(rootedConTree, lapply(jackTrees, ape::consensus), plot=FALSE)), frame="circle", pos=4, offset = 0.1, cex =0.6)
  #ggtree(rootedConTree) + geom_tiplab(as_ylab = TRUE, color = ColByStability(newTrees))
  
  title(title)
  PlotTools::SpectrumLegend("bottomright", legend = c("Stable", "Unstable"), palette = hcl.colors(131, "inferno")[1:101])
  dev.off()
  
  # Remove data no longer required
  rm(startTree, newTrees, rogues, conTree, rootedConTree)
}

makeFitchMatrix <- function(phyloMatrixAsPhyDat){
  datasetFitchMatrix<-PhyDatToMatrix(phyloMatrixAsPhyDat)
  datasetFitchMatrix[datasetFitchMatrix == "-"] <- "?"
  return(MatrixToPhyDat(datasetFitchMatrix))
}

##################################################################################
##### Load data, create matrices for inapplicable aware and fitch parsimony

#Store datasets to run in a list
datasets<-list()

#Load data from matrix exported as TSV nexus format
datasetInapplicables <- ReadAsPhyDat(inputDataNexusFile)

#Add this dataset to our list of datasets
datasets[["datasetInapplicables"]]<-datasetInapplicables
rm(datasetInapplicables)

#Use this to set the outgroup now the names are changed
outgroupForRooting<-names(datasets[["datasetInapplicables"]])[1]

#Create a version of this to do Fitch treatment of inapplicables 
datasets[["datasetFitch"]]<-makeFitchMatrix(datasets[["datasetInapplicables"]])

#Lets record the first hit attribute so we can check we are likely to be exploring treespace OK 
firstHitsFile<-paste(outDirectory,"/search_firstHits.txt",sep="")

mclapply(1:length(datasets), mc.cores = numCores-2, function(i)
  #for(i in 1:length(datasets))
{
  #Work out name stub  for this treatment
  stub<-strsplit(names(datasets)[i],split='dataset')[[1]][2]
  
  ### Equal weights ### 
  doSearch(datasets[[i]],stub,Inf, outDirectory)
  
  ### Implied  weights - K of 10 ###
  doSearch(datasets[[i]],stub,10, outDirectory)
  
  ### Implied  weights - K of 3 ###
  doSearch(datasets[[i]],stub,3, outDirectory)
})

Outputs are:

Equal_weights_-__Fitch.pdf
Equal_weights_-__Inapplicables.pdf
Implied_weights_-_K10_Fitch.pdf
Implied_weights_-_K10_Inapplicables.pdf
Implied_weights_-_K3_Fitch.pdf
Implied_weights_-_K3_Inapplicables.pdf

As you can see, with the inapplicables aware search, we do indeed always get a basal polytomy between the out and earliest branching ingroup. I am very aware I might be missing something obvious (or subtle), and if so apologies, but I wanted to flag it and find out if this is the expected behaviour, or whether there something untoward going on?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions