Skip to content

Conversation

@jb621-star
Copy link

These changes include the modifications to the math inside mvlmm, and attempts to add an option for the user to input their own strain-specific residual variance column within the files gemma_io.cpp and param.cpp. Thanks for your help!

jb621-star and others added 20 commits March 20, 2023 09:52
Small formatting change to -residvar
Updated resid_var math to CalcQi
Updated most functions except MphEM and remaining P-val functions
Continued to make appropriate math changes to MphCalcSigma, UpdateV, MphCalcLogL, MphEM, MphCalcP, MphCalcBeta, and other functions.
Finished theoretical changes in other functions and replaced all "eps_eval" entries with "eval_vec, sigmasq".
Replaced any lingering mentions of "eps_eval" with "eval_vec, sigmasq".
Changed mentions of "eps_eval" with "sigmasq". Still need to figure out how to get eigenvectors from the same place "eval" is derived.
Changed mention of eps_eval as a vector to "gsl_matrix *sigmasq"
Changed the one mention of "eps_eval" to "eval_vec" and "sigmasq". May actually need to return to this file in order for the code to actually run.
eps_eval --> eval_vec, sigmasq
Changed eps_eval to sigmasq and corrected any identified regions where dimensions did not match.
Replaced input of CopyResid from eps_eval to sigmasq
Copy link
Contributor

@aartaka aartaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I've noticed. Most of the errors you're getting are due to mismatch between the arguments function requires and the arguments you provide, so ensure these are in sync. The rest I've tried to locate and comment on. Let's have a call (coordinate via Matrix) in case there's something I missed.

src/param.h Outdated
string file_anno; // Optional.
string file_gxe; // Optional.
string file_cvt; // Optional.
string file_residvar; // Optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice you called it file_residvar here, but are using it as file_resid elsewhere. A typo?

src/gemma.cpp Outdated
// PLINK analysis
if (cPar.file_gxe.empty()) {
cLmm.AnalyzePlink(U, eval, UtW, &UtY_col.vector, W,
cLmm.AnalyzePlink(U, eval, eval_vec, sigmasq, UtW, &UtY_col.vector, W,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't define eval_vec anywhere earlier, that's why compiler is shouting. What is the intended use for eval_vec?

Also note that sigmasq is not defined either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then there's also required/provided argument mismatch. AnalyzePlink requires (U, eval, UtW, Uty, W, y, gwasnps), while you provide (U, eval, eval_vec, sigmasq, UtW, Uty, W, y, gwasnps). I guess AnalyzePlink needs to be modified to accept new arguments first.

src/mvlmm.h Outdated

string file_bfile;
string file_geno;
//string file_residvar maybe?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you need to use it (although you seems to only use residuals as sigmasq, so maybe no need for it.) And don't forget to add a line assigning it to CopyFromParam in case you add this field.

Changed "file_residvar" back to "file_resid"
Transferred all instances of "eval_vec" to "U".
Changed all instances of "_residvar" to "resid" to keep things consistent.
Changed all instances of "residvar" to "resid" to keep things consistent
Changed inputs to "CalcLmmVgVeBeta", "CalcLambda", and "AnalyzePlink" according to their use elsewhere (i.e. no more "eval_vec").
Changed residvar to resid for consistency.
Changed all mentions of residvar to resid for consistency
Changed all residvar to resid for consistency.
Changed "residvar" to "resid" and removed unnecessary mentions of "eval_vec" from other functions.
Redefined ReadFile_resid() to expect a gsl_matrix instead of <vector<vector<double>> for resid. This seems to be a consistent sticking point.
Changed ReadFile_resid so that gsl_matrix operations are being used for resid instead of vector<vector<double>> operations.
Threw out the following additional definition of ReadFile_resid: 
void ReadFile_resid(const string &file_resid, bool &error, gsl_matrix *sigmasq) {
  debug_msg("entered");
  igzstream infile(file_resid.c_str(), igzstream::in);
  if (!infile) {
    cout << "error! fail to open the residual variance file: " << file_resid << endl;
    error = true;
    return;
  }

  size_t n_row = sigmasq->size1, n_col = sigmasq->size2, i_row = 0, i_col = 0;

  gsl_matrix_set_zero(sigmasq); //change this so that sigmasq = V_e somehow

  string line;
  char *ch_ptr;
  double d;

  while (getline(infile, line)) {
    if (i_row == n_row) {
      cout << "error! number of rows in the residual variance file is larger "
           << "than expected." << endl;
      error = true;
    }

    i_col = 0;
    ch_ptr = strtok((char *)line.c_str(), " ,\t");
    while (ch_ptr != NULL) {
      if (i_col == n_col) {
        cout << "error! number of columns in the residual variance file "
             << "is larger than expected, for row = " << i_row << endl;
        error = true;
      }

      d = atof(ch_ptr);
      gsl_matrix_set(sigmasq, i_row, i_col, d);
      i_col++;

      ch_ptr = strtok(NULL, " ,\t");
    }

    i_row++;
  }

  infile.close();
  infile.clear();

  return;
}
Added additional ReadFile_resid call back, changing mentions of sigmasq back to resid.
Edited user documentation for "-resid" option.
Changed mention of "resid" to "sigmasq" within the function "CheckResid"
Created new version of "CheckResid" function that hopefully helps in debugging the empty "sigmasq" issue and commented out seemingly redundant "CheckResid" function initiation.
Restored ReadFile_resid to its original, because the other caused compile errors.
Copilot suggested a helper function for creating Sigma.
Suggested change to PARAM::CheckResid() according to Copilot.
Reinitialized Sigma in scope.
Modified some of the options here hoping that it would help me debug things a little better.
Updated PARAM::CheckResid to do a proper check in the case that n_resid is empty.
Added an additional debugging step for if the residual variance matrix is not initialized.
Corrected the CalculateSigma helper function for calculating the modified residual variance data structure.
Added various bug fixes with residual variance variables and checks to PARAM::CheckResid, PARAM::ReadFiles, and PARAM constructor.
Made sure to include Ve_null initiation.
Corrected subsequent function declarations of CalculateSigma.
Ve_null is never used, so removing from null initiation.
Removing Ve_null from initiation.
Setting resid to 1 if user doesn't supply residual variance file.
Attempted to address 'epsilon' warning by initializing it with all the other element-wise data structures.
Reordered PARAM::PARAM(void) parameters according to compiler warning messages.
Reordered PARAM terms according to compiler warnings.
Removed redeclaration of 'size_t PARAM::n_vc'
Adjustment to Matrix Copy Logic
Implemented fixes to CopyResid
Implemented warning to see if ni_test isn't set when CheckResid is run.
Called CheckResid AFTER ProcessCvtPhen() is called so that ni_test is already defined for default residal behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants