<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<meta name="author" content="Jacques van Helden" />
<meta name="date" content="2021-03-29" />
<title>Tutorial: machine learning with TCGA BIC transcriptome</title>
<script src="01_data-loading_TCGA-BIC_files/header-attrs-2.7/header-attrs.js"></script>
<script src="01_data-loading_TCGA-BIC_files/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="01_data-loading_TCGA-BIC_files/bootstrap-3.3.5/css/cerulean.min.css" rel="stylesheet" />
<script src="01_data-loading_TCGA-BIC_files/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="01_data-loading_TCGA-BIC_files/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="01_data-loading_TCGA-BIC_files/bootstrap-3.3.5/shim/respond.min.js"></script>
<style>h1 {font-size: 34px;}
h1.title {font-size: 38px;}
h2 {font-size: 30px;}
h3 {font-size: 24px;}
h4 {font-size: 18px;}
h5 {font-size: 16px;}
h6 {font-size: 12px;}
code {color: inherit; background-color: rgba(0, 0, 0, 0.04);}
pre:not([class]) { background-color: white }</style>
<script src="01_data-loading_TCGA-BIC_files/jqueryui-1.11.4/jquery-ui.min.js"></script>
<link href="01_data-loading_TCGA-BIC_files/tocify-1.9.1/jquery.tocify.css" rel="stylesheet" />
<script src="01_data-loading_TCGA-BIC_files/tocify-1.9.1/jquery.tocify.js"></script>
<script src="01_data-loading_TCGA-BIC_files/navigation-1.1/tabsets.js"></script>
<script src="01_data-loading_TCGA-BIC_files/navigation-1.1/codefolding.js"></script>
<style type="text/css">
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<style type="text/css">
code {
white-space: pre;
}
.sourceCode {
overflow: visible;
}
</style>
<style type="text/css" data-origin="pandoc">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
}
pre.numberSource { margin-left: 3em; padding-left: 4px; }
div.sourceCode
{ color: #cccccc; background-color: #303030; }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ffcfaf; } /* Alert */
code span.an { color: #7f9f7f; font-weight: bold; } /* Annotation */
code span.at { } /* Attribute */
code span.bn { color: #dca3a3; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #f0dfaf; } /* ControlFlow */
code span.ch { color: #dca3a3; } /* Char */
code span.cn { color: #dca3a3; font-weight: bold; } /* Constant */
code span.co { color: #7f9f7f; } /* Comment */
code span.cv { color: #7f9f7f; font-weight: bold; } /* CommentVar */
code span.do { color: #7f9f7f; } /* Documentation */
code span.dt { color: #dfdfbf; } /* DataType */
code span.dv { color: #dcdccc; } /* DecVal */
code span.er { color: #c3bf9f; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #c0bed1; } /* Float */
code span.fu { color: #efef8f; } /* Function */
code span.im { } /* Import */
code span.in { color: #7f9f7f; font-weight: bold; } /* Information */
code span.kw { color: #f0dfaf; } /* Keyword */
code span.op { color: #f0efd0; } /* Operator */
code span.ot { color: #efef8f; } /* Other */
code span.pp { color: #ffcfaf; font-weight: bold; } /* Preprocessor */
code span.sc { color: #dca3a3; } /* SpecialChar */
code span.ss { color: #cc9393; } /* SpecialString */
code span.st { color: #cc9393; } /* String */
code span.va { } /* Variable */
code span.vs { color: #cc9393; } /* VerbatimString */
code span.wa { color: #7f9f7f; font-weight: bold; } /* Warning */
.sourceCode .row {
width: 100%;
}
.sourceCode {
overflow-x: auto;
}
.code-folding-btn {
margin-right: -30px;
}
</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
var sheets = document.styleSheets;
for (var i = 0; i < sheets.length; i++) {
if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
try { var rules = sheets[i].cssRules; } catch (e) { continue; }
for (var j = 0; j < rules.length; j++) {
var rule = rules[j];
// check if there is a div.sourceCode rule
if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") continue;
var style = rule.style.cssText;
// check if color or background-color is set
if (rule.style.color === '' && rule.style.backgroundColor === '') continue;
// replace div.sourceCode by a pre.sourceCode rule
sheets[i].deleteRule(j);
sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
}
}
})();
</script>
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
pre code {
padding: 0;
}
</style>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
<style type="text/css">
.code-folding-btn { margin-bottom: 4px; }
</style>
<style type="text/css">
#TOC {
margin: 25px 0px 20px 0px;
}
@media (max-width: 768px) {
#TOC {
position: relative;
width: 100%;
}
}
@media print {
.toc-content {
/* see https://github.com/w3c/csswg-drafts/issues/4434 */
float: right;
}
}
.toc-content {
padding-left: 30px;
padding-right: 40px;
}
div.main-container {
max-width: 1200px;
}
div.tocify {
width: 20%;
max-width: 260px;
max-height: 85%;
}
@media (min-width: 768px) and (max-width: 991px) {
div.tocify {
width: 25%;
}
}
@media (max-width: 767px) {
div.tocify {
width: 100%;
max-width: none;
}
}
.tocify ul, .tocify li {
line-height: 20px;
}
.tocify-subheader .tocify-item {
font-size: 0.90em;
}
.tocify .list-group-item {
border-radius: 0px;
}
</style>
</head>
<body>
<div class="container-fluid main-container">
<!-- setup 3col/9col grid for toc_float and main content -->
<div class="row">
<div class="col-sm-12 col-md-4 col-lg-3">
<div id="TOC" class="tocify">
</div>
</div>
<div class="toc-content col-sm-12 col-md-8 col-lg-9">
<div id="header">
<div class="btn-group pull-right float-right">
<button type="button" class="btn btn-default btn-xs btn-secondary btn-sm dropdown-toggle" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"><span>Code</span> <span class="caret"></span></button>
<ul class="dropdown-menu dropdown-menu-right" style="min-width: 50px;">
<li><a id="rmd-show-all-code" href="#">Show All Code</a></li>
<li><a id="rmd-hide-all-code" href="#">Hide All Code</a></li>
</ul>
</div>
<h1 class="title toc-ignore">Tutorial: machine learning with TCGA BIC transcriptome</h1>
<h3 class="subtitle">01. Data loading</h3>
<h4 class="author">Jacques van Helden</h4>
<h4 class="date">2021-03-29</h4>
</div>
<div id="downloading-and-loading-the-data-and-metadata-files" class="section level2">
<h2>Downloading and loading the data and metadata files</h2>
<p>The preprocessed datasets are available on the course GitHub repository.</p>
<div id="a-convenient-function-to-download-files-only-once" class="section level3">
<h3>A convenient function to download files only once</h3>
<p>We will use the function <code>download_only_once()</code>, which we defined in a previous course on data exploration.</p>
<p>This function takes as input a base URL, a file name and a local folder.</p>
<ul>
<li>It checks whether the local folder already exists and, if not, creates it.</li>
<li>It then checks whether the file is already present in this local folder and, if not, downloads it.</li>
<li>In both cases, it returns the path of the local file, which can be passed directly to a reading function such as <code>read.delim()</code>.</li>
</ul>
<p>This simplifies downloading the different files required for the practical. Open the code box below and run the code in your R console.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">#' @title Download a file only if it is not yet here</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co">#' @author Jacques van Helden \email{Jacques.van-Helden@@france-bioinformatique.fr}</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param url_base base of the URL, that will be prepended to the file name</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param file_name name of the file (should not contain any path)</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="co">#' @param local_folder path of a local folder where the file should be stored</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co">#' @return the function returns the path of the local file, built from local_folder and file_name</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="co">#' @export</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>download_only_once <span class="ot"><-</span> <span class="cf">function</span>(</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> url_base, </span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> file_name,</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> local_folder) {</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="do">## Define the source URL </span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> url <span class="ot"><-</span> <span class="fu">file.path</span>(url_base, file_name)</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">message</span>(<span class="st">"Source URL</span><span class="sc">\n\t</span><span class="st">"</span>, url)</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="do">## Define the local file</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> local_file <span class="ot"><-</span> <span class="fu">file.path</span>(local_folder, file_name)</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="do">## Create the local data folder if it does not exist</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="fu">dir.create</span>(local_folder, <span class="at">showWarnings =</span> <span class="cn">FALSE</span>, <span class="at">recursive =</span> <span class="cn">TRUE</span>)</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a> <span class="do">## Download the file ONLY if it is not already there</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (<span class="sc">!</span><span class="fu">file.exists</span>(local_file)) {</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">message</span>(<span class="st">"Downloading file from source URL to local file</span><span class="sc">\n\t</span><span class="st">"</span>, </span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a> local_file)</span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">download.file</span>(<span class="at">url =</span> url, <span class="at">destfile =</span> local_file)</span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> } <span class="cf">else</span> {</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> <span class="fu">message</span>(<span class="st">"Local file already exists, no need to download</span><span class="sc">\n\t</span><span class="st">"</span>, </span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> local_file)</span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span>(local_file)</span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
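<p>The only-once logic can be tried offline before fetching the real files. The sketch below is illustrative and not part of the course material. It applies the same <code>file.exists()</code> test as <code>download_only_once()</code>, with a temporary local file (reached through a <code>file://</code> URL) standing in for the remote repository. It assumes a Unix-like system, where the path returned by <code>tempfile()</code> is absolute and starts with <code>/</code>. The file contents and the folder name <code>only_once_demo</code> are hypothetical.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Hypothetical stand-in for the remote file: a small temporary TSV file
src_file <- tempfile(fileext = ".tsv")
writeLines(c("id\tvalue", "a\t1"), src_file)
src_url <- paste0("file://", src_file)   # assumes a Unix-like absolute path

## Hypothetical local folder, created if needed (as in download_only_once)
demo_folder <- file.path(tempdir(), "only_once_demo")
dir.create(demo_folder, showWarnings = FALSE, recursive = TRUE)
local_file <- file.path(demo_folder, basename(src_file))

## Two successive attempts: only the first one actually fetches the file
n_downloads <- 0
for (i in 1:2) {
  if (!file.exists(local_file)) {
    download.file(url = src_url, destfile = local_file, quiet = TRUE)
    n_downloads <- n_downloads + 1
    message("Attempt ", i, ": file downloaded")
  } else {
    message("Attempt ", i, ": local file already exists, download skipped")
  }
}</code></pre></div>
<p>The second attempt finds the local copy and skips the transfer, which is what makes it safe to re-run the whole practical without downloading the data again.</p>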
</div>
<div id="exercise-data-download" class="section level3">
<h3>Exercise: data download</h3>
<ol style="list-style-type: decimal">
<li>Use the function <code>download_only_once()</code> to download the BIC transcriptome data and metadata from the GitHub repository and store them in a local folder (for example <code>~/m3-stat-R/TCGA-BIC_analysis</code>).</li>
</ol>
<p>Base URL: <a href="https://github.com/DU-Bii/module-3-Stat-R/raw/master/stat-R_2021/data/TCGA_BIC_subset/" class="uri">https://github.com/DU-Bii/module-3-Stat-R/raw/master/stat-R_2021/data/TCGA_BIC_subset/</a></p>
<p>Files:</p>
<ul>
<li><p>Expression table (1000 top-ranking genes from differential analysis):</p>
<ul>
<li><p>File name: <code>BIC_log2-norm-counts_edgeR_DEG_top_1000.tsv.gz</code></p></li>
<li><p>This file contains log2-transformed and standardised counts, with 1000 genes (rows) x 819 samples (columns)</p></li>
</ul></li>
<li><p>Metadata:</p>
<ul>
<li><p>File name: <code>BIC_sample-classes.tsv.gz</code></p></li>
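<li><p>This file indicates the status of the 3 marker genes traditionally used to diagnose the breast cancer type (columns <code>ER1</code>, <code>PR1</code> and <code>Her2</code>), as well as the cancer class derived from these 3 markers (column <code>cancer.type</code>).</p></li>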
<li><p>For more information, see the pre-processing report: <a href="https://du-bii.github.io/study-cases/Homo_sapiens/TCGA_study-case/import_TCGA_from_Recount.html">import_TCGA_from_Recount.html</a></p></li>
</ul></li>
</ul>
<ol start="2" style="list-style-type: decimal">
<li>After downloading these files, load them into variables with the following names (for consistency with the course material):</li>
</ol>
<ul>
<li>Expression table: <code>bic_expr</code></li>
<li>Metadata: <code>bic_meta</code></li>
</ul>
</div>
<div id="solution-data-download-and-load" class="section level3">
<h3>Solution: data download and load</h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Define the remote URL and local folder</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>bic_url <span class="ot"><-</span> <span class="st">"https://github.com/DU-Bii/module-3-Stat-R/raw/master/stat-R_2021/data/TCGA_BIC_subset/"</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>bic_folder <span class="ot"><-</span> <span class="st">"~/m3-stat-R/TCGA-BIC_analysis"</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="do">## Download and load the expression data table</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="do">## Note: we use check.names = FALSE so that read.delim() keeps the sample</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="do">## names exactly as in the original data files (by default, hyphens would</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="do">## be replaced by dots and names starting with a digit prefixed with "X").</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>bic_expr_file <span class="ot"><-</span> <span class="fu">download_only_once</span>(</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> <span class="at">url_base =</span> bic_url, </span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> <span class="at">file_name =</span> <span class="st">"BIC_log2-norm-counts_edgeR_DEG_top_1000.tsv.gz"</span>,</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a> <span class="at">local_folder =</span> bic_folder)</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>bic_expr <span class="ot"><-</span> <span class="fu">read.delim</span>(<span class="at">file =</span> bic_expr_file, </span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a> <span class="at">header =</span> <span class="cn">TRUE</span>, </span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="at">row.names =</span> <span class="dv">1</span>,</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a> <span class="at">check.names =</span> <span class="cn">FALSE</span>)</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a><span class="co"># colnames(bic_expr)</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="co"># dim(bic_expr)</span></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a><span class="co"># View(head(bic_expr))</span></span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a><span class="do">## Download the metadata file</span></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a>bic_meta_file <span class="ot"><-</span> <span class="fu">download_only_once</span>(</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a> <span class="at">url_base =</span> bic_url, </span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a> <span class="at">file_name =</span> <span class="st">"BIC_sample-classes.tsv.gz"</span>,</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a> <span class="at">local_folder =</span> bic_folder)</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a>bic_meta <span class="ot"><-</span> <span class="fu">read.delim</span>(<span class="at">file =</span> bic_meta_file, </span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a> <span class="at">header =</span> <span class="cn">TRUE</span>, </span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a> <span class="at">row.names =</span> <span class="dv">1</span>,</span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a> <span class="at">check.names =</span> <span class="cn">FALSE</span>)</span></code></pre></div>
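<p>To see what <code>check.names = FALSE</code> changes, here is a small illustrative sketch on a toy file. The two sample IDs are copied from the metadata table, but the gene names and expression values are hypothetical. With the default <code>check.names = TRUE</code>, <code>read.delim()</code> passes the column names through <code>make.names()</code>, so they would no longer match the row names of the metadata table.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Toy expression file: 2 genes x 2 samples (values are hypothetical)
toy_file <- tempfile(fileext = ".tsv")
writeLines(c(paste("gene",
                   "1AB92ADA-637E-4A42-A39A-70CEEEA41AE3",
                   "DA98A67C-F11F-41D3-8223-1161EBFF8B58", sep = "\t"),
             "GENE_1\t5.2\t3.1",
             "GENE_2\t0.4\t7.8"),
           toy_file)

## Default check.names = TRUE: column names are made syntactically valid
mangled <- colnames(read.delim(toy_file, row.names = 1))
print(mangled)
## expected: "X1AB92ADA.637E.4A42.A39A.70CEEEA41AE3" and
##           "DA98A67C.F11F.41D3.8223.1161EBFF8B58"

## check.names = FALSE: sample IDs are kept exactly as in the file
kept <- colnames(read.delim(toy_file, row.names = 1, check.names = FALSE))
print(kept)</code></pre></div>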
</div>
</div>
<div id="exploring-the-metadata" class="section level2">
<h2>Exploring the metadata</h2>
<div id="exercise-metadata-exploration" class="section level3">
<h3>Exercise: metadata exploration</h3>
<p>Check the content of the metadata file by looking at the first rows, and count the number of samples per class.</p>
</div>
<div id="solution-metadata-exploration" class="section level3">
<h3>Solution: metadata exploration</h3>
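<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Load knitr, which provides kable() to format tables</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(knitr)</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="do">## Show the head of the metadata table</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="fu">kable</span>(<span class="fu">head</span>(bic_meta, <span class="at">n =</span> <span class="dv">10</span>), <span class="at">caption =</span> <span class="st">"First rows of the BIC metadata table"</span>)</span></code></pre></div>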
<table>
<caption>First rows of the BIC metadata table</caption>
<thead>
<tr class="header">
<th align="left"></th>
<th align="left">cancer.type</th>
<th align="left">ER1</th>
<th align="left">PR1</th>
<th align="left">Her2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">1AB92ADA-637E-4A42-A39A-70CEEEA41AE3</td>
<td align="left">Luminal.A</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
</tr>
<tr class="even">
<td align="left">DA98A67C-F11F-41D3-8223-1161EBFF8B58</td>
<td align="left">Unclassified</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
<td align="left">Negative</td>
</tr>
<tr class="odd">
<td align="left">06CCFD0F-7FB8-471E-B823-C7876582D6FC</td>
<td align="left">HER2pos</td>
<td align="left">Negative</td>
<td align="left">Negative</td>
<td align="left">Positive</td>
</tr>
<tr class="even">
<td align="left">A33B2F42-6EC6-4FB2-8BE5-542407A0382E</td>
<td align="left">Unclassified</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
<td align="left">Negative</td>
</tr>
<tr class="odd">
<td align="left">D021A258-8713-4383-9DCA-45E2F54A0411</td>
<td align="left">Luminal.A</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
</tr>
<tr class="even">
<td align="left">C705FA90-D9AA-4949-BACA-1C022A14CB03</td>
<td align="left">Luminal.A</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
</tr>
<tr class="odd">
<td align="left">85380A2D-9951-4D4B-A2A4-6F5F2AFC54E3</td>
<td align="left">Luminal.A</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
</tr>
<tr class="even">
<td align="left">F53A9C63-1AF7-4CBC-B8B7-4AA7AAED3364</td>
<td align="left">Luminal.A</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Negative</td>
</tr>
<tr class="odd">
<td align="left">13EF5323-EAD9-4BC7-8AC4-33875BF12E17</td>
<td align="left">Luminal.B</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
<td align="left">Positive</td>
</tr>
<tr class="even">
<td align="left">079EACA1-0319-4B54-B20B-673F4576C69D</td>
<td align="left">Basal.like</td>
<td align="left">Negative</td>
<td align="left">Negative</td>
<td align="left">Negative</td>
</tr>
</tbody>
</table>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Number of samples per class</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">kable</span>(<span class="fu">sort</span>(<span class="fu">table</span>(bic_meta<span class="sc">$</span>cancer.type), <span class="at">decreasing =</span> <span class="cn">TRUE</span>), </span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="at">col.names =</span> <span class="fu">c</span>(<span class="st">"Cancer type"</span>, <span class="st">"Number of samples"</span>),</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span class="at">caption =</span> <span class="st">"Number of samples per class"</span>)</span></code></pre></div>
<table>
<caption>Number of samples per class</caption>
<thead>
<tr class="header">
<th align="left">Cancer type</th>
<th align="right">Number of samples</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Luminal.A</td>
<td align="right">422</td>
</tr>
<tr class="even">
<td align="left">Basal.like</td>
<td align="right">131</td>
</tr>
<tr class="odd">
<td align="left">Luminal.B</td>
<td align="right">118</td>
</tr>
<tr class="even">
<td align="left">Unclassified</td>
<td align="right">107</td>
</tr>
<tr class="odd">
<td align="left">HER2pos</td>
<td align="right">41</td>
</tr>
</tbody>
</table>
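<p>The counting step combines <code>table()</code>, which tabulates the occurrences of each class label, with <code>sort(..., decreasing = TRUE)</code>, which lists the most frequent class first. A minimal sketch on a hypothetical vector of six labels:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Hypothetical class labels for 6 samples
classes <- c("Luminal.A", "Basal.like", "Luminal.A",
             "HER2pos", "Luminal.A", "Basal.like")

## Count samples per class, most frequent class first
class_counts <- sort(table(classes), decreasing = TRUE)
print(class_counts)

## The counts add up to the number of samples
## (for the BIC metadata: 422 + 131 + 118 + 107 + 41 = 819)
sum(class_counts) == length(classes)</code></pre></div>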
</div>
</div>
<div id="sorting-samples-by-cancer-class" class="section level2">
<h2>Sorting samples by cancer class</h2>
<p>Sort the expression and metadata tables so that samples of the same class are grouped together. This is a bit tricky, so we provide the solution immediately, but you might attempt to do it your own way if you have time.</p>
<div id="some-tips-sorting-samples" class="section level3">
<h3>Some tips: sorting samples</h3>
<ul>
<li><p>The expression table has one row per gene and one column per sample, whereas the metadata table has one row per sample.</p></li>
<li><p>You can use the function <code>order()</code> to obtain the row indices of the metadata table sorted by sample class. It returns a vector with all the indices of the first class, then all the indices of the second class, etc.</p></li>
<li><p>You can then use these indices to re-order the rows of the metadata table.</p></li>
<li><p>We cannot be sure that samples appear in the same order in the metadata and expression tables. We will therefore use the sample class to re-order the rows (samples) of the metadata table, and then use the sample IDs of the re-ordered metadata table (its row names) to sort the columns of the expression table. This guarantees that samples are in a consistent order between the data and metadata tables.</p></li>
</ul>
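<p>The tips above can be sketched on a toy example before applying them to the BIC tables. Everything here is hypothetical (three samples <code>s1</code> to <code>s3</code>, two genes), and the expression columns are deliberately stored in a different order from the metadata rows.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Hypothetical metadata: one row per sample
toy_meta <- data.frame(
  cancer.type = c("Luminal.A", "Basal.like", "HER2pos"),
  row.names = c("s1", "s2", "s3"))

## Hypothetical expression matrix: one row per gene, one column per sample,
## with columns NOT in the same order as the metadata rows
toy_expr <- matrix(1:6, nrow = 2,
                   dimnames = list(c("g1", "g2"), c("s3", "s1", "s2")))

## 1. Row indices of the metadata, sorted by class
sample_order <- order(toy_meta$cancer.type)

## 2. Re-order the metadata rows
##    (drop = FALSE keeps a data frame even with a single column)
toy_meta_sorted <- toy_meta[sample_order, , drop = FALSE]

## 3. Re-order the expression columns by sample ID, not by position
toy_expr_sorted <- toy_expr[, rownames(toy_meta_sorted)]

print(toy_meta_sorted)
print(toy_expr_sorted)</code></pre></div>
<p>Because step 3 selects columns by name rather than by position, it gives consistent tables even when the expression columns start in a different order, which is the approach recommended in the last tip.</p>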
</div>
<div id="solution-sorting-samples" class="section level3">
<h3>Solution: sorting samples</h3>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Check that the row names of the metadata contain the same set </span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="do">## of IDs as the column names of the expression table</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="fu">length</span>(<span class="fu">rownames</span>(bic_meta)) <span class="do">## the metadata contains 819 rows</span></span></code></pre></div>
<pre><code>[1] 819</code></pre>
<div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">length</span>(<span class="fu">colnames</span>(bic_expr)) <span class="do">## the data contains 819 columns</span></span></code></pre></div>
<pre><code>[1] 819</code></pre>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">length</span>(<span class="fu">intersect</span>(<span class="fu">rownames</span>(bic_meta), <span class="fu">colnames</span>(bic_expr))) <span class="do">## Their intersection contains the same number of elements</span></span></code></pre></div>
<pre><code>[1] 819</code></pre>
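<p>Comparing lengths works here because the three numbers coincide, but the check can be made explicit with <code>setequal()</code> and <code>anyDuplicated()</code>, which also catch duplicated IDs. A minimal sketch on hypothetical ID vectors (in the real analysis these would be <code>rownames(bic_meta)</code> and <code>colnames(bic_expr)</code>):</p>

```r
## Hypothetical sample IDs standing in for rownames(bic_meta) and colnames(bic_expr)
meta_ids <- c("S1", "S2", "S3")
expr_ids <- c("S3", "S1", "S2")

## TRUE if both vectors contain exactly the same IDs, whatever their order
same_ids <- setequal(meta_ids, expr_ids)

## anyDuplicated() returns 0 when there is no duplicate
no_dup <- anyDuplicated(meta_ids) == 0 && anyDuplicated(expr_ids) == 0
print(c(same_ids = same_ids, no_dup = no_dup))

## Stop with an error if the IDs cannot be matched one-to-one
stopifnot(same_ids, no_dup)
```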
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Get the sample order according to the cancer type column of the metadata</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>sample_order <span class="ot"><-</span> <span class="fu">order</span>(bic_meta<span class="sc">$</span>cancer.type)</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(bic_meta<span class="sc">$</span>cancer.type[sample_order])</span></code></pre></div>
<pre><code> [1] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[20] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[39] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[58] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[77] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[96] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like"
[115] "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "Basal.like" "HER2pos" "HER2pos"
[134] "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos"
[153] "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos" "HER2pos"
[172] "HER2pos" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[191] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[210] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[229] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[248] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[267] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[286] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[305] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[324] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[343] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[362] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[381] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[400] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[419] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[438] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[457] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[476] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[495] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[514] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[533] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[552] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[571] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A"
[590] "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.A" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[609] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[628] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[647] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[666] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[685] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B"
[704] "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Luminal.B" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[723] "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[742] "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[761] "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[780] "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[799] "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"
[818] "Unclassified" "Unclassified"</code></pre>
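<p>Printing all 819 sorted labels is verbose. Run-length encoding with <code>rle()</code> gives a compact check that each class forms a single contiguous block after sorting: each class should then appear exactly once in the <code>values</code>. A small sketch on a hypothetical label vector (with the real data one would call <code>rle(bic_meta$cancer.type[sample_order])</code>):</p>

```r
## Hypothetical class labels in their original, unsorted order
classes <- c("Luminal.A", "Basal.like", "HER2pos", "Basal.like", "Luminal.A")

## Sort them, then summarise consecutive runs of identical labels
sorted_classes <- classes[order(classes)]
runs <- rle(sorted_classes)
print(setNames(runs$lengths, runs$values))

## After sorting, no class should be split into several runs
stopifnot(!anyDuplicated(runs$values))
```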
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Sort the metadata rows (samples) according to this order</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>bic_meta <span class="ot"><-</span> bic_meta[sample_order, ]</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="do">## Sort the expression table to make sure that the samples (columns)</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="do">## come in the same order as the rows of the metadata table</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>bic_expr <span class="ot"><-</span> bic_expr[, <span class="fu">row.names</span>(bic_meta)]</span></code></pre></div>
</div>
</div>
<div id="generate-readable-sample-labels" class="section level2">
<h2>Generate readable sample labels</h2>
<p>The TCGA samples have very long, structured IDs, which are inconvenient to display on the types of graphs we would like to generate (box plots, PC plots, heatmaps, …).</p>
<p>We will thus generate readable labels, and add them as a separate column to the metadata table.</p>
<p>For this, we will use the class name together with the function <code>make.names()</code> and its option <code>unique = TRUE</code>, which avoids duplicate labels by appending a numeric suffix (<code>.1</code>, <code>.2</code>, …) to repeated class names.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Sample labels ####</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>label <span class="ot"><-</span> <span class="fu">make.names</span>(</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="at">names =</span> bic_meta<span class="sc">$</span>cancer.type, </span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="at">unique =</span> <span class="cn">TRUE</span>)</span></code></pre></div>
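<p>To illustrate what <code>make.names(..., unique = TRUE)</code> does, here is a call on a short hypothetical vector of class names: the first occurrence of each name is kept as is, and later repeats receive the suffixes <code>.1</code>, <code>.2</code>, …</p>

```r
## Hypothetical class labels with repeats
classes <- c("Basal.like", "Basal.like", "HER2pos", "Basal.like")

## unique = TRUE appends .1, .2, ... to repeated names
labels <- make.names(names = classes, unique = TRUE)
print(labels)  # "Basal.like" "Basal.like.1" "HER2pos" "Basal.like.2"
```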
</div>
<div id="get-gene-ids-and-names" class="section level2">
<h2>Get gene IDs and names</h2>
<p>The row names of the expression table contain the Ensembl gene ID followed by its version number (the suffix after the dot). These IDs are long and not very informative, so we will retrieve a more readable label for each gene: the gene name (also called “gene symbol”).</p>
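<p>In the code below, the version suffix is removed with <code>sub()</code> and the pattern <code>\\..*</code>, which deletes everything from the first dot onwards. A minimal check on two versioned IDs of this dataset; the pattern anchored at the end of the string (<code>\\.[0-9]+$</code>) is shown as an alternative that only strips a trailing numeric version.</p>

```r
## Two versioned Ensembl gene IDs (as in the row names of bic_expr)
ids <- c("ENSG00000000003.14", "ENSG00000000419.12")

## Same pattern as the tutorial code: drop the first dot and everything after it
ensg <- sub(pattern = "\\..*", replacement = "", x = ids, perl = TRUE)

## Alternative: drop only a trailing ".<digits>" version suffix
ensg_alt <- sub(pattern = "\\.[0-9]+$", replacement = "", x = ids)
print(ensg)  # "ENSG00000000003" "ENSG00000000419"
```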
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Get gene names ####</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>gene_info <span class="ot"><-</span> <span class="fu">data.frame</span>(</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="at">ENSG =</span> <span class="fu">sub</span>(<span class="at">x =</span> <span class="fu">row.names</span>(bic_expr), </span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="at">perl =</span> <span class="cn">TRUE</span>, </span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="at">replacement =</span> <span class="st">""</span>, </span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="at">pattern =</span> <span class="st">'</span><span class="sc">\\</span><span class="st">..*'</span>),</span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> <span class="at">row.names =</span> <span class="fu">row.names</span>(bic_expr)</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a><span class="co"># head(gene_info)</span></span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a><span class="do">## Get gene names from ENSEMBL using biomaRt package</span></span>
<span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(<span class="st">"biomaRt"</span>)</span>
<span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a>ensembl <span class="ot"><-</span> <span class="fu">useMart</span>(<span class="st">"ensembl"</span>, <span class="at">dataset =</span> <span class="st">"hsapiens_gene_ensembl"</span>)</span>
<span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a>ensembl_info <span class="ot"><-</span> <span class="fu">getBM</span>(<span class="at">attributes =</span> <span class="fu">c</span>(<span class="st">'ensembl_gene_id'</span>,</span>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a> <span class="st">'external_gene_name'</span>,</span>
<span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a> <span class="st">'description'</span>),</span>
<span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a> <span class="at">filters =</span> <span class="st">'ensembl_gene_id'</span>,</span>
<span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a> <span class="at">values =</span> gene_info<span class="sc">$</span>ENSG,</span>
<span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a> <span class="at">mart =</span> ensembl)</span>
<span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a><span class="fu">row.names</span>(ensembl_info) <span class="ot"><-</span> ensembl_info<span class="sc">$</span>ensembl_gene_id</span>
<span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-24"><a href="#cb15-24" aria-hidden="true" tabindex="-1"></a><span class="do">## Add gene name in a new column of the gene_info table</span></span>
<span id="cb15-25"><a href="#cb15-25" aria-hidden="true" tabindex="-1"></a>gene_info<span class="sc">$</span>name <span class="ot"><-</span> ensembl_info[gene_info<span class="sc">$</span>ENSG, <span class="st">"external_gene_name"</span>]</span>
<span id="cb15-26"><a href="#cb15-26" aria-hidden="true" tabindex="-1"></a>gene_info<span class="sc">$</span>description <span class="ot"><-</span> ensembl_info[gene_info<span class="sc">$</span>ENSG, <span class="st">"description"</span>]</span>
<span id="cb15-27"><a href="#cb15-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-28"><a href="#cb15-28" aria-hidden="true" tabindex="-1"></a><span class="do">## There is one gene with no name</span></span>
<span id="cb15-29"><a href="#cb15-29" aria-hidden="true" tabindex="-1"></a><span class="co"># sum(is.na(gene_info$name))</span></span>
<span id="cb15-30"><a href="#cb15-30" aria-hidden="true" tabindex="-1"></a><span class="do">## Replace undefined (NA) or empty gene names by the gene ID</span></span>
<span id="cb15-31"><a href="#cb15-31" aria-hidden="true" tabindex="-1"></a>no_name <span class="ot"><-</span> <span class="fu">is.na</span>(gene_info<span class="sc">$</span>name) <span class="sc">|</span> gene_info<span class="sc">$</span>name <span class="sc">==</span> <span class="st">""</span></span>
<span id="cb15-32"><a href="#cb15-32" aria-hidden="true" tabindex="-1"></a>gene_info[no_name, <span class="st">"name"</span>] <span class="ot"><-</span> gene_info[no_name, <span class="st">"ENSG"</span>]</span>
<span id="cb15-33"><a href="#cb15-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-34"><a href="#cb15-34" aria-hidden="true" tabindex="-1"></a><span class="do">## Print the first rows of the gene info table</span></span>
<span id="cb15-35"><a href="#cb15-35" aria-hidden="true" tabindex="-1"></a><span class="fu">kable</span>(<span class="fu">head</span>(gene_info, <span class="at">n =</span> <span class="dv">10</span>), <span class="at">caption =</span> <span class="st">"First rows of the gene information table collected from BioMart"</span>)</span></code></pre></div>
<table>
<caption>First rows of the gene information table collected from BioMart</caption>
<colgroup>
<col width="13%" />
<col width="11%" />
<col width="6%" />
<col width="68%" />
</colgroup>
<thead>
<tr class="header">
<th align="left"></th>
<th align="left">ENSG</th>
<th align="left">name</th>
<th align="left">description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">ENSG00000000003.14</td>
<td align="left">ENSG00000000003</td>
<td align="left">TSPAN6</td>
<td align="left">tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]</td>
</tr>
<tr class="even">
<td align="left">ENSG00000000419.12</td>
<td align="left">ENSG00000000419</td>
<td align="left">DPM1</td>
<td align="left">dolichyl-phosphate mannosyltransferase subunit 1, catalytic [Source:HGNC Symbol;Acc:HGNC:3005]</td>
</tr>
<tr class="odd">
<td align="left">ENSG00000000457.13</td>
<td align="left">ENSG00000000457</td>
<td align="left">SCYL3</td>
<td align="left">SCY1 like pseudokinase 3 [Source:HGNC Symbol;Acc:HGNC:19285]</td>
</tr>
<tr class="even">
<td align="left">ENSG00000000460.16</td>
<td align="left">ENSG00000000460</td>
<td align="left">C1orf112</td>
<td align="left">chromosome 1 open reading frame 112 [Source:HGNC Symbol;Acc:HGNC:25565]</td>
</tr>
<tr class="odd">
<td align="left">ENSG00000000938.12</td>
<td align="left">ENSG00000000938</td>
<td align="left">FGR</td>
<td align="left">FGR proto-oncogene, Src family tyrosine kinase [Source:HGNC Symbol;Acc:HGNC:3697]</td>
</tr>
<tr class="even">
<td align="left">ENSG00000000971.15</td>
<td align="left">ENSG00000000971</td>
<td align="left">CFH</td>
<td align="left">complement factor H [Source:HGNC Symbol;Acc:HGNC:4883]</td>
</tr>
<tr class="odd">
<td align="left">ENSG00000001036.13</td>
<td align="left">ENSG00000001036</td>
<td align="left">FUCA2</td>
<td align="left">alpha-L-fucosidase 2 [Source:HGNC Symbol;Acc:HGNC:4008]</td>
</tr>
<tr class="even">
<td align="left">ENSG00000001084.10</td>
<td align="left">ENSG00000001084</td>
<td align="left">GCLC</td>
<td align="left">glutamate-cysteine ligase catalytic subunit [Source:HGNC Symbol;Acc:HGNC:4311]</td>
</tr>
<tr class="odd">
<td align="left">ENSG00000001167.14</td>
<td align="left">ENSG00000001167</td>
<td align="left">NFYA</td>
<td align="left">nuclear transcription factor Y subunit alpha [Source:HGNC Symbol;Acc:HGNC:7804]</td>
</tr>
<tr class="even">
<td align="left">ENSG00000001460.17</td>
<td align="left">ENSG00000001460</td>
<td align="left">STPG1</td>
<td align="left">sperm tail PG-rich repeat containing 1 [Source:HGNC Symbol;Acc:HGNC:28070]</td>
</tr>
</tbody>
</table>
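<p>The gene names are attached by indexing <code>ensembl_info</code> with the Ensembl IDs used as its row names. With a data frame, an ID that BioMart did not return yields <code>NA</code>, which is why undefined names are replaced by the gene ID; BioMart may also return an empty string for genes without a symbol. A small sketch with a hypothetical lookup table and a hypothetical unknown ID (<code>ENSG99999999999</code>):</p>

```r
## Hypothetical lookup table, indexed by Ensembl gene ID (like ensembl_info)
ensembl_info <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000419"),
  external_gene_name = c("TSPAN6", "DPM1"),
  stringsAsFactors = FALSE
)
row.names(ensembl_info) <- ensembl_info$ensembl_gene_id

## Query IDs; the second one is a hypothetical ID absent from the table
query <- c("ENSG00000000419", "ENSG99999999999", "ENSG00000000003")

## Indexing by row names returns NA for IDs absent from the table
gene_name <- ensembl_info[query, "external_gene_name"]
print(gene_name)  # "DPM1" NA "TSPAN6"

## Fall back on the gene ID when the name is missing or empty
missing_name <- is.na(gene_name) | gene_name == ""
gene_name[missing_name] <- query[missing_name]
print(gene_name)
```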
<div class="sourceCode" id="cb16"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="do">## Export the gene info table in a TSV file</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>gene_info_file_tsv <span class="ot"><-</span> <span class="fu">file.path</span>(bic_folder, <span class="st">"BIC_top1000_gene_info.tsv"</span>)</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="fu">message</span>(<span class="st">"Saving gene info table in TSV file</span><span class="sc">\n\t</span><span class="st">"</span>, </span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> gene_info_file_tsv)</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="fu">write.table</span>(<span class="at">x =</span> gene_info, </span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> <span class="at">file =</span> gene_info_file_tsv, </span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="at">quote =</span> <span class="cn">FALSE</span>, </span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="at">sep =</span> <span class="st">"</span><span class="sc">\t</span><span class="st">"</span>, </span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> <span class="at">row.names =</span> <span class="cn">TRUE</span>, </span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> <span class="at">col.names =</span> <span class="cn">NA</span>)</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a><span class="do">## Export the gene info table in Excel (xlsx) file</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a>gene_info_file_xlsx <span class="ot"><-</span> <span class="fu">file.path</span>(bic_folder, <span class="st">"BIC_top1000_gene_info.xlsx"</span>)</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="fu">message</span>(<span class="st">"Saving gene info table in xlsx file</span><span class="sc">\n\t</span><span class="st">"</span>,</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> gene_info_file_xlsx)</span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a>openxlsx<span class="sc">::</span><span class="fu">write.xlsx</span>(<span class="at">x =</span> gene_info, <span class="at">file =</span> gene_info_file_xlsx)</span></code></pre></div>
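<p>Because the TSV file is written with <code>row.names = TRUE</code> and <code>col.names = NA</code>, its header line starts with an empty cell, which keeps the column names aligned with the row names. The sketch below writes a hypothetical two-gene table to a temporary file and reads it back with <code>read.table(..., row.names = 1)</code>; <code>quote = ""</code> and <code>comment.char = ""</code> are set as a precaution, since gene descriptions may contain quotes or <code>#</code> characters.</p>

```r
## Hypothetical two-row version of the gene_info table
gene_info <- data.frame(
  ENSG = c("ENSG00000000003", "ENSG00000000419"),
  name = c("TSPAN6", "DPM1"),
  row.names = c("ENSG00000000003.14", "ENSG00000000419.12"),
  stringsAsFactors = FALSE
)

## Write it with the same options as the tutorial (to a temporary file)
tsv_file <- tempfile(fileext = ".tsv")
write.table(x = gene_info, file = tsv_file, quote = FALSE, sep = "\t",
            row.names = TRUE, col.names = NA)
print(readLines(tsv_file)[1])  # header begins with an empty cell

## Read it back, using the first column as row names
gene_info_back <- read.table(tsv_file, header = TRUE, sep = "\t",
                             row.names = 1, quote = "", comment.char = "",
                             stringsAsFactors = FALSE)
print(all.equal(gene_info_back, gene_info))
```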
</div>
<div id="assign-sample-colors" class="section level2">
<h2>Assign sample colors</h2>
<p>The following code assigns a color to each sample according to its class.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Class and sample colors ####</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>class_names <span class="ot"><-</span> <span class="fu">unique</span>(bic_meta<span class="sc">$</span>cancer.type)</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a><span class="do">## Assign one color to each class</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>class_color <span class="ot"><-</span> <span class="fu">c</span>(</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"Basal.like"</span> <span class="ot">=</span> <span class="st">"#DDEEFF"</span>, </span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a> <span class="st">"HER2pos"</span> <span class="ot">=</span> <span class="st">"#88FF88"</span>,</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> <span class="st">"Luminal.A"</span> <span class="ot">=</span> <span class="st">"#FFBB55"</span>,</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> <span class="st">"Luminal.B"</span> <span class="ot">=</span> <span class="st">"#EE88FF"</span>,</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="st">"Unclassified"</span> <span class="ot">=</span> <span class="st">"#8888FF"</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="do">## Assign a color to each sample according to its class</span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>color <span class="ot"><-</span> class_color[bic_meta<span class="sc">$</span>cancer.type]</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a><span class="co"># table(bic_meta$color)</span></span></code></pre></div>
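<p>Indexing a named vector by class name, as done above, silently returns <code>NA</code> for any class that has no entry in <code>class_color</code>. A quick sketch with a hypothetical incomplete palette shows the problem and how <code>setdiff()</code> can detect it before plotting:</p>

```r
## Hypothetical palette that lacks an entry for "Luminal.A"
class_color <- c("Basal.like" = "#DDEEFF", "HER2pos" = "#88FF88")
sample_class <- c("HER2pos", "Basal.like", "HER2pos", "Luminal.A")

## Named-vector lookup: one color per sample, NA for unknown classes
sample_color <- class_color[sample_class]
print(unname(sample_color))  # "#88FF88" "#DDEEFF" "#88FF88" NA

## Classes present in the data but missing from the palette
missing_classes <- setdiff(unique(sample_class), names(class_color))
print(missing_classes)  # "Luminal.A"
```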
</div>
<div id="sample-standardisation" class="section level2">
<h2>Sample standardisation</h2>
<p>We will standardise the samples using robust estimators (median and interquartile range, IQR) rather than the mean and standard deviation, in order to limit the impact of extreme values.</p>
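<p>Written as a formula, the transformation applied by the standardisation code to the value <span class="math inline">\(x_{gj}\)</span> of gene <span class="math inline">\(g\)</span> in sample <span class="math inline">\(j\)</span> is</p>
<p><span class="math display">\[\tilde{x}_{gj} = \frac{x_{gj} - \mathrm{med}_j}{\mathrm{IQR}_j}\,\mathrm{IQR}_{\mathrm{all}} + \mathrm{med}_{\mathrm{all}}\]</span></p>
<p>where <span class="math inline">\(\mathrm{med}_j\)</span> and <span class="math inline">\(\mathrm{IQR}_j\)</span> are the median and interquartile range of sample <span class="math inline">\(j\)</span>, <span class="math inline">\(\mathrm{med}_{\mathrm{all}}\)</span> is the median of all the values of the expression table, and <span class="math inline">\(\mathrm{IQR}_{\mathrm{all}}\)</span> is the IQR of the median-centered table. After this transformation, every sample has median <span class="math inline">\(\mathrm{med}_{\mathrm{all}}\)</span> and IQR <span class="math inline">\(\mathrm{IQR}_{\mathrm{all}}\)</span>.</p>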
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Sample standardisation ####</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="do">## Compute descriptive stats on the samples </span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>mean <span class="ot"><-</span> <span class="fu">apply</span>(bic_expr, <span class="dv">2</span>, mean)</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>sd <span class="ot"><-</span> <span class="fu">apply</span>(bic_expr, <span class="dv">2</span>, sd)</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>median <span class="ot"><-</span> <span class="fu">apply</span>(bic_expr, <span class="dv">2</span>, median)</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>bic_meta<span class="sc">$</span>iqr <span class="ot"><-</span> <span class="fu">apply</span>(bic_expr, <span class="dv">2</span>, IQR)</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="do">## Center each sample on its median, then shift it to the global median</span></span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>expr_median <span class="ot"><-</span> <span class="fu">median</span>(<span class="fu">unlist</span>(bic_expr))</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>bic_expr_centered <span class="ot"><-</span> <span class="fu">t</span>(<span class="fu">t</span>(bic_expr) <span class="sc">-</span> bic_meta<span class="sc">$</span>median) <span class="sc">+</span> expr_median</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a><span class="co"># dim(bic_expr)</span></span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a><span class="co"># dim(bic_expr_centered)</span></span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a><span class="co"># apply(bic_expr_centered, 2, median)</span></span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a><span class="do">## Compute the IQR of the whole (median-centered) expression table</span></span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a>expr_iqr <span class="ot"><-</span> <span class="fu">IQR</span>(<span class="fu">unlist</span>(bic_expr_centered)) </span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a><span class="do">## Scale each sample based on its IQR</span></span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a>bic_expr_std <span class="ot"><-</span> <span class="fu">t</span>((<span class="fu">t</span>(bic_expr) <span class="sc">-</span> <span class="fu">unlist</span>(bic_meta<span class="sc">$</span>median)) </span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a> <span class="sc">/</span> <span class="fu">unlist</span>(bic_meta<span class="sc">$</span>iqr)) <span class="sc">*</span> expr_iqr <span class="sc">+</span> expr_median</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a><span class="co"># apply(bic_expr_std, 2, IQR)</span></span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a><span class="co"># range( apply(bic_expr_std, 2, IQR))</span></span>
<span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a><span class="co"># range( apply(bic_expr_std, 2, median))</span></span></code></pre></div>
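<p>The double transposition <code>t(t(x) - v)</code> works because R recycles a vector down the columns of a matrix. After transposing, samples are rows, so each sample has its own median subtracted. The toy sketch below uses simulated values, not the TCGA data. The names <code>toy</code>, <code>toy_median</code>, <code>global_median</code>, etc. are illustrative only. It applies the same centering and scaling, checks that <code>sweep()</code> gives the same centering, and shows that every sample ends up with the same median and IQR.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Toy illustration with simulated values (not the TCGA-BIC data)
set.seed(1)
toy <- matrix(c(rnorm(7, mean = 5, sd = 1),
                rnorm(7, mean = 8, sd = 3),
                rnorm(7, mean = 2, sd = 0.5)),
              nrow = 7,
              dimnames = list(paste0("gene", 1:7), paste0("sample", 1:3)))

## Per-sample (column) median and IQR
toy_median <- apply(toy, 2, median)
toy_iqr <- apply(toy, 2, IQR)

## Global median, then median-based centering via the double transposition
global_median <- median(toy)
toy_centered <- t(t(toy) - toy_median) + global_median

## sweep() subtracts the per-column medians in the same way
stopifnot(isTRUE(all.equal(toy_centered,
                           sweep(toy, 2, toy_median) + global_median)))

## Global IQR of the centered table, then IQR-based scaling
global_iqr <- IQR(toy_centered)
toy_std <- t((t(toy) - toy_median) / toy_iqr) * global_iqr + global_median

## Every sample now shares the global median and the global IQR
print(apply(toy_std, 2, median))
print(apply(toy_std, 2, IQR))
</code></pre></div>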
</div>
<div id="dataset-with-readable-row-and-column-names" class="section level2">
<h2>Dataset with readable row and column names</h2>
<p>Create a copy of the standardised expression table with:</p>
<ul>
<li>gene names as row names (<code>rownames</code>)</li>
<li>sample labels as column names (<code>colnames</code>)</li>
</ul>
<div class="sourceCode" id="cb19"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Data set with readable row and column names ####</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>bic_expr_labels <span class="ot"><-</span> bic_expr_std</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="fu">dim</span>(bic_expr_labels)</span></code></pre></div>
<pre><code>[1] 1000 819</code></pre>
<div class="sourceCode" id="cb21"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">colnames</span>(bic_expr_labels) <span class="ot"><-</span> bic_meta<span class="sc">$</span>label</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="fu">rownames</span>(bic_expr_labels) <span class="ot"><-</span> <span class="fu">make.names</span>(gene_info<span class="sc">$</span>name) </span></code></pre></div>
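<p><code>make.names()</code> turns arbitrary strings into syntactically valid R names. Invalid characters become dots, and names starting with a digit get an <code>X</code> prefix. Duplicates are kept unless <code>unique = TRUE</code> is set, which may matter if several rows share a gene name. The gene symbols in this sketch are made-up examples, not values taken from <code>gene_info</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Hypothetical gene names, chosen to show what make.names() changes
gene_names <- c("HLA-A", "7SK", "TP53", "TP53")

## Invalid characters become ".", a leading digit gets an "X" prefix
make.names(gene_names)
## -> "HLA.A" "X7SK" "TP53" "TP53"

## Duplicated names are only disambiguated when unique = TRUE
make.names(gene_names, unique = TRUE)
## -> "HLA.A" "X7SK" "TP53" "TP53.1"
</code></pre></div>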
</div>
<div id="box-plot" class="section level2">
<h2>Box plot</h2>
<div id="exercise-box-plot" class="section level3">
<h3>Exercise: box plot</h3>
<p>Select 30 samples at random, and draw box plots of their expression values before and after normalisation.</p>
<p>Sort the samples according to their index in the columns of the expression table (this will ensure that samples of the same class come together).</p>
</div>
<div id="solution-box-plot" class="section level3">
<h3>Solution: box plot</h3>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Box plot ####</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>sample_size <span class="ot"><-</span> <span class="dv">30</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="do">## select sample indices</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>selected_samples <span class="ot"><-</span> <span class="fu">sort</span>(<span class="fu">sample</span>(</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> <span class="at">x =</span> <span class="dv">1</span><span class="sc">:</span><span class="fu">ncol</span>(bic_expr), </span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> <span class="at">size =</span> sample_size,</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> <span class="at">replace =</span> <span class="cn">FALSE</span>))</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a><span class="do">## Box plots</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>par.ori <span class="ot"><-</span> <span class="fu">par</span>(<span class="at">no.readonly =</span> <span class="cn">TRUE</span>)</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a><span class="fu">par</span>(<span class="at">mfrow =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">3</span>))</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a><span class="fu">par</span>(<span class="at">mar =</span> <span class="fu">c</span>(<span class="dv">5</span>, <span class="dv">6</span>, <span class="dv">4</span>, <span class="dv">1</span>))</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a><span class="do">## Original data</span></span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a><span class="fu">boxplot</span>(<span class="at">x =</span> bic_expr[, selected_samples], </span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a> <span class="at">names =</span> bic_meta<span class="sc">$</span>label[selected_samples],</span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a> <span class="at">col =</span> bic_meta<span class="sc">$</span>color[selected_samples],</span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a> <span class="at">xlab =</span> <span class="st">"log2(counts)"</span>,</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a> <span class="at">main =</span> <span class="st">"Before normalisation"</span>,</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a> <span class="at">horizontal =</span> <span class="cn">TRUE</span>, </span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a> <span class="at">las =</span> <span class="dv">1</span>, <span class="at">notch =</span> <span class="cn">TRUE</span>,</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a> <span class="at">cex.axis =</span> <span class="dv">1</span>)</span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a><span class="fu">boxplot</span>(<span class="at">x =</span> bic_expr_centered[, selected_samples], </span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a> <span class="at">names =</span> bic_meta<span class="sc">$</span>label[selected_samples],</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a> <span class="at">col =</span> bic_meta<span class="sc">$</span>color[selected_samples],</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a> <span class="at">xlab =</span> <span class="st">"log2(counts)"</span>,</span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a> <span class="at">main =</span> <span class="st">"Median-based centering"</span>,</span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a> <span class="at">horizontal =</span> <span class="cn">TRUE</span>, </span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a> <span class="at">las =</span> <span class="dv">1</span>, <span class="at">notch =</span> <span class="cn">TRUE</span>,</span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a> <span class="at">cex.axis =</span> <span class="fl">0.7</span>)</span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a><span class="fu">boxplot</span>(<span class="at">x =</span> bic_expr_std[, selected_samples], </span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a> <span class="at">names =</span> bic_meta<span class="sc">$</span>label[selected_samples],</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a> <span class="at">col =</span> bic_meta<span class="sc">$</span>color[selected_samples],</span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a> <span class="at">xlab =</span> <span class="st">"log2(counts)"</span>,</span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a> <span class="at">main =</span> <span class="st">"IQR-based scaling"</span>,</span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a> <span class="at">horizontal =</span> <span class="cn">TRUE</span>, </span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a> <span class="at">las =</span> <span class="dv">1</span>, <span class="at">notch =</span> <span class="cn">TRUE</span>,</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a> <span class="at">cex.axis =</span> <span class="fl">0.7</span>)</span></code></pre></div>
<div class="figure" style="text-align: center">
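<img src="figures/tcga-bic_box_plot_rand_samples-1.png" alt="Box plots of a random selection of samples from the TCGA-BIC transcriptome dataset, before normalisation, after median-based centering, and after IQR-based scaling." width="100%" />
<p class="caption">
Box plots of a random selection of samples from the TCGA-BIC transcriptome dataset: before normalisation (left), after median-based centering (middle) and after IQR-based scaling (right).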
</p>
</div>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="fu">par</span>(par.ori)</span></code></pre></div>
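<p>Because <code>sample()</code> draws a new random subset at each run, the box plots change every time the document is rendered. A minimal way to make the selection reproducible is to fix the random seed first. The seed value <code>123</code> is arbitrary, and the number of samples (819) is taken from the <code>dim()</code> output shown earlier.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Fix the seed (arbitrary value) so that the same samples are drawn at each run
set.seed(123)
n_samples <- 819  # number of columns of the expression table
sample_size <- 30

## Draw 30 distinct column indices and sort them, as in the solution
selected_samples <- sort(sample(x = seq_len(n_samples),
                                size = sample_size,
                                replace = FALSE))
print(selected_samples)
</code></pre></div>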
</div>
</div>
<div id="pca" class="section level2">
<h2>PCA</h2>
<div id="exercise" class="section level3">
<h3>Exercise</h3>
<p>Generate a plot of the samples on the first two principal components, with samples colored according to their class (cancer type). Do the sample classes segregate on this PC plot?</p>
</div>
<div id="solution" class="section level3">
<h3>Solution</h3>
<div class="sourceCode" id="cb24"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### PC plot of the TCGA BIC samples ####</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(FactoMineR) <span class="co"># provides PCA()</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(factoextra) <span class="co"># provides fviz_eig() and fviz_pca_ind()</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a><span class="do">## Compute the principal components</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a><span class="do">## PCA() treats rows as individuals, so the table is transposed (samples as rows, genes as columns)</span></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a>bic_pca <span class="ot"><-</span> <span class="fu">PCA</span>(<span class="fu">t</span>(bic_expr_labels), </span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a> <span class="at">scale.unit =</span> <span class="cn">FALSE</span>, </span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a> <span class="at">ncp =</span> <span class="fu">ncol</span>(bic_expr_labels), </span>
<span id="cb24-10"><a href="#cb24-10" aria-hidden="true" tabindex="-1"></a> <span class="at">graph =</span> <span class="cn">FALSE</span>)</span>
<span id="cb24-11"><a href="#cb24-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-12"><a href="#cb24-12" aria-hidden="true" tabindex="-1"></a><span class="do">## Table with the coordinates of each sample in the PC space</span></span>
<span id="cb24-13"><a href="#cb24-13" aria-hidden="true" tabindex="-1"></a>bic_pcs <span class="ot"><-</span> bic_pca<span class="sc">$</span>ind<span class="sc">$</span>coord</span>
<span id="cb24-14"><a href="#cb24-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-15"><a href="#cb24-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Check the PC dimensions</span></span>
<span id="cb24-16"><a href="#cb24-16" aria-hidden="true" tabindex="-1"></a><span class="fu">dim</span>(bic_pcs)</span></code></pre></div>
<pre><code>[1] 819 818</code></pre>
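<p>The PC table has 819 rows (one per sample) but only 818 columns, although <code>ncp</code> was set to 819. After centering, a table with n individuals and p variables has rank at most min(n - 1, p), so at most 818 components carry non-zero variance here. The toy sketch below illustrates this with base R's <code>prcomp()</code>, used as a stand-in for <code>FactoMineR::PCA()</code>: both center without scaling in this setting, but <code>prcomp()</code> still returns the null component, with a standard deviation close to zero.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Toy sketch: 5 samples (rows) x 10 genes (columns) of simulated values
set.seed(2)
toy <- matrix(rnorm(5 * 10), nrow = 5)

## Centered, unscaled PCA with samples as individuals
toy_pca <- prcomp(toy, center = TRUE, scale. = FALSE)

## prcomp() returns min(5, 10) = 5 components ...
length(toy_pca$sdev)          # 5
## ... but only 5 - 1 = 4 of them have non-zero variance
sum(toy_pca$sdev > 1e-8)      # 4
</code></pre></div>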
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Plot the variance per component ####</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a><span class="fu">fviz_eig</span>(bic_pca, <span class="at">addlabels =</span> <span class="cn">TRUE</span>)</span></code></pre></div>
<div class="figure" style="text-align: center">
<img src="figures/tcga-bic_pca_variance_lot-1.png" alt="Percentage of variance explained by the first principal components of the BIC dataset" width="60%" />
<p class="caption">
Percentage of variance explained by the first principal components of the BIC dataset
</p>
</div>
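<p>The bars of this scree plot give the percentage of total variance explained by each component, that is, each eigenvalue divided by the sum of all eigenvalues. A minimal sketch of that computation on simulated data follows, again using base R's <code>prcomp()</code> as a stand-in. For a <code>FactoMineR::PCA()</code> result, the same percentages are stored in its <code>eig</code> table.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Simulated data: 20 samples x 6 genes, with decreasing gene standard deviations
set.seed(3)
toy <- sapply(c(3, 2, 1, 0.5, 0.5, 0.5), function(s) rnorm(20, sd = s))

toy_pca <- prcomp(toy, center = TRUE, scale. = FALSE)

## Eigenvalues are the squared standard deviations of the components
eigenvalues <- toy_pca$sdev^2

## Percentage of variance explained by each component, and cumulative percentage
pct_variance <- 100 * eigenvalues / sum(eigenvalues)
round(pct_variance, 1)
round(cumsum(pct_variance), 1)
</code></pre></div>
<p>The eigenvalues sum to the total variance of the centered table (the sum of the per-gene variances), which is why the percentages add up to 100.</p>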
<p>The plot of individuals on the first two PCs shows that the cancer types partly segregate along these components.</p>
<div class="sourceCode" id="cb27"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Plot individuals on the first two components ####</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a><span class="fu">fviz_pca_ind</span>(<span class="at">X =</span> bic_pca, </span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a> <span class="at">axes =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">2</span>), </span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a> <span class="at">col.ind =</span> bic_meta<span class="sc">$</span>cancer.type,</span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="at">label =</span> <span class="st">"none"</span>) </span></code></pre></div>
<p><img src="figures/tcga-bic_pca_ind_plot_PC1-2-1.png" width="672" style="display: block; margin: auto;" /></p>
<p>The 3rd and 4th PCs appear much less informative with respect to cancer type: samples of different cancer types are intermingled on this plot.</p>
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="do">#### Plot individuals on the 3rd and 4th components ####</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a><span class="fu">fviz_pca_ind</span>(<span class="at">X =</span> bic_pca, </span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a> <span class="at">axes =</span> <span class="fu">c</span>(<span class="dv">3</span>, <span class="dv">4</span>), </span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a> <span class="at">col.ind =</span> bic_meta<span class="sc">$</span>cancer.type,</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a> <span class="at">label =</span> <span class="st">"none"</span>) </span></code></pre></div>
<p><img src="figures/tcga-bic_pca_ind_plot_PC3-4-1.png" width="672" style="display: block; margin: auto;" /></p>
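<p>Whether classes segregate on a component can be judged visually, as in the two plots, or measured. One possible measure, which is not part of the original analysis, is the fraction of a component's variance explained by the class labels. This is the R² of a one-way linear model of the PC scores on the class. The sketch below computes it on simulated data where three hypothetical classes differ along a single direction, so only the first PC should carry class information. For the real data, <code>bic_pcs</code> and <code>bic_meta$cancer.type</code> would presumably play the roles of <code>pcs</code> and <code>classes</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Simulated data: 3 hypothetical classes of 20 samples, 50 genes
set.seed(42)
classes <- factor(rep(c("A", "B", "C"), each = 20))
expr <- matrix(rnorm(60 * 50), nrow = 60)  # samples as rows, genes as columns

## Shift the first 5 genes according to the class (class signal along one direction)
shift <- c(A = 0, B = 3, C = -3)[as.character(classes)]
expr[, 1:5] <- expr[, 1:5] + shift

## Centered PCA, sample coordinates on the first 4 components
pcs <- prcomp(expr, center = TRUE, scale. = FALSE)$x[, 1:4]

## R-squared of each PC regressed on the class labels
r2 <- apply(pcs, 2, function(pc) summary(lm(pc ~ classes))$r.squared)
round(r2, 2)
</code></pre></div>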
</div>
</div>
<div id="variables" class="section level2">
<h2>Variables</h2>
<p>At the end of this tutorial, the following variables are available.</p>
<table>
<colgroup>
<col width="23%" />
<col width="76%" />
</colgroup>
<thead>
<tr class="header">
<th>Variable name</th>
<th>Data content</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>bic_meta</code></td>
<td>metadata with a few added columns (sample color, estimators of central tendency and dispersion)</td>
</tr>
<tr class="even">