`_pages/dat450/assignment2.md` (7 additions, 5 deletions)
@@ -90,7 +90,7 @@ Now, we need to reshape the query, key, and value tensors so that the individual
q = q.view(b, m, n_h, d_h).transpose(1, 2)
```
-Now apply the RoPE rotations to the query and key representations. Use the utility function `apply_rotary_pos_emb` provided in the code skeleton and just provide the `position_embedding` that you received as an input to `forward`. The utility function returns the modified query and key representations.
+Now apply the RoPE rotations to the query and key representations. Use the utility function `apply_rotary_pos_emb` provided in the code skeleton and just provide the `rope_rotations` that you received as an input to `forward`. The utility function returns the modified query and key representations.
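For illustration, a minimal sketch of this step. The exact signature of `apply_rotary_pos_emb` is whatever the code skeleton defines; here it is assumed to take the query, the key, and the rotations and to return the rotated pair:

```python
# Sketch only: assumes apply_rotary_pos_emb(q, k, rotations), as provided by the
# code skeleton, returns the rotated query and key tensors.
q = q.view(b, m, n_h, d_h).transpose(1, 2)   # (batch, n_heads, seq_len, head_dim)
k = k.view(b, m, n_h, d_h).transpose(1, 2)
v = v.view(b, m, n_h, d_h).transpose(1, 2)
q, k = apply_rotary_pos_emb(q, k, rope_rotations)
```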
**Sanity check step 1.**
Create an untrained MHA layer. Create some 3-dimensional tensor where the last dimension has the same size as `hidden_size`, as you did in the previous sanity checks. Apply the MHA layer with what you have implemented so far and make sure it does not crash. (It is common to see errors related to tensor shapes here.)
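A rough sketch of this sanity check, with placeholder class names and sizes standing in for whatever the code skeleton provides:

```python
import torch

# Placeholder names: A2MultiHeadAttention and A2RotaryEmbedding stand for the
# classes in the code skeleton; constructor arguments may differ in your code.
hidden_size, n_heads, seq_len = 256, 4, 10
mha = A2MultiHeadAttention(hidden_size, n_heads)        # your untrained MHA layer
rope = A2RotaryEmbedding(hidden_size // n_heads)        # as set up in the skeleton

x = torch.randn(2, seq_len, hidden_size)                # (batch, seq_len, hidden_size)
positions = torch.arange(seq_len).unsqueeze(0)
rope_rotations = rope(x, positions)                     # exact call depends on the skeleton
out = mha(x, rope_rotations)                            # shape-related errors surface here
print(out.shape)                                        # expected: torch.Size([2, 10, 256])
```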
@@ -158,8 +158,8 @@ out = h_new + h_old
### The complete Transformer stack
-Now, set up the complete Transformer stack including embedding and unembedding layers.
-The embedding and unembedding layers will be identical to what you had in Programming Assignment 1 (except that the unembedding layer should be bias-free, as mentioned in the beginning).
+Now, set up the complete Transformer stack including embedding, top-level normalizer, and unembedding layers.
+The embedding and unembedding layers will be identical to what you had in Programming Assignment 1 (except that the unembedding layer should not use bias terms, as mentioned in the beginning).
<details>
<summary><b>Hint</b>: Use a <a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.ModuleList.html"><code>ModuleList</code></a>.</summary>
@@ -170,13 +170,15 @@ Put all the Transformer blocks in a <code>ModuleList</code> instead of a plain P
</details>
<details>
-<summary><b>Hint</b>: Creating the RoPE embeddings.</summary>
+<summary><b>Hint</b>: Creating and applying the RoPE embeddings.</summary>
Create the <code>A2RotaryEmbedding</code> in <code>__init__</code>, as already indicated in the code skeleton. Then in <code>forward</code>, first create the rotations (again, already included in the skeleton). Then pass the rotations when you apply each Transformer layer.
</pre>
</div>
</details>
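To make the intended structure concrete, here is a sketch of what the full stack could look like. `A2TransformerBlock` is a placeholder name for your own Transformer block, `A2RotaryEmbedding` is the skeleton's class, and the choice of normalizer and the exact rotary-embedding call are assumptions to adapt to your own code:

```python
import torch
import torch.nn as nn

class A2TransformerLM(nn.Module):
    """Sketch: embedding -> Transformer blocks -> top-level normalizer -> unembedding.
    Class names and constructor arguments are placeholders for the skeleton's."""

    def __init__(self, vocab_size, hidden_size, n_heads, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # Put the blocks in a ModuleList, not a plain Python list.
        self.blocks = nn.ModuleList(
            [A2TransformerBlock(hidden_size, n_heads) for _ in range(n_layers)]
        )
        self.norm = nn.RMSNorm(hidden_size)      # or whichever normalizer the assignment specifies
        self.unembedding = nn.Linear(hidden_size, vocab_size, bias=False)
        self.rotary = A2RotaryEmbedding(hidden_size // n_heads)

    def forward(self, token_ids):
        h = self.embedding(token_ids)
        # Create the RoPE rotations once, then pass them to every block.
        positions = torch.arange(token_ids.shape[1], device=token_ids.device).unsqueeze(0)
        rope_rotations = self.rotary(h, positions)   # exact call as in the skeleton
        for block in self.blocks:
            h = block(h, rope_rotations)
        return self.unembedding(self.norm(h))
```

Keeping the blocks in a `ModuleList` rather than a plain Python list ensures that their parameters are registered with the model and moved to the right device along with everything else.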
+**Sanity check.** Now, the language model should be complete and you can test this in the same way as in Programming Assignment 1. Create a 2-dimensional *integer* tensor and apply your Transformer to it. The result should be a 3-dimensional tensor where the last dimension is equal to the vocabulary size.
+
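A sketch of this check, reusing the placeholder model class from the sketch above (all sizes are made up):

```python
import torch

vocab_size = 1000
model = A2TransformerLM(vocab_size=vocab_size, hidden_size=256, n_heads=4, n_layers=2)

token_ids = torch.randint(0, vocab_size, (2, 10))   # 2-dimensional integer tensor (batch, seq_len)
logits = model(token_ids)
print(logits.shape)                                  # expected: torch.Size([2, 10, 1000])
```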
## Step 2: Training the language model
In Assignment 1, you implemented a utility to handle training and validation. It should be possible to use your Transformer language model as a drop-in replacement for the RNN-based model you had in that assignment.