Commit b69b8f0

Merge pull request #120 from ricj/master
another iteration
2 parents 51c6381 + 3be8cb0 commit b69b8f0

1 file changed

_pages/dat450/assignment2.md

Lines changed: 7 additions & 5 deletions
@@ -90,7 +90,7 @@ Now, we need to reshape the query, key, and value tensors so that the individual
 q = q.view(b, m, n_h, d_h).transpose(1, 2)
 ```
 
-Now apply the RoPE rotations to the query and key representations. Use the utility function `apply_rotary_pos_emb` provided in the code skeleton and just provide the `position_embedding` that you received as an input to `forward`. The utility function returns the modified query and key representations.
+Now apply the RoPE rotations to the query and key representations. Use the utility function `apply_rotary_pos_emb` provided in the code skeleton and just provide the `rope_rotations` that you received as an input to `forward`. The utility function returns the modified query and key representations.
 
 **Sanity check step 1.**
 Create an untrained MHA layer. Create some 3-dimensional tensor where the last dimension has the same size as `hidden_size`, as you did in the previous sanity checks. Apply the MHA layer with what you have implemented so far and make sure it does not crash. (It is common to see errors related to tensor shapes here.)
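
For reference, the sketch below shows the rotate-half RoPE convention that utilities named `apply_rotary_pos_emb` commonly follow, applied after the head-splitting reshape above. The function body, the `(cos, sin)` contents of the rotations, and the frequency construction are assumptions made for illustration; the skeleton's own utility may differ in details, so rely on its actual signature.

```python
import torch

def rotate_half(x):
    # Swap and negate the two halves of the last dimension: (x1, x2) -> (-x2, x1).
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    # Assumed convention: cos/sin have shape (batch, seq, head_dim) and are
    # broadcast over the head dimension of q and k, which are (batch, heads, seq, head_dim).
    cos, sin = cos.unsqueeze(1), sin.unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

b, m, hidden_size, n_h = 2, 5, 64, 4
d_h = hidden_size // n_h

q = torch.randn(b, m, hidden_size).view(b, m, n_h, d_h).transpose(1, 2)  # (b, n_h, m, d_h)
k = torch.randn(b, m, hidden_size).view(b, m, n_h, d_h).transpose(1, 2)

# Toy stand-in for what `rope_rotations` might contain: cos/sin tables built
# from the usual RoPE inverse frequencies, one angle per (position, dimension pair).
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, d_h, 2, dtype=torch.float32) / d_h))
freqs = torch.outer(torch.arange(m, dtype=torch.float32), inv_freq)      # (m, d_h/2)
emb = torch.cat((freqs, freqs), dim=-1)                                  # (m, d_h)
cos, sin = emb.cos().expand(b, m, d_h), emb.sin().expand(b, m, d_h)

q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
print(q_rot.shape, k_rot.shape)  # shapes are unchanged: (b, n_h, m, d_h)
```

The rotated query and key tensors keep the same shapes as the inputs, so the rest of the attention computation proceeds exactly as before.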
@@ -158,8 +158,8 @@ out = h_new + h_old
 
 ### The complete Transformer stack
 
-Now, set up the complete Transformer stack including embedding and unembedding layers.
-The embedding and unembedding layers will be identical to what you had in Programming Assignment 1 (except that the unembedding layer should be bias-free, as mentioned in the beginning).
+Now, set up the complete Transformer stack including embedding, top-level normalizer, and unembedding layers.
+The embedding and unembedding layers will be identical to what you had in Programming Assignment 1 (except that the unembedding layer should not use bias terms, as mentioned in the beginning).
 
 <details>
 <summary><b>Hint</b>: Use a <a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.ModuleList.html"><code>ModuleList</code></a>.</summary>
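
As a purely illustrative sketch of how the stack described above might fit together: a dummy block and `nn.LayerNorm` stand in for your own Transformer block and top-level normalizer, and all class names and sizes below are made up for the example.

```python
import torch
import torch.nn as nn

class DummyBlock(nn.Module):
    """Placeholder for the Transformer block implemented earlier in the assignment."""
    def __init__(self, hidden_size):
        super().__init__()
        self.ff = nn.Linear(hidden_size, hidden_size)

    def forward(self, x, rope_rotations=None):
        return x + self.ff(x)

class ToyTransformerLM(nn.Module):
    """Embedding -> stack of blocks -> top-level normalizer -> bias-free unembedding."""
    def __init__(self, vocab_size, hidden_size, n_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        # A ModuleList registers every block's parameters with the model.
        self.blocks = nn.ModuleList([DummyBlock(hidden_size) for _ in range(n_layers)])
        self.norm = nn.LayerNorm(hidden_size)  # stand-in for the assignment's normalizer
        self.unembedding = nn.Linear(hidden_size, vocab_size, bias=False)  # no bias terms

    def forward(self, token_ids):
        h = self.embedding(token_ids)          # (batch, seq) -> (batch, seq, hidden)
        # In the real model, the RoPE rotations would be created here and
        # passed to every block; the dummy block simply ignores them.
        for block in self.blocks:
            h = block(h)
        return self.unembedding(self.norm(h))  # (batch, seq, vocab)

model = ToyTransformerLM(vocab_size=100, hidden_size=32, n_layers=2)
```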
@@ -170,13 +170,15 @@ Put all the Transformer blocks in a <code>ModuleList</code> instead of a plain P
 </details>
 
 <details>
-<summary><b>Hint</b>: Creating the RoPE embeddings.</summary>
+<summary><b>Hint</b>: Creating and applying the RoPE embeddings.</summary>
 <div style="margin-left: 10px; border-radius: 4px; background: #ddfff0; border: 1px solid black; padding: 5px;">
-Xxx.
+Create the <code>A2RotaryEmbedding</code> in <code>__init__</code>, as already indicated in the code skeleton. Then in <code>forward</code>, first create the rotations (again, already included in the skeleton). Then pass the rotations when you apply each Transformer layer.
 </pre>
 </div>
 </details>
 
+**Sanity check.** Now, the language model should be complete and you can test this in the same way as in Programming Assignment 1. Create a 2-dimensional *integer* tensor and apply your Transformer to it. The result should be a 3-dimensional tensor where the last dimension is equal to the vocabulary size.
+
 ## Step 2: Training the language model
 
 In Assignment 1, you implemented a utility to handle training and validation. Your Transformer language model should be possible to use as a drop-in replacement for the RNN-based model you had in that assignment.
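
Using the toy model from the earlier sketch as a stand-in for your own Transformer language model (the names are again only placeholders), the final sanity check could look roughly like this:

```python
token_ids = torch.randint(0, 100, (4, 20))   # 2-dimensional *integer* tensor: (batch, positions)
logits = model(token_ids)                    # replace `model` with your own language model
print(logits.shape)                          # expect (4, 20, vocab_size), here (4, 20, 100)
assert logits.dim() == 3 and logits.shape[-1] == 100
```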
