When zero_point is not provided, the default value is 2^(bits-1): 8 for 4 bits, 128 for 8 bits.
+
+If block_size is provided, both hidden_size and inter_size must be divisible by the block size, and
+the dequantization is performed per block of size block_size along the K (input feature) dimension.
+
+If block_size and zero_point are provided, both hidden_size and inter_size must be divisible by block_size * pack_size,
+where pack_size = 8 / expert_weight_bits.
+
 The SwiGLU (Swish-Gated Linear Unit) activation function is like:
 g = xW + b
 l = xV + c
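
The default zero point and the divisibility rules in the added description can be sketched as a small validation helper. This is only an illustration under stated assumptions, not the operator's actual implementation: the function names `default_zero_point` and `check_block_constraints` are hypothetical, and `expert_weight_bits` is assumed to be 4 or 8, the two cases the text mentions.

```python
def default_zero_point(bits: int) -> int:
    """Default zero point when none is supplied: 2^(bits-1)."""
    if bits not in (4, 8):
        # Assumption: only the 4-bit and 8-bit cases named in the text.
        raise ValueError("expert_weight_bits is assumed to be 4 or 8")
    return 1 << (bits - 1)


def check_block_constraints(hidden_size, inter_size, bits,
                            block_size=None, has_zero_points=False):
    """Return a list of violated divisibility rules (empty when valid)."""
    if block_size is None:
        # No block-wise quantization, so the block rules do not apply.
        return []
    pack_size = 8 // bits  # quantized values packed into one byte
    # With zero points the divisor grows to block_size * pack_size.
    divisor = block_size * pack_size if has_zero_points else block_size
    errors = []
    for name, value in (("hidden_size", hidden_size), ("inter_size", inter_size)):
        if value % divisor != 0:
            errors.append(f"{name}={value} is not divisible by {divisor}")
    return errors


if __name__ == "__main__":
    print(default_zero_point(4), default_zero_point(8))
    print(check_block_constraints(192, 128, bits=4, block_size=64))
    print(check_block_constraints(192, 128, bits=4, block_size=64,
                                  has_zero_points=True))
```

With 4-bit weights and `block_size=64`, a `hidden_size` of 192 passes the plain block rule but fails once zero points are supplied, because the divisor becomes 128.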
@@ -4579,7 +4588,7 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>Whether to use sparse mixer</dd>
 </dl>
 
-#### Inputs (7 - 11)
+#### Inputs (7 - 14)
 
 <dl>
 <dt><tt>input</tt> : T</dt>
@@ -4604,6 +4613,12 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>2D optional tensor with shape (num_experts, inter_size), or 3D optional tensor with shape (num_experts, inter_size, hidden_size / block_size) when block_size is provided.</dd>
 <dt><tt>fc3_experts_bias</tt> (optional) : T</dt>
 <dd>2D optional tensor with shape (num_experts, inter_size)</dd>
+<dt><tt>fc1_zero_points</tt> (optional) : T1</dt>
+<dd>2D tensor with shape (num_experts, fusion_size * inter_size / pack_size), or 3D tensor with shape (num_experts, fusion_size * inter_size, hidden_size / block_size / pack_size) when block_size is provided.</dd>
+<dt><tt>fc2_zero_points</tt> (optional) : T1</dt>
+<dd>2D tensor with shape (num_experts, hidden_size / pack_size), or 3D tensor with shape (num_experts, hidden_size, inter_size / block_size / pack_size) when block_size is provided.</dd>
+<dt><tt>fc3_zero_points</tt> (optional) : T1</dt>
+<dd>2D optional tensor with shape (num_experts, inter_size / pack_size), or 3D optional tensor with shape (num_experts, inter_size, hidden_size / block_size / pack_size) when block_size is provided.</dd>
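
The shapes listed for the three new zero-point inputs can be tabulated with a short helper, which may be useful when preparing test tensors. This is a sketch only: `zero_point_shapes` is a hypothetical name, and `fusion_size` is taken as a caller-supplied parameter because its value is not defined in the lines shown here. Integer division is used on the assumption that the divisibility rules from the operator description already hold.

```python
def zero_point_shapes(num_experts, hidden_size, inter_size, bits,
                      fusion_size=1, block_size=None):
    """Expected shapes of fc1/fc2/fc3_zero_points as listed in the diff."""
    pack_size = 8 // bits  # pack_size = 8 / expert_weight_bits
    if block_size is None:
        # 2D case: zero points packed along the output dimension.
        return {
            "fc1_zero_points": (num_experts, fusion_size * inter_size // pack_size),
            "fc2_zero_points": (num_experts, hidden_size // pack_size),
            "fc3_zero_points": (num_experts, inter_size // pack_size),
        }
    # 3D case: one zero point per block along K, packed pack_size per byte.
    return {
        "fc1_zero_points": (num_experts, fusion_size * inter_size,
                            hidden_size // block_size // pack_size),
        "fc2_zero_points": (num_experts, hidden_size,
                            inter_size // block_size // pack_size),
        "fc3_zero_points": (num_experts, inter_size,
                            hidden_size // block_size // pack_size),
    }


if __name__ == "__main__":
    for name, shape in zero_point_shapes(8, 256, 512, bits=4, fusion_size=2,
                                         block_size=64).items():
        print(name, shape)
```

Note that in the block-wise case the last dimension of `fc1_zero_points` and `fc3_zero_points` is derived from `hidden_size`, while that of `fc2_zero_points` is derived from `inter_size`, matching the K dimension of each layer.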