
Commit c9863fb

Fixed typo
1 parent 3ea2cc6 commit c9863fb
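For context: in LaTeX math mode a bare ";" typesets as a literal semicolon (unlike the \; spacing command), so the original source rendered as "where ; w_ij = …". A minimal before/after of the fragment each file below fixes:

% Before: the stray ";" renders as a literal semicolon after "where"
\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right)

% After: "\ " leaves only an explicit space, rendering as "where w_ij = …"
\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right)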

8 files changed: +1 −1 in each file (+8 −8 lines total)


authors/chaitanya-joshi/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$</p>
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the <strong>Q</strong>uery, <strong>K</strong>ey and <strong>V</strong>alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in <em>one shot</em>–another plus point for Transformers over RNNs, which update features word-by-word.</p>
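For reference, a minimal NumPy sketch (not part of the commit) of the attention update described in the hunk above, assuming each row of H holds one word's features h_i and Q, K, V are the layer's learnable weight matrices:

import numpy as np

def attention_update(H, Q, K, V):
    # Attention scores: scores[i, j] = (Q h_i) . (K h_j)
    scores = (H @ Q.T) @ (H @ K.T).T
    # Softmax over j (each row), with max-subtraction for numerical stability
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    # h_i^{l+1} = sum_j w_ij (V h_j), computed for all words in one shot
    return w @ (H @ V.T)

# Hypothetical example: a 5-word sentence with 8-dimensional features
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
Q, K, V = (rng.normal(size=(8, 8)) for _ in range(3))
H_next = attention_update(H, Q, K, V)  # shape (5, 8)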

index.json (+1 −1)

Large diffs are not rendered by default.

post/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$</p>
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the <strong>Q</strong>uery, <strong>K</strong>ey and <strong>V</strong>alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in <em>one shot</em>–another plus point for Transformers over RNNs, which update features word-by-word.</p>

post/transformers-are-gnns/index.html (+1 −1)

@@ -628,7 +628,7 @@ <h3 id="breaking-down-the-transformer">Breaking down the Transformer</h3>
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$</p>
 <p>where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the <strong>Q</strong>uery, <strong>K</strong>ey and <strong>V</strong>alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in <em>one shot</em>&ndash;another plus point for Transformers over RNNs, which update features word-by-word.</p>

tags/deep-learning/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$&lt;/p&gt;
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the &lt;strong&gt;Q&lt;/strong&gt;uery, &lt;strong&gt;K&lt;/strong&gt;ey and &lt;strong&gt;V&lt;/strong&gt;alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in &lt;em&gt;one shot&lt;/em&gt;&amp;ndash;another plus point for Transformers over RNNs, which update features word-by-word.&lt;/p&gt;

tags/graph-neural-networks/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$&lt;/p&gt;
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the &lt;strong&gt;Q&lt;/strong&gt;uery, &lt;strong&gt;K&lt;/strong&gt;ey and &lt;strong&gt;V&lt;/strong&gt;alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in &lt;em&gt;one shot&lt;/em&gt;&amp;ndash;another plus point for Transformers over RNNs, which update features word-by-word.&lt;/p&gt;

tags/natural-language-processing/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$&lt;/p&gt;
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the &lt;strong&gt;Q&lt;/strong&gt;uery, &lt;strong&gt;K&lt;/strong&gt;ey and &lt;strong&gt;V&lt;/strong&gt;alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in &lt;em&gt;one shot&lt;/em&gt;&amp;ndash;another plus point for Transformers over RNNs, which update features word-by-word.&lt;/p&gt;

tags/transformer/index.xml (+1 −1)

@@ -66,7 +66,7 @@ $$&lt;/p&gt;
 i.e.,\ h_{i}^{\ell+1} = \sum_{j \in \mathcal{S}} w_{ij} \left( V^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;$$
-\text{where} ;\ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
+\text{where} \ w_{ij} = \text{softmax}_j \left( Q^{\ell} h_{i}^{\ell} \cdot K^{\ell} h_{j}^{\ell} \right),
 $$&lt;/p&gt;
 &lt;p&gt;where $j \in \mathcal{S}$ denotes the set of words in the sentence and $Q^{\ell}, K^{\ell}, V^{\ell}$ are learnable linear weights (denoting the &lt;strong&gt;Q&lt;/strong&gt;uery, &lt;strong&gt;K&lt;/strong&gt;ey and &lt;strong&gt;V&lt;/strong&gt;alue for the attention computation, respectively).
 The attention mechanism is performed parallelly for each word in the sentence to obtain their updated features in &lt;em&gt;one shot&lt;/em&gt;&amp;ndash;another plus point for Transformers over RNNs, which update features word-by-word.&lt;/p&gt;
