Commit 2df5596

chinese: lots of typos
1 parent 602e763 commit 2df5596

1 file changed (+21 -11 lines changed)

content/tech/chinese_app/index.md

Lines changed: 21 additions & 11 deletions
@@ -10,7 +10,7 @@ weight: 900
 
 I've been studying mandarin chinese for a few years now, using a few pretty
 great apps. I started out just learning ultra basic phrases from my spouse who
-bought me pimsleur to listen to in the car which was my foundation. The next
+bought me Pimsleur to listen to in the car which was my foundation. The next
 app I used, which got me all the way through the basics was [Hello
 Chinese](https://www.hellochinese.cc/), easily the best app I've used for
 learning. Besides gamified lessons and reviews, it included a reader with a
@@ -38,10 +38,10 @@ probably mostly true. Your brain should soak up patterns over time, get used
 to recognizing words without thinking about it, and begin to naturally replicate
 pronunciation. I _don't_ agree with those who claim CI is the _only_ tool you need
 to reach fluency. Especially when the target language is so completely different
-than your own native langague.
+than your own native language.
 
 I can still remember the feeling of "Holy shit I understand exactly what he
-said!" the second time I tried listinging to [Tea Time Chinese
+said!" the second time I tried listening to [Tea Time Chinese
 (茶歇中文)](https://teatimechinese.com/). My favorite CI resources for chinese are:
 
 * [Tea Time Chinese (茶歇中文)](https://teatimechinese.com/).
@@ -54,7 +54,7 @@ reasonable price for tutors from all over the world for different languages.
 Every teacher has a different approach, and I think I am very lucky to have
 found a teacher who just had conversations with me, sometimes with a topic in
 mind, sometimes just letting the conversation naturally flow. He would
-strategically introduce new words, when I was tyring to say something too
+strategically introduce new words. When I was trying to say something too
 complex, he'd encourage me to use the simpler language that I'm already
 comfortable with to express myself, and use simple language to teach words
 without using any English!
@@ -70,7 +70,7 @@ English when introducing new words.
 Doing a couple hours a week of classes and using bit of Chinese at home with my
 spouse is _not going to get me from intermediate to fluent in a reasonable
 timeframe. Another tool besides CI that many language learning enthusiasts swear by
-is a Spaced Repitition System or SRS. There are apps for this like [Anki](https://apps.ankiweb.net/)
+is a Spaced Repetition System or SRS. There are apps for this like [Anki](https://apps.ankiweb.net/)
 or you could simply systematically organize physical flashcards. The basic idea is
 that you review whatever you're trying to memorize daily. When you are correct on the first try,
 you advance that item one level. Items in a higher level are reviewed less frequently until you
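Note on the hunk above: the level rule it describes (advance one level when correct on the first try, review higher levels less often) can be sketched in a few lines of Dart. This is only an illustration, not the app's code. The `Card` class, `maxLevel`, and the doubling interval in `intervalDays` are hypothetical, and dropping a missed card back to the first level is the classic Leitner behaviour, assumed here because the visible text cuts off before saying what happens on a miss.

```dart
// Hypothetical flashcard with a Leitner-style level (0 = reviewed most often).
class Card {
  Card(this.prompt, {this.level = 0});
  final String prompt;
  int level;
}

// Assumed cap on levels; the post does not say how many levels exist.
const maxLevel = 4;

// Assumed schedule: the review interval doubles with each level (1, 2, 4, 8, 16 days).
int intervalDays(int level) => 1 << level;

// Applies one review result to a card.
void review(Card card, {required bool correctOnFirstTry}) {
  if (correctOnFirstTry) {
    // Advance one level, as described in the post.
    if (card.level < maxLevel) card.level++;
  } else {
    // Assumption: classic Leitner sends a missed card back to the first level.
    card.level = 0;
  }
}

void main() {
  final card = Card('学长');
  review(card, correctOnFirstTry: true);
  review(card, correctOnFirstTry: true);
  print('level ${card.level}, next review in ${intervalDays(card.level)} days');
  review(card, correctOnFirstTry: false);
  print('level ${card.level}, next review in ${intervalDays(card.level)} days');
}
```

Keeping the schedule in a single function makes it easy to swap the doubling rule for whatever intervals the real system uses.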
@@ -95,7 +95,7 @@ What I have built so far has a few key features:
 * TTS all over the place.
 
 And I still have a lot to do:
-* A library of CI content to feed into the reader
+* A library of Comprehensible Input content to feed into the reader
 * Camera based OCR to look up words you encounter IRL
 * Component/radical search for the dictionary
 * A floating widget that displays over other apps to look up words
@@ -115,15 +115,15 @@ is written around the local SQLite database which syncs data to and from Postgre
 the background.
 
 Supabase is a nice managed Postgres, but it also provides an easy way to spin up local
-development databases with it's CLI. It also gives us a nice Dart API with support for
+development databases with its CLI. It also gives us a nice Dart API with support for
 doing 3rd party OAuth (google) or E-Mail/Password auth, all of that tying into Postgres
 Row-Level Security (RLS). There is a generic syncer that just uses a version and timestamp
 column to choose which side will overwrite what, and options for special merge rules. Other than
 that, the only difference between local and remote schemas are a user_id column in Postgres.
 
 Finally, since I'm reaching out to Azure for TTS and OpenAI (maybe I'll switch
 to Deepseek since it's Chinese?) I want to keep those off of the user's phone,
-so we have a little Go middle man that can maybe take on more responsibilty if
+so we have a little Go middle man that can maybe take on more responsibility if
 I find the need later.
 
 Again, keeping with the local-first priority, there are some on-device ML models
@@ -191,13 +191,23 @@ to tell what word and definition in the dictionary corresponds to a character
 in some text. There is so much ambiguity, and rule-based systems can only do
 an okay job dealing with it. Some examples:
 
-> 我的学长流血了很长时间.
+> 我的学长留学了在北京外国语大学很长时间.
 
 It can be tricky to figure out whether 长 is 'cháng' or 'zhǎng'.
 The first instance is easy. 学长 is a word in the dictionary, so
 we can deal with that by just looking for the longest sequence that
 exists in the dictionary.
 
+"北京外国语大学" is also tricky; should we group 外国 (foreign) or 国语 (the
+national language, Mandarin). In this case, it should either be "外国",
+"外国语" (foreign language) or even "北京外国语大学" since that's the name of
+the school. Since 外国 actually exists in the dictionary database, we should
+pick that, and then the mappings for 北京, 语,and 大学 still make sense on
+their own.
+
+Another example, with 的话 a super common pair of characters that only
+_sometimes_ should be separated:
+
 > 如果你喜欢周杰伦的话,你应该听妈妈的话。
 
 In this case, 的话 has two possible meanings:
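Note on the hunk above: the "longest sequence that exists in the dictionary" rule it describes is commonly called greedy forward maximum matching. The Dart sketch below shows that idea on the post's own example sentence. It is not the app's actual segmenter: the `segment` function, the `maxWordLength` parameter, and the tiny in-memory dictionary are all hypothetical stand-ins for the real dictionary database.

```dart
/// Greedy forward maximum matching: at each position take the longest
/// substring found in [dictionary]; otherwise emit a single character.
/// [maxWordLength] is an assumed cap on how far ahead to look.
List<String> segment(String text, Set<String> dictionary,
    {int maxWordLength = 8}) {
  // Split by runes so characters outside the BMP are not cut in half.
  final chars = text.runes.map((r) => String.fromCharCode(r)).toList();
  final tokens = <String>[];
  var i = 0;
  while (i < chars.length) {
    var matched = 1; // fall back to one character when nothing matches
    final limit =
        i + maxWordLength < chars.length ? i + maxWordLength : chars.length;
    // Try the longest candidate first and shrink until one is in the dictionary.
    for (var end = limit; end > i + 1; end--) {
      if (dictionary.contains(chars.sublist(i, end).join())) {
        matched = end - i;
        break;
      }
    }
    tokens.add(chars.sublist(i, i + matched).join());
    i += matched;
  }
  return tokens;
}

void main() {
  // Hypothetical mini dictionary; the real one would be a database lookup.
  final dictionary = {'学长', '留学', '北京', '外国', '国语', '大学', '时间'};
  print(segment('我的学长留学了在北京外国语大学很长时间', dictionary));
}
```

With this toy dictionary the scan groups 学长 and 外国 and leaves 语 and the second 长 as single characters. That second 长 is exactly the kind of leftover a purely dictionary-based rule cannot disambiguate.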
@@ -222,7 +232,7 @@ final bScore = tokenScores[i][0];
 final iScore = tokenScores[i][1];
 
 // do some funny math to figure out _how_ confident the model is
-// this makes the numbers a bit more interperetable for tuning threshold
+// this makes the numbers a bit more interpretable for tuning threshold
 final pB = exp(bScore / temperature) /
 (exp(bScore / temperature) + exp(iScore / temperature));
 final pI = exp(iScore / temperature) /
@@ -293,7 +303,7 @@ By <a href="//commons.wikimedia.org/wiki/User:Zirguezi" title="User:Zirguezi">Zi
 
 The SRS itself is a pretty basic [Leitner
 system](https://en.wikipedia.org/wiki/Leitner_system). There's actually very
-little interesting techincal stuff going on here. The interface on top of it is
+little interesting technical stuff going on here. The interface on top of it is
 what I find more useful than others' systems; but it's still very, very simple.
 
 {{< gallery >}}
