| 1 | +--- |
| 2 | + |
| 3 | +title: "I'm gonna release my Chinese app" |
| 4 | +type: tech |
| 5 | +weight: 890 |
| 6 | +tags: ["language-learning", "app-development", "chinese"] |
| 7 | +--- |
| 8 | + |
| 9 | +It's been a while! My [last post](/posts/chinese_app_og/) was about |
| 10 | +this app, about a year ago after a couple months of work. I did not |
| 11 | +expect this to be a long-term project. |
| 12 | + |
| 13 | +Studying Mandarin became one of my main hobbies. Using my own app to study was |
| 14 | +cool, but it had lots of issues so I kept chipping away at it over nights and |
| 15 | +weekends. At this point I use it daily. After showing my in-laws (who are |
| 16 | +Malaysian Chinese, and native speakers), they showed enormous support and ended |
| 17 | +up motivating me to polish it for release as a proper app. |
| 18 | + |
| 19 | +{{< alert >}} |
| 20 | +If you're reading this post after December 2025, the app is (hopefully) available. |
| 21 | +See [https://bookchoy.app](https://bookchoy.app) for download links and more information! |
| 22 | +{{< /alert >}} |
| 23 | + |
| 24 | +### Another use-case |
| 25 | + |
| 25 | +It turns out the reading feature is actually useful to my native-speaker |
| 26 | +in-laws. Malaysia is an interesting place linguistically: in one |
| 28 | +sentence you might hear English, Malay, Mandarin, and other Chinese dialects |
| 29 | +all mixed together. There are a lot of differences in word choice compared to |
| 30 | +what you might hear in China. Some people, especially the younger generation, |
| 31 | +might speak fluently and can read most of what they'd encounter on a daily |
| 31 | +basis, but when speaking to colleagues or friends from China, they might run |
| 32 | +into words they don't know. My in-laws in particular had some books they picked |
| 34 | +up in Taiwan and they'd encounter words they didn't quite know, or traditional |
| 35 | +versions of characters they didn't recognize. I built the app for advanced |
| 36 | +beginners/intermediate learners, but it turns out it can be useful for native |
| 37 | +speakers and advanced learners too! |
| 38 | + |
| 39 | +## More LLM stuff |
| 40 | + |
| 41 | +### Omnisearch |
| 42 | + |
| 43 | +I'm definitely not bought in on the whole "LLMs will solve everything" hype, but |
| 43 | +language learning and translation are both pretty fuzzy tasks. For example: |
| 45 | + |
| 46 | +> I missed you. |
| 47 | + |
| 48 | +This could mean a few things in English: |
| 49 | + |
| 50 | +1. I missed the chance to see you. (我错过了你。) |
| 51 | +2. I long for you. (我想你。) |
| 52 | +3. I tried to hit you (in a game or something) but failed. (我没打中你。) |
| 53 | + |
| 54 | +LLMs are a pretty good tool for providing multiple options with context and |
| 55 | +explanations when the user doesn't want to write a full prompt, while still |
| 56 | +letting the user supply context when they have it. |
| 57 | + |
| 58 | +Translation is one of many use cases for the app. I wanted to easily look up |
| 59 | +anything from the home screen of the app, so I built an "omni-search" bar that |
| 60 | +is partially powered by AI. |
| 61 | + |
| 62 | + |
| 63 | + |
| 64 | +Based on the input, I can infer whether I just want to search words, sentences |
| 65 | +and stories (in-app content), or perform a more complex task that requires AI. |
| 66 | + |
| 67 | +The supported AI tasks are currently: |
| 68 | +* Translate (English <-> Chinese) |
| 69 | +* Word Overviews |
| 70 | +* Related word lookup (synonyms, antonyms, or easily confused words) |
| 71 | +* Generate example sentences |
| 72 | +* General Q&A as a catch-all for things like "when should I use 了 vs 过?" |
| 73 | + |
| 74 | +Most of these tasks are used elsewhere in the app, with an easy way to get to |
| 75 | +them from the omni-search bar as soon as you open the app. In the future this |
| 76 | +may open a chat-like interface, but I'm not trying to build a ChatGPT wrapper |
| 77 | +just yet. |
| 78 | + |
| 79 | +The AI-generated responses re-use the reader, so you can tap words and |
| 80 | +sentences to save them or explain them. This works everywhere in the app. |
| 81 | + |
| 82 | +I'm not simply throwing things over the wall to an LLM provider. There is a multi-step |
| 83 | +process that involves classifying user intent, fine-tuned task-specific prompts, picking |
| 84 | +an appropriate model, and most importantly: RAG. Between the dictionary and some other |
| 85 | +useful corpora, I have a lot of relevant information to provide context to the LLM. |
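To make the shape of that pipeline concrete, here is a minimal sketch in Go. It is not the app's actual code: the `Task` names, the keyword heuristic in `classify`, and the prompt templates are all assumptions for illustration, and a real classifier would likely be a small model rather than string checks.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Task is a hypothetical label for what the search bar should do with a query.
type Task string

const (
	TaskLocalSearch Task = "local_search" // words, sentences, stories already in the app
	TaskTranslate   Task = "translate"
	TaskQA          Task = "qa"
)

// hasHan reports whether s contains at least one Chinese character.
func hasHan(s string) bool {
	for _, r := range s {
		if unicode.Is(unicode.Han, r) {
			return true
		}
	}
	return false
}

// classify is a stand-in heuristic for the intent classification step.
func classify(query string) Task {
	q := strings.TrimSpace(query)
	switch {
	case strings.HasSuffix(q, "?") || strings.HasSuffix(q, "？"):
		return TaskQA
	case !hasHan(q) && len(strings.Fields(q)) > 2:
		return TaskTranslate
	default:
		return TaskLocalSearch
	}
}

// buildPrompt joins a task-specific template with retrieved context (the RAG step).
func buildPrompt(task Task, query string, context []string) string {
	templates := map[Task]string{
		TaskTranslate: "Translate, listing each plausible reading with an explanation.",
		TaskQA:        "Answer the learner's question about Chinese usage.",
	}
	var b strings.Builder
	b.WriteString(templates[task])
	b.WriteString("\nContext:\n")
	for _, c := range context {
		b.WriteString("- " + c + "\n")
	}
	b.WriteString("Query: " + query)
	return b.String()
}

func main() {
	for _, q := range []string{"退休", "I missed you.", "when should I use 了 vs 过?"} {
		fmt.Printf("%q -> %s\n", q, classify(q))
	}
}
```

The point of the split is that only queries that need it ever reach a model, and the ones that do arrive with dictionary context attached instead of a bare question.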
| 86 | + |
| 87 | +### Character breakdowns |
| 88 | + |
| 89 | +One cool feature that AI helps with is breaking down words character-by-character, |
| 90 | +and breaking down characters into their components. I realized that because I pretty |
| 91 | +much only type Chinese using pinyin input, I don't recognize some characters unless |
| 92 | +I see them in a familiar context, or I confuse similar-looking characters. |
| 93 | + |
| 94 | +{{< gallery >}} |
| 95 | + <img src="char_breakdown.png" alt="screenshot breaking down 强烈" class="grid-w50"> |
| 96 | + <img src="char_breakdown_2.png" alt="screenshot breaking down 退休" class="grid-w50"> |
| 97 | +{{< /gallery >}} |
| 98 | + |
| 99 | + |
| 100 | +Taking a moment to look at each piece of a word helps me remember it better. |
| 101 | +It's probably not a bad idea to try to write it once or twice on paper, but |
| 102 | +this is rarely convenient when I'm learning a word while out and about. If |
| 103 | +you're lucky, you get a nice mnemonic like "休 has a person 人 leaning on a tree 木, |
| 104 | +suggesting resting." |
| 105 | + |
| 106 | +One improvement I'd like to make here is always including info from the |
| 107 | +traditional versions of characters, as some semantic information might be lost |
| 108 | +in the simplification. |
| 109 | + |
| 110 | +Again, we can't just rely on the LLM here. I have a tree-based decomposition |
| 111 | +dataset that I use to make sure the LLM doesn't guess, as it will almost always |
| 112 | +guess incorrectly. |
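As a rough illustration of what "tree-based" means here, the sketch below models a character as a tree of components and flattens it into plain facts that can be placed in the prompt, so the model explains known components instead of inventing them. The `Component` type and the hand-written entry for 休 are assumptions for illustration, not the real dataset or its schema.

```go
package main

import (
	"fmt"
	"strings"
)

// Component is one node of a hypothetical decomposition tree.
type Component struct {
	Glyph    string
	Meaning  string
	Children []*Component
}

// Leaves returns the indivisible components in left-to-right order.
func (c *Component) Leaves() []*Component {
	if len(c.Children) == 0 {
		return []*Component{c}
	}
	var out []*Component
	for _, child := range c.Children {
		out = append(out, child.Leaves()...)
	}
	return out
}

// Facts renders the leaves as a plain statement to ground the LLM prompt.
func Facts(c *Component) string {
	parts := make([]string, 0, len(c.Children))
	for _, leaf := range c.Leaves() {
		parts = append(parts, fmt.Sprintf("%s (%s)", leaf.Glyph, leaf.Meaning))
	}
	return fmt.Sprintf("%s is composed of: %s", c.Glyph, strings.Join(parts, ", "))
}

func main() {
	// Hand-written sample entry; a real entry would come from the dataset.
	xiu := &Component{Glyph: "休", Meaning: "rest", Children: []*Component{
		{Glyph: "人", Meaning: "person"},
		{Glyph: "木", Meaning: "tree"},
	}}
	fmt.Println(Facts(xiu))
}
```

Because the recursion walks nested components, a character like 想 (相 over 心, where 相 is itself 木 plus 目) flattens to its three leaves in order.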
| 113 | + |
| 114 | +### Writing Exercises (造句) |
| 115 | + |
| 116 | +Output is important. In the last post I had already built a review system where |
| 117 | +you memorize phrases or sentences, but this isn't a super creative exercise. Now, |
| 118 | +when learning new words, you can write a response to a prompt using the new word and |
| 119 | +get feedback on whether your word choice makes sense and your grammar is correct, plus |
| 120 | +suggestions for improving your overall expressiveness so you don't sound like a robot. |
| 121 | + |
| 122 | +{{< gallery >}} |
| 123 | + <img src="zaoju_typing.png" alt="screenshot of typing zaoju response" class="grid-w33"> |
| 124 | + <img src="zaoju_wrong.png" alt="screenshot of zaoju feedback" class="grid-w33"> |
| 125 | + <img src="zaoju_correct.png" alt="screenshot of correct zaoju response" class="grid-w33"> |
| 126 | +{{< /gallery >}} |
| 127 | + |
| 128 | +Then, you can save what you wrote for review later. |
| 129 | + |
| 130 | +## Lots of small stuff too |
| 131 | + |
| 132 | +Besides all the AI stuff, I've added a ton of new small features. A few |
| 133 | +highlights include: |
| 134 | + |
| 135 | +**Word Lists** |
| 136 | + |
| 137 | +Pre-built lists of words for specific topics (e.g. business, travel, HSK levels) |
| 138 | +that you can directly review or add to your own list in bulk. This also includes |
| 139 | +a system for me to easily push new entries to the dictionary without app updates, |
| 140 | +so that word lists can include jargon that might not be in a standard dictionary. |
| 141 | + |
| 142 | +{{< gallery >}} |
| 143 | + <img src="wordlists.png" alt="screenshot of word lists" class="grid-w50"> |
| 144 | + <img src="wordlist_programming.png" alt="screenshot of word list details" class="grid-w50"> |
| 145 | +{{< /gallery >}} |
| 146 | + |
| 147 | + |
| 148 | +**Better accuracy in word mappings** |
| 149 | + |
| 150 | +As I detailed in the [last post](/posts/chinese_app_og/), the training data |
| 151 | +wasn't perfect. It's still not perfect, but after a round of training with |
| 152 | +cleaner data I can confidently say it's better. I don't want to provide a number |
| 153 | +here yet, but I anticipate getting this to several 9's of accuracy soon. |
| 154 | + |
| 155 | +**Optimized data sync** |
| 156 | + |
| 157 | +I've evolved my generic SQLite -> Supabase sync engine to be more efficient, |
| 158 | +using some version numbers, push timestamps and high-water marks to avoid |
| 159 | +querying or sending excess data. It also supports conditional hard deletes, |
| 160 | +and customizable merging strategies! Latency on this is way down. |
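The high-water mark idea can be sketched in a few lines of Go. This is an in-memory illustration under assumed names (`Row`, `PendingPush`, `Merge`), not the real sync engine: it touches neither SQLite nor Supabase, and "higher version wins" is just one possible merge strategy.

```go
package main

import "fmt"

// Row is a hypothetical local record with a monotonically increasing version.
type Row struct {
	ID      string
	Version int64
	Deleted bool // tombstone, so a deletion can be synced like any other change
}

// PendingPush returns the rows changed since the last successful push,
// along with the high-water mark to store once the push succeeds.
func PendingPush(rows []Row, highWater int64) ([]Row, int64) {
	var pending []Row
	next := highWater
	for _, r := range rows {
		if r.Version > highWater {
			pending = append(pending, r)
			if r.Version > next {
				next = r.Version
			}
		}
	}
	return pending, next
}

// Merge resolves a conflict between two copies of a row.
// Here the higher version wins; ties keep the local copy.
func Merge(local, remote Row) Row {
	if remote.Version > local.Version {
		return remote
	}
	return local
}

func main() {
	rows := []Row{{"a", 3, false}, {"b", 7, false}, {"c", 9, true}}
	pending, mark := PendingPush(rows, 5)
	fmt.Println(len(pending), mark)
}
```

Everything at or below the stored mark is skipped, so a device that is already up to date sends nothing at all.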
| 161 | + |
| 162 | +**Embedded Video Player** |
| 163 | + |
| 164 | +{{< gallery >}} |
| 165 | + <img src="vidplayer.png" alt="screenshot of video player" class="grid-w33"> |
| 166 | +{{< /gallery >}} |
| 167 | + |
| 168 | +For videos that have subtitles, an embedded video player |
| 169 | +syncs the reader interface with the video playback. You can look up |
| 170 | +words while watching, and use the transcript to scrub. |
| 171 | + |
| 172 | +**Import epub files** |
| 173 | + |
| 174 | +The reader can now import epub files. Currently this works by converting |
| 175 | +them into Markdown, but in the future this will be extended to preserve |
| 176 | +the original formatting. |
| 177 | + |
| 178 | +{{< gallery >}} |
| 179 | + <img src="epublibrary.png" alt="screenshot of epub import" class="grid-w50"> |
| 180 | +{{< /gallery >}} |
| 181 | + |
| 182 | +Also, the library UI received a facelift! |
| 183 | + |
| 184 | +## Productionization |
| 185 | + |
| 186 | +I'm very used to working with Kubernetes, but that's definitely overkill for |
| 187 | +this project that consists of a single backend Go service and a Flutter |
| 188 | +front-end. |
| 189 | + |
| 190 | +Unfortunately, my experiences with Cloud Run, ECS, and Azure Container Apps have |
| 191 | +been less than stellar. There is too much boilerplate, and I don't like the |
| 192 | +idea of being tied to the cloud provider's ecosystem for observability and |
| 193 | +everything else. |
| 194 | + |
| 195 | +[Fly.io](https://fly.io) is pretty awesome. Declarative configuration, |
| 196 | +command-line-driven deployment, and plain Docker images make it very |
| 197 | +easy to deploy the app itself. The best part is that they provide built-in |
| 198 | +Prometheus and Grafana. All I had to do is instrument my app and put the |
| 199 | +metrics port in the app's config file. They handle TLS certs, load balancing, |
| 200 | +and basic ingress. Add-ons like Redis and Postgres are there when I need them. |
| 201 | + |
| 202 | + |
| 203 | +### Rate limiting |
| 204 | + |
| 205 | +[Agentgateway](https://agentgateway.dev/) is a very cool open-source project |
| 206 | +that my amazing colleague [John Howard](https://blog.howardjohn.info/) |
| 207 | +bootstrapped. For me it solves two problems: |
| 208 | + |
| 209 | +* Ingress: Basic controls on who can access certain routes, global rate-limiting per-user, and authentication. |
| 210 | +* [LLM Consumption](https://agentgateway.dev/docs/llm/about/): cost control, (token aware!) rate limiting, per-user rate limiting. |
| 211 | + |
| 212 | +Deploying this on Fly.io was super easy, along with the Envoy rate limiting |
| 213 | +server pointing to a managed Redis instance. |
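For anyone unfamiliar with token-aware rate limiting, the idea is that each request is charged by how many LLM tokens it uses rather than counted as one request. The Go sketch below shows that idea as an in-memory token bucket. It is not how Agentgateway or the Envoy rate limit server implement it; `Limiter`, `Allow`, and the numbers in `main` are hypothetical.

```go
package main

import "fmt"

// budget tracks how many LLM tokens one user may still spend.
type budget struct {
	available float64 // tokens left in the bucket
	updated   int64   // Unix seconds of the last refill
}

// Limiter is a per-user token bucket measured in LLM tokens, not requests.
// It is not safe for concurrent use; a real service would add locking or
// keep the counters in a shared store such as Redis.
type Limiter struct {
	capacity float64 // maximum tokens a user can accumulate
	refill   float64 // tokens restored per second
	users    map[string]*budget
}

// NewLimiter creates a limiter where every user starts with a full bucket.
func NewLimiter(capacity, refillPerSecond float64) *Limiter {
	return &Limiter{capacity: capacity, refill: refillPerSecond, users: map[string]*budget{}}
}

// Allow refills the user's bucket up to now, then charges cost tokens.
// It returns false, charging nothing, when the user cannot afford the call.
func (l *Limiter) Allow(user string, cost float64, now int64) bool {
	b, ok := l.users[user]
	if !ok {
		b = &budget{available: l.capacity, updated: now}
		l.users[user] = b
	}
	if elapsed := now - b.updated; elapsed > 0 {
		b.available += float64(elapsed) * l.refill
		if b.available > l.capacity {
			b.available = l.capacity
		}
		b.updated = now
	}
	if cost > b.available {
		return false
	}
	b.available -= cost
	return true
}

func main() {
	l := NewLimiter(1000, 10)
	fmt.Println(l.Allow("alice", 800, 0))  // a large request at t=0
	fmt.Println(l.Allow("alice", 300, 0))  // second request in the same second
	fmt.Println(l.Allow("alice", 300, 10)) // same request ten seconds later
}
```

Passing `now` in explicitly keeps the sketch deterministic; a real limiter would read the clock itself.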
| 214 | + |
| 215 | +### Observability |
| 216 | + |
| 217 | +In my actual job, I build networking software (Istio, Gloo Mesh) that customers |
| 218 | +deploy themselves. I haven't been directly responsible for an individual live |
| 219 | +service in a few years, although I build a product that helps people operate |
| 220 | +live services. |
| 221 | + |
| 222 | +One of the main use cases for that software is observability. Golden metrics |
| 223 | +(latency, errors, requests) aren't enough when you're paying per token. LLM |
| 224 | +usage is unpredictable. Different queries use wildly different token counts, |
| 225 | +and agentic loops can result in a non-deterministic number of calls from a single |
| 226 | +user action. Tracking what types of queries happen, how fast they are, and how |
| 227 | +many tokens they burn is critical for both costs and user experience. |
| 228 | + |
| 229 | +Using a combination of Fly.io's managed Prometheus/Grafana and Agentgateway's |
| 230 | +LLM-specific metrics, I have a pretty good picture of how the system is behaving. |
| 231 | + |
| 232 | + |
| 233 | + |
| 234 | +Currently I'm tracking things like: |
| 235 | +* Latency per LLM call, and latency per task type |
| 236 | +* Requests per task |
| 237 | +* Tokens per task |
| 238 | +* Total tokens by task |
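The difference between "tokens per task" (an average per request) and "total tokens by task" (a running sum) is easy to blur, so here is a small Go sketch of the aggregation. In practice these numbers come from Prometheus queries over exported metrics; the `Call` and `Stats` types below are assumptions that only illustrate the arithmetic.

```go
package main

import (
	"fmt"
	"sort"
)

// Call is one hypothetical LLM call record.
type Call struct {
	Task      string
	Tokens    int
	LatencyMs float64
}

// Stats aggregates all calls for a single task type.
type Stats struct {
	Requests    int
	TotalTokens int
	totalMs     float64
}

// TokensPerRequest is the average token cost of one request for this task.
func (s Stats) TokensPerRequest() float64 {
	if s.Requests == 0 {
		return 0
	}
	return float64(s.TotalTokens) / float64(s.Requests)
}

// MeanLatencyMs is the average latency of one request for this task.
func (s Stats) MeanLatencyMs() float64 {
	if s.Requests == 0 {
		return 0
	}
	return s.totalMs / float64(s.Requests)
}

// Aggregate groups calls by task and sums requests, tokens, and latency.
func Aggregate(calls []Call) map[string]Stats {
	out := map[string]Stats{}
	for _, c := range calls {
		s := out[c.Task]
		s.Requests++
		s.TotalTokens += c.Tokens
		s.totalMs += c.LatencyMs
		out[c.Task] = s
	}
	return out
}

func main() {
	stats := Aggregate([]Call{
		{"translate", 120, 400},
		{"translate", 80, 600},
		{"qa", 900, 2000},
	})
	tasks := make([]string, 0, len(stats))
	for t := range stats {
		tasks = append(tasks, t)
	}
	sort.Strings(tasks) // map iteration order is random, so sort for stable output
	for _, t := range tasks {
		s := stats[t]
		fmt.Printf("%s: %d requests, %d tokens total, %.0f tokens/request, %.0f ms mean\n",
			t, s.Requests, s.TotalTokens, s.TokensPerRequest(), s.MeanLatencyMs())
	}
}
```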
| 239 | + |
| 240 | +## What's next? |
| 241 | + |
| 242 | +Over the next few weeks, I will be going through the release process for Google |
| 243 | +Play. When Google approves the app for release, I'll initially do a soft launch |
| 244 | +to get feedback from real users. After that, an iOS release will follow. It's |
| 245 | +so hard to stop adding features, tweaking the UI and fixing little bugs, but I |
| 246 | +need to get this in other people's hands and stop developing in a vacuum. |
| 247 | + |
| 248 | +Realistically, this probably won't be available until after the winter holidays |
| 249 | +when I'll have more free time to focus on this outside of nights and weekends. |
| 250 | + |
| 251 | +### Bigger Plans |
| 252 | + |
| 253 | +If I gain traction, I have some bigger plans for the app. I really do believe |
| 254 | +in human tutors. I know there is an AI craze right now, and a lot of "talk to a robot" |
| 255 | +language apps, but my personal view is that AI is a tool to augment what people can do, |
| 256 | +not replace them. |
| 257 | + |
| 258 | +Obviously I'm not a Chinese teacher (my Chinese is very, very basic), but with the |
| 259 | +help of people actually qualified to teach, the app could become a platform that |
| 260 | +lets tutors create lessons and quizzes, assign reading, and connect with students. |
| 261 | + |
| 262 | +Another idea is a "journaling" feature where you can get either AI or human feedback |
| 263 | +on your writing. |
| 264 | + |
| 265 | +I am already working on finding partners to help me with sourcing reading |
| 266 | +content for the app, as even though you can import your own content, having a |
| 267 | +library of graded readers and stories would be a huge plus. Maybe something |
| 268 | +like [Maayot](https://www.maayot.com/), which provides very short, level-appropriate |
| 269 | +daily reading passages with some discussion questions. |
| 270 | + |
| 271 | +There are so many possibilities. For now, I feel what I've built is a core platform |
| 272 | +that I can build on top of. |
| 273 | + |
| 274 | +### Browser Extension |
| 275 | + |
| 276 | +After the initial Android release, I actually plan on building a browser extension |
| 277 | +before focusing on iOS. I want the same experience for looking up words and explaining |
| 278 | +sentences to be available when browsing the web. |
| 279 | + |
| 280 | +Not only on web pages, but for videos, TV shows, and movies too. In an hour of |
| 281 | +vibe-coding, I got a prototype working that brings the same experience to the |
| 282 | +subtitles on iQiyi. I'm confident I can get that working on Netflix, YouTube, |
| 283 | +and other popular platforms. This seems like table stakes looking at apps like |
| 284 | +Migaku and LanguageReactor. |
| 285 | + |
| 286 | + |
| 287 | + |
| 288 | +### Download it now |
| 289 | + |
| 290 | +If you're reading this post after December 2025, the app is (hopefully) available. |
| 291 | +See [https://bookchoy.app](https://bookchoy.app) for download links and more information! |