-The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry. HanLP was designed from day one to be efficient, user friendly and extendable. It comes with pretrained models for 104 human languages including English, Chinese and many others.
+The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry. HanLP was designed from day one to be efficient, user friendly and extendable.

Thanks to open-access corpora like Universal Dependencies and OntoNotes, HanLP 2.1 now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.
@@ -45,12 +45,14 @@ HanLPClient HanLP = new HanLPClient("https://hanlp.hankcs.com/api", "your_auth",
### Quick Start

-No matter which language you uses, the same interface can be used to parse a document.
+No matter which language you use, the same interface can be used to parse a document.

```python
HanLP.parse("In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environment. 2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。")
```

+See [docs](https://hanlp.hankcs.com/docs/tutorial.html) for visualization, annotation guidelines and more details.
+

## Native APIs

```bash
@@ -63,21 +65,21 @@ HanLP requires Python 3.6 or later. GPU/TPU is suggested but not mandatory.
print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environment.',
+'2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',
+'2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
```

-In particular, the Python `HanLPClient` can also be used as a callable function following the same semantics. See [docs](https://hanlp.hankcs.com/docs/) for more details.
+In particular, the Python `HanLPClient` can also be used as a callable function following the same semantics. See [docs](https://hanlp.hankcs.com/docs/tutorial.html) for visualization, annotation guidelines and more details.
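As a rough illustration of what "callable with the same semantics" means in Python: defining `__call__` to forward to a method makes `client(text)` and `client.parse(text)` interchangeable. The class below is hypothetical (`ToyClient` is not part of HanLP or `hanlp_restful`), and its `parse` only splits on whitespace instead of contacting the RESTful API; it sketches the pattern, not HanLP's implementation.

```python
from typing import Any, Dict, List


class ToyClient:
    """Hypothetical client used only to illustrate the callable-object pattern."""

    def parse(self, text: str) -> Dict[str, List[str]]:
        # Stand-in for a real NLP request: whitespace tokenization only.
        return {"tok": text.split()}

    def __call__(self, *args: Any, **kwargs: Any) -> Dict[str, List[str]]:
        # Calling the instance forwards every argument to parse(),
        # so client(x) and client.parse(x) always behave identically.
        return self.parse(*args, **kwargs)


client = ToyClient()
print(client("HanLP parses text") == client.parse("HanLP parses text"))  # True
```

Forwarding with `*args, **kwargs` keeps the two spellings in sync even if `parse` later gains parameters.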

## Train Your Own Models

Writing DL models is not hard; the really hard part is writing a model that reproduces the scores reported in papers. The snippet below shows how to surpass the state-of-the-art tokenizer in 9 minutes.
SIGHAN2005_PKU_TEST, # Conventionally, no devset is used. See Tian et al. (2020).
@@ -97,13 +99,13 @@ tokenizer.fit(
tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)
```
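`tokenizer.evaluate` reports a single score on the SIGHAN 2005 PKU test set. Chinese word segmentation is conventionally scored with word-level F1, where a predicted word counts as correct only if its character span matches a gold word exactly. The sketch below computes that metric with the standard library; it is an illustrative assumption about the metric, not HanLP's evaluation code, and `to_spans` and `segmentation_f1` are hypothetical helpers.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character offsets of its words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Corpus-level (micro-averaged) word precision, recall and F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        if "".join(g) != "".join(p):
            raise ValueError("a gold/pred pair segments different character sequences")
        g_spans, p_spans = to_spans(g), to_spans(p)
        correct += len(g_spans & p_spans)  # words whose boundaries match exactly
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["商品", "和", "服务"]]
pred = [["商品", "和服", "务"]]
print(segmentation_f1(gold, pred))
```

Only one of the three predicted words (`商品`) matches a gold span, so precision, recall and F1 are all 1/3 here; a reported score such as `96.66` would be this F1 expressed as a percentage.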

-The result is guaranteed to be `96.66` as the random feed is fixed. Different from some overclaining papers and projects, HanLP promises every digit in our scores are reproducible. Any issues on reproducibility will be treated and solved as a top-priority fatal bug.
+The result is guaranteed to be `96.66` because the random seed is fixed. Unlike some overclaiming papers and projects, HanLP promises that every single digit in our scores is reproducible. Any reproducibility issue will be treated as a top-priority fatal bug and fixed.
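The guarantee above rests on fixing the random seed, so that data shuffling and other random draws produce the same numbers on every run. The stdlib-only sketch below shows the principle with a hypothetical `toy_training_run`; a real PyTorch or TensorFlow job would additionally need to seed NumPy and the framework and avoid nondeterministic GPU kernels, which this example does not cover.

```python
import random
import statistics


def toy_training_run(seed: int) -> float:
    """Hypothetical stand-in for a training job whose result depends on randomness."""
    rng = random.Random(seed)  # private generator, unaffected by other code using `random`
    data = list(range(1000))
    rng.shuffle(data)  # the data order depends only on the seed
    # "Train": average a noisy function of the first 100 shuffled examples.
    noisy = [x % 100 + rng.gauss(0.0, 1.0) for x in data[:100]]
    return round(statistics.mean(noisy), 2)


first = toy_training_run(seed=42)
second = toy_training_run(seed=42)
print(first == second)  # True: same seed, same shuffle, same noise, same score
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the run repeatable even if other code consumes numbers from the global generator.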