Commit b9e7d83

Revise documentation.
1 parent bce4a4e commit b9e7d83

4 files changed, +25 -15 lines changed

README.md

Lines changed: 13 additions & 11 deletions

````diff
@@ -2,7 +2,7 @@

 [中文](https://github.com/hankcs/HanLP/tree/doc-zh) | [docs](https://hanlp.hankcs.com/docs/) | [1.x](https://github.com/hankcs/HanLP/tree/1.x) | [forum](https://bbs.hankcs.com/) | [docker](https://github.com/WalterInSH/hanlp-jupyter-docker)

-The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry. HanLP was designed from day one to be efficient, user friendly and extendable. It comes with pretrained models for 104 human languages including English, Chinese and many others.
+The multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, for advancing state-of-the-art deep learning techniques in both academia and industry. HanLP was designed from day one to be efficient, user friendly and extendable.

 Thanks to open-access corpora like Universal Dependencies and OntoNotes, HanLP 2.1 now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, abstract meaning representation (AMR) parsing.

````

````diff
@@ -45,12 +45,14 @@ HanLPClient HanLP = new HanLPClient("https://hanlp.hankcs.com/api", "your_auth",

 ### Quick Start

-No matter which language you uses, the same interface can be used to parse a document.
+No matter which language you use, the same interface can be used to parse a document.

 ```python
 HanLP.parse("In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environment. 2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。")
 ```

+See [docs](https://hanlp.hankcs.com/docs/tutorial.html) for visualization, annotation guidelines and more details.
+
 ## Native APIs

 ```bash
````

````diff
@@ -63,21 +65,21 @@ HanLP requires Python 3.6 or later. GPU/TPU is suggested but not mandatory.

 ```python
 import hanlp
-HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH)
-HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environment.',
-       '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',
-       '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。'])
+HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MT5_BASE)
+print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environment.',
+             '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',
+             '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
 ```

-In particular, the Python `HanLPClient` can also be used as a callable function following the same semantics. See [docs](https://hanlp.hankcs.com/docs/) for more details.
+In particular, the Python `HanLPClient` can also be used as a callable function following the same semantics. See [docs](https://hanlp.hankcs.com/docs/tutorial.html) for visualization, annotation guidelines and more details.

 ## Train Your Own Models

 To write DL models is not hard, the real hard thing is to write a model able to reproduce the scores in papers. The snippet below shows how to surpass the state-of-the-art tokenizer in 9 minutes.

 ```python
 tokenizer = TransformerTaggingTokenizer()
-save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.61'
+save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.66'
 tokenizer.fit(
     SIGHAN2005_PKU_TRAIN_ALL,
     SIGHAN2005_PKU_TEST, # Conventionally, no devset is used. See Tian et al. (2020).
````

````diff
@@ -97,13 +99,13 @@ tokenizer.fit(
 tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)
 ```

-The result is guaranteed to be `96.66` as the random feed is fixed. Different from some overclaining papers and projects, HanLP promises every digit in our scores are reproducible. Any issues on reproducibility will be treated and solved as a top-priority fatal bug.
+The result is guaranteed to be `96.66` as the random feed is fixed. Different from some overclaiming papers and projects, HanLP promises every single digit in our scores is reproducible. Any issues on reproducibility will be treated and solved as a top-priority fatal bug.

 ## Performance

 <table><thead><tr><th rowspan="2">lang</th><th rowspan="2">corpora</th><th rowspan="2">model</th><th colspan="2">tok</th><th colspan="4">pos</th><th colspan="3">ner</th><th rowspan="2">dep</th><th rowspan="2">con</th><th rowspan="2">srl</th><th colspan="4">sdp</th><th rowspan="2">lem</th><th rowspan="2">fea</th><th rowspan="2">amr</th></tr><tr><td>fine</td><td>coarse</td><td>ctb</td><td>pku</td><td>863</td><td>ud</td><td>pku</td><td>msra</td><td>ontonotes</td><td>SemEval16</td><td>DM</td><td>PAS</td><td>PSD</td></tr></thead><tbody><tr><td rowspan="2">mul</td><td rowspan="2">UD2.7 <br>OntoNotes5</td><td>small</td><td>98.30</td><td>-</td><td>-</td><td>-</td><td>-</td><td>91.72</td><td>-</td><td>-</td><td>74.86</td><td>74.66</td><td>74.29</td><td>65.73</td><td>-</td><td>88.52</td><td>92.56</td><td>83.84</td><td>84.65</td><td>81.13</td><td>-</td></tr><tr><td>base</td><td>99.59</td><td>-</td><td>-</td><td>-</td><td>-</td><td>95.95</td><td>-</td><td>-</td><td>80.31</td><td>85.84</td><td>80.22</td><td>74.61</td><td>-</td><td>93.23</td><td>95.16</td><td>86.57</td><td>92.91</td><td>90.30</td><td>-</td></tr><tr><td rowspan="4">zh</td><td rowspan="2">open</td><td>small</td><td>97.25</td><td>-</td><td>96.66</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>95.00</td><td>84.57</td><td>87.62</td><td>73.40</td><td>84.57</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td>base</td><td>97.50</td><td>-</td><td>97.07</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>96.04</td><td>87.11</td><td>89.84</td><td>77.78</td><td>87.11</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td rowspan="2">close</td><td>small</td><td>96.70</td><td>95.93</td><td>96.87</td><td>97.56</td><td>95.05</td><td>-</td><td>96.22</td><td>95.74</td><td>76.79</td><td>84.44</td><td>88.13</td><td>75.81</td><td>74.28</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td>base</td><td>97.52</td><td>96.44</td><td>96.99</td><td>97.59</td><td>95.29</td><td>-</td><td>96.48</td><td>95.72</td><td>77.77</td><td>85.29</td><td>88.57</td><td>76.52</td><td>73.76</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr></tbody></table>

-- Multilingual models are temporary ones which will be replaced in one week.
+- Multilingual models are temporary which will be replaced in one week.
 - AMR models will be released once our paper gets accepted.

 ## Citing
````

````diff
@@ -127,7 +129,7 @@ HanLP is licensed under **Apache License 2.0**. You can use HanLP in your commer

 ### Models

-Unless specified, all models in HanLP are licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) .
+Unless otherwise specified, all models in HanLP are licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

 ## References

````
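The README attributes its guaranteed `96.66` score to a fixed random seed. The sketch below illustrates that principle in a library-agnostic way and is not HanLP's training code. It uses Python's standard `random` module, and `simulated_training` is a hypothetical stand-in for a training loop whose updates consume random numbers. Seeding one private generator before the run makes every draw repeat exactly, and therefore the final score too. Real GPU training typically also needs the framework's own seeds (for example `torch.manual_seed`) and deterministic kernels, which this sketch does not cover.

```python
import random
from typing import List


def simulated_training(seed: int, epochs: int = 5) -> List[float]:
    """Hypothetical stand-in for a training run whose updates consume random numbers."""
    rng = random.Random(seed)  # private generator, seeded once before "training" starts
    score = 90.0
    history = []
    for _ in range(epochs):
        # Each "epoch" shifts the score by a random amount, as data shuffling or
        # dropout would in a real run.
        score += rng.uniform(-0.5, 1.0)
        history.append(round(score, 4))
    return history


if __name__ == "__main__":
    first = simulated_training(seed=1)
    second = simulated_training(seed=1)
    print(first == second)  # True: the same seed replays the same sequence of scores
```

Seeding the global `random` module would also work. A private `random.Random` instance, however, keeps unrelated code that draws random numbers from perturbing the sequence.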
docs/configure.md

Lines changed: 8 additions & 0 deletions

````diff
@@ -51,3 +51,11 @@ environment variable and HanLP will pick it up at the next startup.
 export HANLP_URL=http://mirrors-hk.miduchina.com/hanlp/
 ```

+## Control Verbosity
+
+By default, HanLP will print progressive message to console when you load a model. If you want to silence it, use the
+following environment variable.
+
+```bash
+export HANLP_VERBOSE=0
+```
````
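In a Jupyter notebook or a script, running `export` is not always convenient, so the same variable can be set from Python instead. This is a minimal sketch. It assumes, though this commit does not confirm it, that HanLP reads `HANLP_VERBOSE` once when the package is first imported, so the assignment has to run before `import hanlp`. The `try`/`except` only keeps the sketch runnable where HanLP is not installed.

```python
import os

# Set the variable before HanLP is imported. Assumption: HanLP reads
# HANLP_VERBOSE when the package is first imported, so changing it later
# in the same process may have no effect.
os.environ["HANLP_VERBOSE"] = "0"

try:
    import hanlp  # models loaded from here on should not print progress messages
except ImportError:
    hanlp = None  # HanLP is not installed; the variable is still set for child processes
```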

docs/tutorial.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -53,7 +53,7 @@ print(HanLP('In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techn
             '2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。' \
             '2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。'))
 ```
-````{margin} **But what does these annotations mean?**
+````{margin} **But what do these annotations mean?**
 ```{seealso}
 See our [data format](data_format) and [annotations](annotations/index) for details.
 ```
````

````diff
@@ -63,8 +63,8 @@ See our [data format](data_format) and [annotations](annotations/index) for deta
 ## Visualization

 ```{eval-rst}
-:class:`~hanlp_common.document.Document` has a handy method :meth:`~hanlp_common.document.Document.pretty_print`
-which offsers visualization in any mono-width text environment.
+The returned :class:`~hanlp_common.document.Document` has a handy method :meth:`~hanlp_common.document.Document.pretty_print`
+which offers visualization in any mono-width text environment.
 ```

 ````{margin} **Non-ASCII**
````

hanlp/version.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -2,5 +2,5 @@
 # Author: hankcs
 # Date: 2019-12-28 19:26

-__version__ = '2.1.0-alpha.0'
+__version__ = '2.1.0-alpha.2'
 """HanLP version"""
````
