Commit 4981aeb

Initial release
0 parents  commit 4981aeb

26 files changed (+2179, -0 lines changed)

.babelrc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
{
    "env": {
        "development": {
            "presets": [
                ["@babel/env"]
            ],
            "plugins": [
                "add-module-exports"
            ]
        },
        "production": {
            "presets": [
                ["@babel/env"],
                "minify"
            ],
            "plugins": [
                "add-module-exports"
            ]
        }
    }
}

.editorconfig

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
# EditorConfig helps developers define and maintain
# consistent coding styles between different editors and IDEs.

root = true

[*]
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
indent_style = space
indent_size = 4

[*.md]
trim_trailing_whitespace = false

.eslintrc

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
{
    "parserOptions": {
        "ecmaVersion": 9,
        "sourceType": "module"
    },
    "rules": {
        "semi": ["warn", "never"],
        "no-mixed-spaces-and-tabs": "warn",
        "indent": [
            "warn"
        ],
        "max-statements-per-line": [
            "warn",
            {
                "max": 2
            }
        ]
    }
}

.gitignore

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
logs
*.log
npm-debug.log*
pids
*.pid
*.seed
lib-cov
coverage
.nyc_output
node_modules
jspm_packages
.npm
.node_repl_history
.idea
lib
package-lock.json
yarn.lock
.DS_Store

.npmignore

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
*.log
npm-debug.log*
coverage
.nyc_output
node_modules
package-lock.json
yarn.lock
src
test
CHANGELOG.md
.travis.yml
.editorconfig
.eslintrc
.babelrc
.gitignore

.travis.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
language: node_js
node_js:
  - '8'
  - '6'
script:
  - npm run test
  - npm run build
branches:
  only:
    - master

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
# Changelog

All notable changes to this project will be documented in this file.

## [1.0.0] - 2020-08-26

Initial release

[1.0.0]: https://github.com/andreekeberg/ml-classify-text-js/releases/tag/1.0.0

CONTRIBUTING.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Contributing to JavaScript Text Classifier

This document contains basic guidelines to make contributing to this project as easy and transparent as possible, whether it's:

- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features
- Becoming a maintainer

## Pull requests are actively welcomed

1. Fork the repo and create your branch from `master`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Make sure your code lints.
5. Issue your pull request.

## Any contributions you make will be under the MIT Software License

In short, when you submit code changes, your submissions are understood to be under the same [MIT License](http://choosealicense.com/licenses/mit/) that covers the project.

## Report bugs using [issues](https://github.com/andreekeberg/ml-classify-text-js/issues)

All public bugs are tracked using GitHub issues. Report a bug by [opening a new issue](https://github.com/andreekeberg/ml-classify-text-js/issues/new); it's that easy!

## Write bug reports with detail, background, and sample code

**Great bug reports** tend to have:

- A quick summary and/or background
- Steps to reproduce
    - Be specific!
    - Give sample code if you can.
- What you expected would happen
- What actually happens
- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)

## License

By contributing, you agree that your contributions will be licensed under the project's MIT License.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2020 André Ekeberg <[email protected]>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
# 📄 JavaScript Text Classifier

Use machine learning to classify text using [n-grams](https://en.wikipedia.org/wiki/N-gram) and [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity).

A minimal library that can be used in both the **browser** and **Node.js**. It lets you train a model on a large number of text samples (with corresponding labels), and then use that model to quickly predict one or more appropriate labels for new text samples.

## Installation

**Using npm**

```
npm install ml-classify-text
```

**Using yarn**

```
yarn add ml-classify-text
```

## Getting started

**Import as an ES6 module**

```javascript
import Classifier from 'ml-classify-text'
```

**Import as a CommonJS module**

```javascript
const { Classifier } = require('ml-classify-text')
```

## Basic usage

### Setting up a new Classifier instance

```javascript
const classifier = new Classifier()
```

### Training a model

```javascript
let positive = [
    'This is great, so cool!',
    'Wow, I love it!',
    'It really is amazing',
]

let negative = [
    'This is really bad',
    'I hate it with a passion',
    'Just terrible!',
]

classifier.train(positive, 'positive')
classifier.train(negative, 'negative')
```

### Getting a prediction

```javascript
let predictions = classifier.predict('It sure is pretty great!')

if (predictions.length) {
    predictions.forEach(prediction => {
        console.log(`${prediction.label} (${prediction.confidence})`)
    })
} else {
    console.log('No predictions returned')
}
```

Returning:

```
positive (0.5423261445466404)
```
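
The README names cosine similarity as the basis for classification but does not show the computation. As a rough illustration only, and assuming the confidence value is the cosine similarity between term-count vectors, a standalone sketch might look like the following. The `cosineSimilarity` helper and the object-of-counts representation are hypothetical and are not taken from the library's source.

```javascript
// Illustrative sketch: cosine similarity between two sparse term-count
// vectors, each represented as a plain object of { term: count }.
// This is an assumption about the approach, not the library's implementation.
function cosineSimilarity(a, b) {
    let dot = 0
    let normA = 0
    let normB = 0

    for (const term of Object.keys(a)) {
        normA += a[term] * a[term]
        // Only terms present in both vectors contribute to the dot product
        if (Object.prototype.hasOwnProperty.call(b, term)) {
            dot += a[term] * b[term]
        }
    }

    for (const term of Object.keys(b)) {
        normB += b[term] * b[term]
    }

    // An empty vector has no direction, so treat the similarity as 0
    if (normA === 0 || normB === 0) {
        return 0
    }

    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Made-up counts for an input text and for one label's training data
const input = { it: 1, is: 1, great: 1 }
const label = { it: 2, is: 2, great: 1, really: 1 }

console.log(cosineSimilarity(input, label))
```

Vectors that share no terms score 0, and vectors pointing in the same direction score close to 1, which is consistent with a `minimumConfidence` threshold being expressed as a number between 0 and 1.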

## Advanced usage

### Configuration

The following configuration options can be passed either directly to a new [Model](docs/model.md), or indirectly by passing them to the [Classifier](docs/classifier.md) constructor.

#### Options

| Property | Description | Default |
| --- | --- | --- |
| **nGramMin** | Minimum n-gram size | `1` |
| **nGramMax** | Maximum n-gram size | `1` |
| **minimumConfidence** | Minimum confidence required for predictions | `0.2` |
| **vocabulary** | Terms mapped to indexes in the model data entries, set to `false` to store terms directly in the data entries | `[]` |
| **data** | Object literal containing all training data | `{}` |

### Using n-grams

The default behavior is to split up texts by single words (known as a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model), or unigrams).

This has a limitation: because word order is ignored, it's impossible to correctly match phrases and expressions.

This is where [n-grams](https://en.wikipedia.org/wiki/N-gram) come in. When set to use more than one word per term, they act like a sliding window that moves across the text, producing contiguous sequences of the specified number of words, which can greatly improve the accuracy of predictions.

#### Example of using n-grams with a size of 2 (bigrams)

```javascript
const classifier = new Classifier({
    nGramMin: 2,
    nGramMax: 2
})

let tokens = classifier.tokenize('I really dont like it')

console.log(tokens)
```

Returning:

```javascript
{
    'i really': 1,
    'really dont': 1,
    'dont like': 1,
    'like it': 1
}
```
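
To make the sliding-window idea concrete, here is a minimal standalone sketch that produces the same term counts as the output above. The `ngramCounts` function and its word-splitting rule (lowercase, split on anything that is not an ASCII letter or digit) are assumptions made for illustration; the library's own tokenizer may differ.

```javascript
// Illustrative sketch of n-gram tokenization; not the library's code.
// Assumption: words are lowercased and split on non-alphanumeric characters.
function ngramCounts(text, nGramMin = 1, nGramMax = 1) {
    const words = text
        .toLowerCase()
        .split(/[^a-z0-9]+/)
        .filter(word => word.length > 0)

    // A prototype-less object avoids clashes with inherited keys
    // such as "constructor" when they appear as terms
    const counts = Object.create(null)

    // Slide a window of each size from nGramMin to nGramMax across the words
    for (let size = nGramMin; size <= nGramMax; size++) {
        for (let start = 0; start + size <= words.length; start++) {
            const term = words.slice(start, start + size).join(' ')
            counts[term] = (counts[term] || 0) + 1
        }
    }

    return counts
}

const bigrams = ngramCounts('I really dont like it', 2, 2)

console.log(bigrams)
```

Setting `nGramMin` to 1 and `nGramMax` to 2 in this sketch yields both unigrams and bigrams, at the cost of a larger set of terms.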

### Serializing a model

After training a model with large sets of data, you'll want to store that data so you can set up a new model from it later and quickly make predictions.

To do this, use the `serialize` method on your [Model](docs/model.md), and either save the data structure to a file, send it to a server, or store it in any other way you want.

```javascript
let model = classifier.model

console.log(model.serialize())
```

Returning:

```
{
    nGramMin: 1,
    nGramMax: 1,
    minimumConfidence: 0.2,
    vocabulary: [
        'this', 'is', 'great',
        'so', 'cool', 'wow',
        'i', 'love', 'it',
        'really', 'amazing', 'bad',
        'hate', 'with', 'a',
        'passion', 'just', 'terrible'
    ],
    data: {
        positive: {
            '0': 1, '1': 2, '2': 1,
            '3': 1, '4': 1, '5': 1,
            '6': 1, '7': 1, '8': 2,
            '9': 1, '10': 1
        },
        negative: {
            '0': 1, '1': 1, '6': 1,
            '8': 1, '9': 1, '11': 1,
            '12': 1, '13': 1, '14': 1,
            '15': 1, '16': 1, '17': 1
        }
    }
}
```
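
One way to read this output: each key under `data` is an index into `vocabulary`, and each value is how often that term occurred for the label. The sketch below reproduces that encoding for unigrams and round-trips it through JSON. The `encodeSamples` helper and its tokenization rule are hypothetical, written only to illustrate the structure; they are not the library's implementation.

```javascript
// Illustrative sketch; not the library's code. Encodes unigram counts per
// label as { vocabularyIndex: count }, growing a shared vocabulary array.
function encodeSamples(samples, vocabulary, entry = {}) {
    for (const sample of samples) {
        // Assumption: lowercase and split on non-alphanumeric characters
        const words = sample
            .toLowerCase()
            .split(/[^a-z0-9]+/)
            .filter(word => word.length > 0)

        for (const word of words) {
            let index = vocabulary.indexOf(word)
            if (index === -1) {
                // First time this term is seen: append it to the vocabulary
                index = vocabulary.push(word) - 1
            }
            entry[index] = (entry[index] || 0) + 1
        }
    }

    return entry
}

const vocabulary = []
const data = {
    positive: encodeSamples(
        ['This is great, so cool!', 'Wow, I love it!', 'It really is amazing'],
        vocabulary
    ),
    negative: encodeSamples(
        ['This is really bad', 'I hate it with a passion', 'Just terrible!'],
        vocabulary
    )
}

// The structure is plain data, so it survives a JSON round trip
const restored = JSON.parse(JSON.stringify({ vocabulary, data }))

console.log(restored.vocabulary.length, restored.data.positive)
```

Storing indexes instead of repeating each term under every label keeps the serialized data compact, which matters more as the number of labels and terms grows.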

## Documentation

* [Classifier](docs/classifier.md)
* [Model](docs/model.md)
* [Vocabulary](docs/vocabulary.md)
* [Prediction](docs/prediction.md)

## Contributing

Read the [contribution guidelines](CONTRIBUTING.md).

## Changelog

Refer to the [changelog](CHANGELOG.md) for a full history of the project.

## License

JavaScript Text Classifier is licensed under the [MIT license](LICENSE).
