-
Notifications
You must be signed in to change notification settings - Fork 10
Basic Skills Inference #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
hodsonjames
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few quick fixes. I'd like you to run a test over a few hundred profiles and look at what is being found, to get an idea of whether it's working as expected. I'd also like you to run an evaluation metric for overlap with the true set of raw_skills for profiles that do have a non-empty raw_skills field. That is, take n profiles that have 1 or more raw_skills, and see how many of those raw skills are found with this approach, per profile.
| keywords[v] = 1 | ||
|
|
||
| def infer_exp(self, p, e, return_scores = False): | ||
| if "role" not in e.keys() or e["role"] == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys() yields a list, which loses you the computational advantage of a dictionary look-up. Instead, the better way to do this is either:
(a) if 'role' in e and e['role']:
(b) if e.get('role',None):
| v = set(items[2].replace('\n', '').lower().split(', ')) | ||
| self.vals[k] = v | ||
|
|
||
| def render_scores(self, data, scores, keywords): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add in-line comments to document the purpose of each function, and any pertinent non-obvious parts within them?
| return [] | ||
| # Extract information from relevant fields | ||
| data = [] | ||
| for k in e["role"].keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, keys() is unnecessary, since you have an iterable already.
| scores = {} | ||
| keywords = {} | ||
|
|
||
| for k in self.vals.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again.
| if p["raw_skills"]: | ||
| return [p["raw_skills"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: For discussion. Here we make the assumption that each profile that contains 1 or more raw skills is 'complete', and that all other profiles are lacking skills. However, one might imagine that a profile with just one skill is incomplete and should have more skills generated, etc.
| for i in data: | ||
| for k in self.vals.keys(): | ||
| for v in self.vals[k]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider whether there might be significant performance improvements through sets and set operations rather than checking every field individually. Maybe you should be breaking early if you find a keyword, since I don't know how helpful the 'score' is in practice. Can you take a look at the score variable and see how it does? I'd expect anything caught in 'roles' to get multiplied several times because many of the fields are copies or substrings of one another...
| class InferSkills: | ||
| def __init__(self): | ||
| self.vals = {} | ||
| with open('skillsets_labels.tsv', 'r') as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are only the top-n skills related to a particular skill-set. We need to remember to expand this to the list of all skills with more than k occurrences.
| return ret | ||
|
|
||
|
|
||
| def load_sample_profile(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you plan on using this in the final version, then you may want to rename it to something more general?
No description provided.