Skip to content

Conversation

@kunaladhia01
Copy link

No description provided.

Copy link
Owner

@hodsonjames hodsonjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few quick fixes. I'd like you to run a test over a few hundred profiles and look at what is being found, to get an idea of whether it's working as expected. I'd also like you to run an evaluation metric for overlap with the true set of raw_skills for profiles that do have a non-empty raw_skills field. That is, take n profiles that have 1 or more raw_skills, and see how many of those raw skills are found with this approach, per profile.

keywords[v] = 1

def infer_exp(self, p, e, return_scores = False):
if "role" not in e.keys() or e["role"] == None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys() yields a list, which loses you the computational advantage of a dictionary look-up. Instead, the better way to do this is either:
(a) if 'role' in e and e['role']:
(b) if e.get('role',None):

v = set(items[2].replace('\n', '').lower().split(', '))
self.vals[k] = v

def render_scores(self, data, scores, keywords):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add in-line comments to document the purpose of each function, and any pertinent non-obvious parts within them?

return []
# Extract information from relevant fields
data = []
for k in e["role"].keys():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, keys() is unnecessary, since you have an iterable already.

scores = {}
keywords = {}

for k in self.vals.keys():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

Comment on lines +74 to +75
if p["raw_skills"]:
return [p["raw_skills"]]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: For discussion. Here we make the assumption that each profile that contains 1 or more raw skills is 'complete', and that all other profiles are lacking skills. However, one might imagine that a profile with just one skill is incomplete and should have more skills generated, etc.

Comment on lines +16 to +18
for i in data:
for k in self.vals.keys():
for v in self.vals[k]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider whether there might be significant performance improvements through sets and set operations rather than checking every field individually. Maybe you should be breaking early if you find a keyword, since I don't know how helpful the 'score' is in practice. Can you take a look at the score variable and see how it does? I'd expect anything caught in 'roles' to get multiplied several times because many of the fields are copies or substrings of one another...

class InferSkills:
def __init__(self):
self.vals = {}
with open('skillsets_labels.tsv', 'r') as f:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are only the top-n skills related to a particular skill-set. We need to remember to expand this to the list of all skills with more than k occurrences.

return ret


def load_sample_profile(self):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan on using this in the final version, then you may want to rename it to something more general?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants