Python vs JavaScript #37

@Ataraxist

Problem

This project was initially built in pure JavaScript to enable wider deployment, but several of the functions and libraries it relies on were originally written for, and intended to be used from, Python. As a result, after an initial test build the project was refactored to include a JavaScript pipeline for individual embeddings and a Python pipeline for batched embeddings. For example, you can't use GPU acceleration in the JavaScript pipeline, but you can in the Python one.

This means that tkyoDriftSetTraining.py and tkyoDrift.js are functional duplicates of each other, except that the former is explicitly meant to be called once for a batch, while the latter is meant to be invoked on every new input.

Solution

This is fine as it is, but since many JavaScript libraries are just Python scripts wearing a disguise, it would be ideal to rebuild the entire platform in Python, with a JavaScript npm package to install it and a JavaScript function hook to pass data into it. This would let the system avoid unnecessary conversion from JavaScript into Python to execute AI embeddings, calculate k-means, or generate the HNSW index.
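To make the proposed split a bit more concrete, here is a minimal sketch of what the Python side of such a hook might look like, assuming the JavaScript wrapper sends one JSON object per line and reads one JSON response per line. The names `handle_request`, `serve`, and `placeholder_embed`, and the `text` / `texts` field names, are hypothetical and not part of the current codebase; the placeholder embedding only stands in for the real model.

```python
import io
import json
from typing import Callable, List, TextIO

Vector = List[float]


def placeholder_embed(texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in for the real embedding model (not a real embedding)."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def handle_request(
    payload: object,
    embed: Callable[[List[str]], List[Vector]] = placeholder_embed,
) -> dict:
    """Accept {"text": "..."} for a single input or {"texts": [...]} for a batch."""
    if not isinstance(payload, dict):
        return {"error": "expected a JSON object"}
    if "texts" in payload:
        texts = list(payload["texts"])
    elif "text" in payload:
        texts = [payload["text"]]
    else:
        return {"error": "expected a 'text' or 'texts' field"}
    return {"embeddings": embed(texts)}


def serve(stream_in: TextIO, stream_out: TextIO) -> None:
    """Read one JSON request per line and write one JSON response per line."""
    for line in stream_in:
        line = line.strip()
        if not line:
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError as exc:
            response = {"error": f"invalid JSON: {exc}"}
        else:
            response = handle_request(payload)
        stream_out.write(json.dumps(response) + "\n")
        stream_out.flush()
```

A real entry point would call `serve(sys.stdin, sys.stdout)`, and the JavaScript hook would spawn that process and write requests to it. Because the single-input and batch paths share one function, the duplication between the two current pipelines would collapse into one code path.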

Additional information

There may also be an unintended knock-on effect: the Xenova model tokenizer behaves slightly differently from the Python tokenizer, which yields marginally different embeddings for the same text and adds noise to the cosine similarity scores.
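One way to check how large this effect is would be to embed the same text in both pipelines and measure the cosine distance between the two results. The sketch below assumes the embeddings have already been exported as plain lists of floats; `cosine_similarity` and `pipeline_drift` are hypothetical helper names, and the vectors in the example are made up purely for illustration, not measured values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def pipeline_drift(js_embedding: Sequence[float], py_embedding: Sequence[float]) -> float:
    """Cosine distance between two pipelines' embeddings of the same text.

    0.0 means the pipelines agree exactly; larger values mean more
    tokenizer-induced noise in downstream similarity scores.
    """
    return 1.0 - cosine_similarity(js_embedding, py_embedding)


# Illustrative (made-up) vectors: the same text embedded by each pipeline.
js_vec = [0.12, -0.40, 0.88, 0.05]
py_vec = [0.12, -0.41, 0.87, 0.06]
print(f"drift for this text: {pipeline_drift(js_vec, py_vec):.6f}")
```

Running this over a sample of real inputs would give a rough noise floor: any drift score smaller than the typical `pipeline_drift` value could not be distinguished from tokenizer differences alone.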

👨‍👧‍👦 Contributing

  • 🙋‍♂️ Yes, I'd love to make a PR to implement this feature!

Labels: enhancement
