So many questions... #6

@csanadpoda

Hello, this really seems like a great solution. However, after reading through the paper and the code, I have SO many questions! Would you care to answer these?

  • How exactly did you create the .npy files? What exactly is in them? Is it the embedding of all the documents within that Series? Created how? I'd like to extend the solution with additional Series, but have no idea how to go about it.
  • In the paper you write: "Each entry of this vector is defined as the inner product between query embeddings and the embedding of each 3GPP series summary description, generated through a dedicated LLM." What exactly do you mean by "a dedicated LLM"? Is it an LLM specifically trained or fine-tuned on 3GPP data?
  • How exactly did you generate the 3GPP summary descriptions? Based on what? Did you just pass all documentation of a series to an LLM and ask it to write a summary? Did those documents even fit into the context window?
  • What exactly is the "Benchmark RAG" you measured your improvement against? I didn't find any reference to what that is in the paper, nor did a Google search turn up anything.
  • In the paper you mentioned you tested the solution on Release 17, too. How exactly was this achieved? Did you just test the solution created for Release 18 with questions about Release 17? Or did you create a dedicated embedding set for Release 17 separately? The paper doesn't mention the specifics.
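
To make the second question concrete, here is how I currently read the quoted sentence, as a minimal Python sketch. The function names, the series labels, and the toy embedding values are all my own placeholders, not taken from the repo or the paper, and I am only guessing that one score is computed per series. Please correct me if the actual computation differs.

```python
from typing import Dict, List


def inner_product(a: List[float], b: List[float]) -> float:
    """Plain dot product of two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def series_scores(
    query_emb: List[float],
    summary_embs: Dict[str, List[float]],
) -> Dict[str, float]:
    """One score per series: <query embedding, series summary embedding>."""
    return {
        series: inner_product(query_emb, emb)
        for series, emb in summary_embs.items()
    }


# Toy 3-dimensional embeddings (made-up values, for illustration only).
summaries = {
    "series_21": [1.0, 0.0, 0.0],
    "series_38": [0.0, 1.0, 0.0],
}
query = [0.2, 0.9, 0.1]

scores = series_scores(query, summaries)
best = max(scores, key=scores.get)  # series whose summary best matches the query
```

If this reading is right, then supporting a new Series would mean writing (or generating) its summary description, embedding it with the same model used for the queries, and adding one more entry to the score vector. Is that correct?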

Basically, the question is, how could I extend this solution to support new Series that are not covered in this repo, and/or what would someone need to do to extend it with more Releases and Series?

Thank you!
