
Conversation


@quge009 quge009 commented Oct 27, 2025

This PR is mainly about improving the user experience.

  • Changes made to optimize users' perceived latency and reading experience (a rough sketch of the streaming design follows after the summary below):
    • Implement streaming output for the LLMSession class and switch the final-answer generation call to streaming output, so the answer is posted to the user as soon as the first few tokens are ready.
    • Implement the push_frontend method, which leverages the streaming output to feed CoPilot progress status messages back to the user in real time and manage the user's experience while waiting for the answer.
    • Add an auto-scroll feature to the frontend plugin to enhance readability.
  • Changes made to reduce the average response latency (defined as the time between receiving a question and posting the answer):
    • Refactor several components (SmartHelp, LTP, ...) into classes, so that state can be preserved when necessary.
    • Reuse the same llm_session instance for requests within the same conversation, avoiding unnecessary HTTPS re-connection during initialization.
    • Implement a new question-parsing function that combines the contextualization and classification LLM calls into one efficient call, to reduce time.
    • Move prompt reading to instance initialization, to avoid unnecessary file I/O operations.
  • A minor bug fix is also included:
    • Change the assignment of 'turnId' sent to the frontend.

Effectiveness of this PR:

  • Impact on accuracy
    • No change
  • Impact on response latency
    • ~15% response time reduction on average
    • ~50% response time reduction for extremely simple questions
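
As a rough illustration of the streaming design mentioned above, here is a minimal sketch, assuming a chunk-yielding client API; apart from LLMSession, stream_chat, clear_instance_stream_callback, and the instance/global stream callbacks, every name below is a placeholder rather than the actual implementation:

class LLMSession:
    _global_stream_callback = None

    def __init__(self, client):
        self._client = client                  # re-used for all requests in a conversation (no HTTPS re-connect)
        self._instance_stream_callback = None  # set per request to push progress to the frontend

    def set_instance_stream_callback(self, cb):
        self._instance_stream_callback = cb

    def clear_instance_stream_callback(self):
        self._instance_stream_callback = None

    def stream_chat(self, system_prompt, user_prompt):
        full = ''
        # self._client.stream(...) stands in for whatever chunk iterator the backend uses
        for chunk in self._client.stream(system_prompt, user_prompt):
            full += chunk
            cb = self._instance_stream_callback or LLMSession._global_stream_callback
            if cb:
                cb(full)  # replace-style update pushed to the frontend as soon as tokens arrive
            yield full    # snapshot of the accumulated answer so far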

@quge009 quge009 changed the title tmp Improve Performance: CoPilot, response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot, response latency, user expectation Improve Performance: CoPilot: response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot: response latency, user expectation Improve Performance: CoPilot: users' perceived response latency Oct 28, 2025
@quge009 quge009 marked this pull request as ready for review October 28, 2025 20:04
@quge009 quge009 changed the title Improve Performance: CoPilot: users' perceived response latency Improve Performance: CoPilot: users experience Oct 28, 2025

// Process all complete SSE messages in buffer
let sepIndex;
while ((sepIndex = buffer.indexOf('\n\n')) !== -1) {
Contributor

I am not sure whether this can cause an infinite loop here when buffer.indexOf('\n\n') !== -1; please make sure the loop can be exited no matter which branch is executed.

proxy_send_timeout 2m;
}

location ~ ^/copilot/api/stream(.*)$ {
Contributor

do we need to remove the original part above?

@quge009 (Collaborator Author) Nov 12, 2025

I suggest keeping both APIs, for these reasons:

  • Sending a request to /copilot/api/operation returns the whole response object, including step-wise intermediate results; this is used for accuracy evaluation during development.
  • Sending a request to /copilot/api/stream is the official way for a user to interact with the service; it returns only the final answer string, at lower latency.
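
For illustration, a client could exercise the two endpoints roughly as follows; this is a sketch only, and the host name and payload fields (beyond data.messageInfo.userId/convId, which appear in the handler excerpt later in this review) are assumptions:

import requests

payload = {'data': {'messageInfo': {'userId': 'u1', 'convId': 'c1'},
                    'question': 'Why did my job fail?'}}  # illustrative payload shape

# /copilot/api/operation: full response object with step-wise intermediate results
r = requests.post('https://pai.example.com/copilot/api/operation', json=payload, timeout=120)
print(r.json())

# /copilot/api/stream: only the final answer string, streamed back at lower latency
with requests.post('https://pai.example.com/copilot/api/stream', json=payload, stream=True, timeout=120) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)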

try:
    llm_session.clear_instance_stream_callback()
except Exception:
    logger.debug('Failed to clear instance stream callback')
Contributor

What will happen if an exception happens here?

@quge009 (Collaborator Author)

Sorry, I don't quite get it: if clear_instance_stream_callback fails, a log message will be logged.
Or are you suggesting that q.put(None) should be wrapped in another try..except block?
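
One way to make that robust is to move q.put(None) into a finally block, so the consumer of the queue is always unblocked even if the cleanup above raises; a minimal sketch, assuming q is the queue feeding the streaming response:

try:
    try:
        llm_session.clear_instance_stream_callback()
    except Exception:
        logger.debug('Failed to clear instance stream callback')
finally:
    q.put(None)  # always send the end-of-stream sentinel so the consumer does not hang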

@hippogr hippogr requested review from Copilot and hippogr November 11, 2025 07:34
Copilot finished reviewing on behalf of hippogr November 11, 2025 07:39
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 28 out of 30 changed files in this pull request and generated 17 comments.

Files not reviewed (1)
  • contrib/copilot-plugin/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

src/copilot-chat/src/copilot_agent/utils/dcw.py:123

    full_dcw = gen_dcw(user_prompt, map_existing)


};

// Create assistant placeholder and attach the same messageInfo (turnId) so feedback maps to this response
useChatStore.getState().addChat({ role: "assistant", message: "", timestamp: new Date(), messageInfo });
Copilot AI Nov 11, 2025

A placeholder assistant message is created before the server response arrives. If the fetch fails early (line 78), the empty placeholder message remains in the chat history, creating a confusing user experience. The catch block at line 166 doesn't remove this placeholder. Consider removing the placeholder on error or marking it as failed.

try:
    cb = self._instance_stream_callback or LLMSession._global_stream_callback
    if cb:
        cb(full)
Copilot AI Nov 11, 2025

In the stream_chat method, when accumulating chunks, the code yields individual chunks but then calls the callback with the full accumulated text (cb(full)). This means the callback receives increasingly larger strings containing all previous content, which could cause performance issues with large responses. The callback should either receive only the new chunk or this accumulation pattern should be clearly documented as intended behavior for replace-style streaming.

Suggested change
cb(full)
cb(chunk)
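
If append-style updates are what the frontend expects, the loop could send only the delta while still yielding accumulated snapshots; a sketch under placeholder assumptions (self._client.stream stands in for the real chunk iterator):

def stream_chat(self, system_prompt, user_prompt):
    full = ''
    for chunk in self._client.stream(system_prompt, user_prompt):
        full += chunk
        cb = self._instance_stream_callback or LLMSession._global_stream_callback
        if cb:
            cb(chunk)  # append-style: the frontend concatenates the deltas itself
        yield full     # callers still receive the accumulated snapshot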

Comment on lines +68 to +69
user_id = data['data']['messageInfo']['userId']
conv_id = data['data']['messageInfo']['convId']
Copilot AI Nov 11, 2025

Direct access to nested dictionary keys without validation could raise KeyError if the client sends malformed JSON. The code should validate that 'data', 'messageInfo', 'userId', and 'convId' exist before accessing them, or use .get() with appropriate error handling.

Suggested change
user_id = data['data']['messageInfo']['userId']
conv_id = data['data']['messageInfo']['convId']
# Validate required keys
if not data or 'data' not in data or 'messageInfo' not in data['data']:
    return jsonify({"status": "error", "message": "Missing required fields: data.messageInfo"}), 400
message_info = data['data']['messageInfo']
user_id = message_info.get('userId')
conv_id = message_info.get('convId')
if not user_id or not conv_id:
    return jsonify({"status": "error", "message": "Missing required fields: userId or convId"}), 400

'userId': user_id,
'convId': conv_id,
'turnId': str(uuid.uuid4()),
'turnId': turn_id,
Copilot AI Nov 11, 2025

The turnId from question_msg_info is being assigned directly to the response without validation. If the frontend sends an invalid or missing turnId, this could cause issues. The code should validate or generate a fallback turnId if the incoming value is None or empty.
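
A defensive fallback might look like the following sketch; question_msg_info, user_id, and conv_id come from the code under review, while the surrounding dict name is illustrative:

import uuid

turn_id = (question_msg_info or {}).get('turnId') or str(uuid.uuid4())
response_msg_info = {      # illustrative container name
    'userId': user_id,
    'convId': conv_id,
    'turnId': turn_id,     # falls back to a fresh UUID when the frontend omits turnId
}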

Comment on lines +262 to +264
if not AGENT_MINIMAL_ON:
    # ingest kusto table
    self.collect_data_to_kusto(log_data)
Copilot AI Nov 11, 2025

The logging and Kusto data collection is now conditionally executed based on AGENT_MINIMAL_ON, but there's no corresponding change to handle cleanup or error scenarios when this flag is enabled. Consider documenting this behavior or ensuring that critical debugging information is still captured in minimal mode.
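
One option, sketched below, is to keep a lightweight trace when minimal mode skips Kusto ingestion; AGENT_MINIMAL_ON, collect_data_to_kusto, and log_data come from the diff, and the debug message is illustrative:

if not AGENT_MINIMAL_ON:
    # ingest kusto table
    self.collect_data_to_kusto(log_data)
else:
    # minimal mode: skip Kusto ingestion but keep a local breadcrumb for debugging
    logger.debug('AGENT_MINIMAL_ON set; skipping Kusto ingestion for this turn')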

Comment on lines +282 to +285
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
return last
Copilot AI Nov 11, 2025

The try_stream_fallback_chat method returns last which is initialized to an empty string but only updated inside the loop when snapshot is truthy. If the generator yields only empty strings or None values, the method will return an empty string instead of falling back to the non-streaming chat() method. The logic should check if any meaningful content was received and fall back if not.

Suggested change
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
return last
received_content = False
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
        received_content = True
if received_content:
    return last
else:
    logger.info('LLMSession: streaming yielded no content, falling back to non-streaming chat')
    return self.chat(system_prompt, user_prompt)

Comment on lines +190 to +191
# push to frontend
push_frontend_event(reference)
Copilot AI Nov 11, 2025

The KQL query reference is being appended to the summary using push_frontend_event(reference) after the summary has already been generated and potentially streamed. This means the reference might be sent as a separate message or lost depending on timing. The reference should either be included in the summary generation or the streaming design should ensure proper ordering.

Suggested change
# push to frontend
push_frontend_event(reference)
# Append reference to summary so it is returned together
summary = summary + reference

Comment on lines +379 to +397
const observer = new MutationObserver((mutations) => {
  try {
    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
      requestAnimationFrame(() => {
        try {
          scrollToBottom(el)
        } catch (e) {
          // ignore
        }
      })
    }
  } catch (e) {
    // ignore
  }
})

observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => observer.disconnect()
Copilot AI Nov 11, 2025

The MutationObserver watches for all mutations in the entire subtree with { childList: true, subtree: true, characterData: true }. This could cause significant performance overhead during rapid streaming updates as every character change triggers the observer callback. Consider throttling/debouncing the scroll trigger or observing only specific elements.

Suggested change
const observer = new MutationObserver((mutations) => {
  try {
    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
      requestAnimationFrame(() => {
        try {
          scrollToBottom(el)
        } catch (e) {
          // ignore
        }
      })
    }
  } catch (e) {
    // ignore
  }
})
observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => observer.disconnect()
// Debounce scroll trigger to avoid excessive calls during rapid updates
let debounceTimer: ReturnType<typeof setTimeout> | null = null;
const DEBOUNCE_DELAY = 100; // ms
const debouncedScroll = () => {
  if (debounceTimer) clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    try {
      const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
      if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
        requestAnimationFrame(() => {
          try {
            scrollToBottom(el)
          } catch (e) {
            // ignore
          }
        })
      }
    } catch (e) {
      // ignore
    }
  }, DEBOUNCE_DELAY);
};
const observer = new MutationObserver(() => {
  debouncedScroll();
});
observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => {
  observer.disconnect();
  if (debounceTimer) clearTimeout(debounceTimer);
}

@@ -0,0 +1,99 @@
Your task is to contextualize the user's question, and understand the user's intention from the question they asked, and assign a object and a concern as an indicator of the type of the question.

Copilot AI Nov 11, 2025

Spelling error: "Trainig" should be "Training".

Comment on lines +71 to +84
if cls._config_cache is None:
    with cls._config_lock:
        if cls._config_cache is None:  # Double-check pattern
            cls._config_cache = {
                'provider': os.environ.get("COPILOT_LLM_PROVIDER"),
                'azure_api_key': os.environ.get("AZURE_OPENAI_API_KEY"),
                'openai_api_key': os.environ.get("OPENAI_API_KEY"),
                'endpoint': os.environ.get("COPILOT_LLM_ENDPOINT"),
                'model_name': os.environ.get("COPILOT_LLM_MODEL"),
                'model_version': os.environ.get("COPILOT_LLM_VERSION"),
                'embedding_model_name': os.environ.get("COPILOT_EMBEDDING_MODEL")
            }
            logger.info(f'COPILOT LLM Endpoint Provider: {cls._config_cache["provider"]}')
return cls._config_cache
Copilot AI Nov 11, 2025

The double-checked locking pattern for _config_cache is implemented, but the cache dictionary itself is mutable and contains configuration values read from environment variables. If environment variables change during runtime (though rare), cached values will be stale. Consider documenting this assumption or adding a cache invalidation mechanism.
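
If stale values ever become a problem, a small invalidation hook would address it; a sketch, with the method name being an assumption:

@classmethod
def reset_config_cache(cls):
    """Drop the cached configuration so the next call re-reads the environment variables."""
    with cls._config_lock:
        cls._config_cache = None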
