
Conversation


@quge009 quge009 commented Oct 27, 2025

This PR is mainly about improving the user experience.

  • Changes made to optimize users' perceived latency and reading experience (a rough sketch of the streaming design follows after the summary below):
    • Implement streaming output for the LLMSession class and switch the final-answer generation call to streaming output, so the answer is posted to the user as soon as the first few tokens are ready.
    • Implement the push_frontend method, which leverages the streaming output to feed CoPilot progress status messages back to the user in real time and manage the user's experience while waiting for the answer.
    • Add an auto-scroll feature to the frontend plugin to enhance readability.
  • Changes made to reduce the average response latency (defined as the time between receiving a question and posting the answer):
    • Refactor several components (SmartHelp, LTP, ...) into classes, so that state can be preserved when necessary.
    • Reuse the same llm_session instance for requests within the same conversation, avoiding unnecessary HTTPS re-connection during initialization.
    • Implement a new question-parsing function that combines the contextualization and classification LLM calls into one efficient call, to reduce time.
    • Move prompt reading to instance initialization, to avoid unnecessary file I/O operations.
  • A minor bug fix is also included:
    • Change the assignment of 'turnId' sent to the frontend.

Effectiveness of this PR:

  • Impact on accuracy
    • No change
  • Impact on response latency
    • ~15% response time reduction on average
    • ~50% response time reduction for extremely simple questions
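
As a rough illustration of the streaming design mentioned above, here is a minimal sketch, assuming a chunk-yielding client API; apart from LLMSession, stream_chat, clear_instance_stream_callback, and the instance/global stream callbacks, every name below is a placeholder rather than the actual implementation:

class LLMSession:
    _global_stream_callback = None

    def __init__(self, client):
        self._client = client                  # re-used for all requests in a conversation (no HTTPS re-connect)
        self._instance_stream_callback = None  # set per request to push progress to the frontend

    def set_instance_stream_callback(self, cb):
        self._instance_stream_callback = cb

    def clear_instance_stream_callback(self):
        self._instance_stream_callback = None

    def stream_chat(self, system_prompt, user_prompt):
        full = ''
        # self._client.stream(...) stands in for whatever chunk iterator the backend uses
        for chunk in self._client.stream(system_prompt, user_prompt):
            full += chunk
            cb = self._instance_stream_callback or LLMSession._global_stream_callback
            if cb:
                cb(full)  # replace-style update pushed to the frontend as soon as tokens arrive
            yield full    # snapshot of the accumulated answer so far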

@quge009 quge009 changed the title tmp Improve Performance: CoPilot, response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot, response latency, user expectation Improve Performance: CoPilot: response latency, user expectation Oct 28, 2025
@quge009 quge009 changed the title Improve Performance: CoPilot: response latency, user expectation Improve Performance: CoPilot: users' perceived response latency Oct 28, 2025
@quge009 quge009 marked this pull request as ready for review October 28, 2025 20:04
@quge009 quge009 changed the title Improve Performance: CoPilot: users' perceived response latency Improve Performance: CoPilot: users experience Oct 28, 2025

// Process all complete SSE messages in buffer
let sepIndex;
while ((sepIndex = buffer.indexOf('\n\n')) !== -1) {
Contributor

I am not sure whether this can cause an infinite loop here when buffer.indexOf('\n\n') !== -1; please make sure the loop can be exited no matter which branch is executed.

proxy_send_timeout 2m;
}

location ~ ^/copilot/api/stream(.*)$ {
Contributor

do we need to remove the original part above?

@quge009 (Collaborator Author) Nov 12, 2025

I suggest keeping both APIs, for these reasons:

  • Sending a request to /copilot/api/operation returns the whole response object, including step-wise intermediate results; this is used for accuracy evaluation during development.
  • Sending a request to /copilot/api/stream is the official way for a user to interact with the service; it returns only the final answer string, at lower latency.
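
For illustration, a client could exercise the two endpoints roughly as follows; this is a sketch only, and the host name and payload fields (beyond data.messageInfo.userId/convId, which appear in the handler excerpt later in this review) are assumptions:

import requests

payload = {'data': {'messageInfo': {'userId': 'u1', 'convId': 'c1'},
                    'question': 'Why did my job fail?'}}  # illustrative payload shape

# /copilot/api/operation: full response object with step-wise intermediate results
r = requests.post('https://pai.example.com/copilot/api/operation', json=payload, timeout=120)
print(r.json())

# /copilot/api/stream: only the final answer string, streamed back at lower latency
with requests.post('https://pai.example.com/copilot/api/stream', json=payload, stream=True, timeout=120) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)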

try:
    llm_session.clear_instance_stream_callback()
except Exception:
    logger.debug('Failed to clear instance stream callback')
Contributor

What will happen if an exception happens here?

@quge009 (Collaborator Author)

Sorry, I don't quite get it: if clear_instance_stream_callback fails, a log message will be logged.
Or are you suggesting that q.put(None) should be wrapped in another try..except block?
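
One way to make that robust is to move q.put(None) into a finally block, so the consumer of the queue is always unblocked even if the cleanup above raises; a minimal sketch, assuming q is the queue feeding the streaming response:

try:
    try:
        llm_session.clear_instance_stream_callback()
    except Exception:
        logger.debug('Failed to clear instance stream callback')
finally:
    q.put(None)  # always send the end-of-stream sentinel so the consumer does not hang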

@hippogr hippogr requested review from Copilot and hippogr November 11, 2025 07:34
Copilot finished reviewing on behalf of hippogr November 11, 2025 07:39
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 28 out of 30 changed files in this pull request and generated 17 comments.

Files not reviewed (1)
  • contrib/copilot-plugin/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)

src/copilot-chat/src/copilot_agent/utils/dcw.py:123

    full_dcw = gen_dcw(user_prompt, map_existing)


};

// Create assistant placeholder and attach the same messageInfo (turnId) so feedback maps to this response
useChatStore.getState().addChat({ role: "assistant", message: "", timestamp: new Date(), messageInfo });
Copilot AI Nov 11, 2025

A placeholder assistant message is created before the server response arrives. If the fetch fails early (line 78), the empty placeholder message remains in the chat history, creating a confusing user experience. The catch block at line 166 doesn't remove this placeholder. Consider removing the placeholder on error or marking it as failed.

try:
    cb = self._instance_stream_callback or LLMSession._global_stream_callback
    if cb:
        cb(full)
Copilot AI Nov 11, 2025

In the stream_chat method, when accumulating chunks, the code yields individual chunks but then calls the callback with the full accumulated text (cb(full)). This means the callback receives increasingly larger strings containing all previous content, which could cause performance issues with large responses. The callback should either receive only the new chunk or this accumulation pattern should be clearly documented as intended behavior for replace-style streaming.

Suggested change
cb(full)
cb(chunk)
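
If append-style updates are what the frontend expects, the loop could send only the delta while still yielding accumulated snapshots; a sketch under placeholder assumptions (self._client.stream stands in for the real chunk iterator):

def stream_chat(self, system_prompt, user_prompt):
    full = ''
    for chunk in self._client.stream(system_prompt, user_prompt):
        full += chunk
        cb = self._instance_stream_callback or LLMSession._global_stream_callback
        if cb:
            cb(chunk)  # append-style: the frontend concatenates the deltas itself
        yield full     # callers still receive the accumulated snapshot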

Comment on lines +68 to +69
user_id = data['data']['messageInfo']['userId']
conv_id = data['data']['messageInfo']['convId']
Copilot AI Nov 11, 2025

Direct access to nested dictionary keys without validation could raise KeyError if the client sends malformed JSON. The code should validate that 'data', 'messageInfo', 'userId', and 'convId' exist before accessing them, or use .get() with appropriate error handling.

Suggested change
user_id = data['data']['messageInfo']['userId']
conv_id = data['data']['messageInfo']['convId']
# Validate required keys
if not data or 'data' not in data or 'messageInfo' not in data['data']:
    return jsonify({"status": "error", "message": "Missing required fields: data.messageInfo"}), 400
message_info = data['data']['messageInfo']
user_id = message_info.get('userId')
conv_id = message_info.get('convId')
if not user_id or not conv_id:
    return jsonify({"status": "error", "message": "Missing required fields: userId or convId"}), 400

'userId': user_id,
'convId': conv_id,
'turnId': str(uuid.uuid4()),
'turnId': turn_id,
Copilot AI Nov 11, 2025

The turnId from question_msg_info is being assigned directly to the response without validation. If the frontend sends an invalid or missing turnId, this could cause issues. The code should validate or generate a fallback turnId if the incoming value is None or empty.
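
A defensive fallback might look like the following sketch; question_msg_info, user_id, and conv_id come from the code under review, while the surrounding dict name is illustrative:

import uuid

turn_id = (question_msg_info or {}).get('turnId') or str(uuid.uuid4())
response_msg_info = {      # illustrative container name
    'userId': user_id,
    'convId': conv_id,
    'turnId': turn_id,     # falls back to a fresh UUID when the frontend omits turnId
}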

Comment on lines +262 to +264
if not AGENT_MINIMAL_ON:
    # ingest kusto table
    self.collect_data_to_kusto(log_data)
Copilot AI Nov 11, 2025

The logging and Kusto data collection is now conditionally executed based on AGENT_MINIMAL_ON, but there's no corresponding change to handle cleanup or error scenarios when this flag is enabled. Consider documenting this behavior or ensuring that critical debugging information is still captured in minimal mode.
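
One option, sketched below, is to keep a lightweight trace when minimal mode skips Kusto ingestion; AGENT_MINIMAL_ON, collect_data_to_kusto, and log_data come from the diff, and the debug message is illustrative:

if not AGENT_MINIMAL_ON:
    # ingest kusto table
    self.collect_data_to_kusto(log_data)
else:
    # minimal mode: skip Kusto ingestion but keep a local breadcrumb for debugging
    logger.debug('AGENT_MINIMAL_ON set; skipping Kusto ingestion for this turn')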

Comment on lines +282 to +285
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
return last
Copilot AI Nov 11, 2025

The try_stream_fallback_chat method returns last which is initialized to an empty string but only updated inside the loop when snapshot is truthy. If the generator yields only empty strings or None values, the method will return an empty string instead of falling back to the non-streaming chat() method. The logic should check if any meaningful content was received and fall back if not.

Suggested change
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
return last
received_content = False
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
        received_content = True
if received_content:
    return last
else:
    logger.info('LLMSession: streaming yielded no content, falling back to non-streaming chat')
    return self.chat(system_prompt, user_prompt)

Comment on lines +190 to +191
# push to frontend
push_frontend_event(reference)
Copilot AI Nov 11, 2025

The KQL query reference is being appended to the summary using push_frontend_event(reference) after the summary has already been generated and potentially streamed. This means the reference might be sent as a separate message or lost depending on timing. The reference should either be included in the summary generation or the streaming design should ensure proper ordering.

Suggested change
# push to frontend
push_frontend_event(reference)
# Append reference to summary so it is returned together
summary = summary + reference

Comment on lines +379 to +397
const observer = new MutationObserver((mutations) => {
  try {
    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
      requestAnimationFrame(() => {
        try {
          scrollToBottom(el)
        } catch (e) {
          // ignore
        }
      })
    }
  } catch (e) {
    // ignore
  }
})

observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => observer.disconnect()
Copilot AI Nov 11, 2025

The MutationObserver watches for all mutations in the entire subtree with { childList: true, subtree: true, characterData: true }. This could cause significant performance overhead during rapid streaming updates as every character change triggers the observer callback. Consider throttling/debouncing the scroll trigger or observing only specific elements.

Suggested change
const observer = new MutationObserver((mutations) => {
  try {
    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
      requestAnimationFrame(() => {
        try {
          scrollToBottom(el)
        } catch (e) {
          // ignore
        }
      })
    }
  } catch (e) {
    // ignore
  }
})
observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => observer.disconnect()
// Debounce scroll trigger to avoid excessive calls during rapid updates
let debounceTimer: ReturnType<typeof setTimeout> | null = null;
const DEBOUNCE_DELAY = 100; // ms
const debouncedScroll = () => {
  if (debounceTimer) clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    try {
      const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
      if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
        requestAnimationFrame(() => {
          try {
            scrollToBottom(el)
          } catch (e) {
            // ignore
          }
        })
      }
    } catch (e) {
      // ignore
    }
  }, DEBOUNCE_DELAY);
};
const observer = new MutationObserver(() => {
  debouncedScroll();
});
observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => {
  observer.disconnect();
  if (debounceTimer) clearTimeout(debounceTimer);
}

@@ -0,0 +1,99 @@
Your task is to contextualize the user's question, and understand the user's intention from the question they asked, and assign a object and a concern as an indicator of the type of the question.

Copilot AI Nov 11, 2025

Spelling error: "Trainig" should be "Training".

Comment on lines +71 to +84
if cls._config_cache is None:
    with cls._config_lock:
        if cls._config_cache is None:  # Double-check pattern
            cls._config_cache = {
                'provider': os.environ.get("COPILOT_LLM_PROVIDER"),
                'azure_api_key': os.environ.get("AZURE_OPENAI_API_KEY"),
                'openai_api_key': os.environ.get("OPENAI_API_KEY"),
                'endpoint': os.environ.get("COPILOT_LLM_ENDPOINT"),
                'model_name': os.environ.get("COPILOT_LLM_MODEL"),
                'model_version': os.environ.get("COPILOT_LLM_VERSION"),
                'embedding_model_name': os.environ.get("COPILOT_EMBEDDING_MODEL")
            }
            logger.info(f'COPILOT LLM Endpoint Provider: {cls._config_cache["provider"]}')
return cls._config_cache
Copilot AI Nov 11, 2025

The double-checked locking pattern for _config_cache is implemented, but the cache dictionary itself is mutable and contains configuration values read from environment variables. If environment variables change during runtime (though rare), cached values will be stale. Consider documenting this assumption or adding a cache invalidation mechanism.
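
If stale values ever become a problem, a small invalidation hook would address it; a sketch, with the method name being an assumption:

@classmethod
def reset_config_cache(cls):
    """Drop the cached configuration so the next call re-reads the environment variables."""
    with cls._config_lock:
        cls._config_cache = None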
