Improve Performance: CoPilot: users experience #110
Conversation
…, pre-load prompts for classifier, contextualizer
…sification session
```js
// Process all complete SSE messages in buffer
let sepIndex;
while ((sepIndex = buffer.indexOf('\n\n')) !== -1) {
```
I am not sure whether this could cause an infinite loop when buffer.indexOf('\n\n') !== -1. Please make sure the loop can always be exited, no matter which branch of the body executes.
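For reference, the usual way to make such a parse loop provably terminating is to consume at least the separator on every iteration. A minimal Python sketch of the same buffer-consumption pattern (the real code is TypeScript; the buffer contents here are made up):

```python
# Made-up buffer contents; the invariant is the same as in the TS code:
# each iteration must consume at least the '\n\n' separator, so the buffer
# strictly shrinks and find() eventually returns -1.
buffer = "data: a\n\ndata: b\n\ndata: partial"
while (sep_index := buffer.find('\n\n')) != -1:
    message = buffer[:sep_index]       # one complete SSE message
    buffer = buffer[sep_index + 2:]    # consume the message and the separator
    print(message)
# "data: partial" remains buffered until more bytes arrive
```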
```nginx
    proxy_send_timeout 2m;
}

location ~ ^/copilot/api/stream(.*)$ {
```
Do we need to remove the original part above?
I suggest keeping both APIs, for two reasons (see the usage sketch below):
- a request to /copilot/api/operation returns the whole response object, including step-wise intermediate results, which is used for accuracy evaluation during development
- a request to /copilot/api/stream is the official way for a user to interact with the service; it returns only the final answer string, at lower latency
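For illustration, a minimal client sketch exercising both endpoints. The host, port, and payload shape are assumptions; only the two paths come from this thread:

```python
import requests

BASE = "http://localhost:8080"  # assumed host and port
payload = {  # assumed request shape
    "data": {
        "messageInfo": {"userId": "u1", "convId": "c1"},
        "question": "Why did my job fail?",
    }
}

# Development path: full response object with step-wise intermediate results
resp = requests.post(f"{BASE}/copilot/api/operation", json=payload, timeout=120)
print(resp.json())

# User-facing path: stream only the final answer at lower latency
with requests.post(f"{BASE}/copilot/api/stream", json=payload,
                   stream=True, timeout=120) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)
```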
```python
try:
    llm_session.clear_instance_stream_callback()
except Exception:
    logger.debug('Failed to clear instance stream callback')
```
What will happen if an exception happens here?
Sorry, I don't quite get it. If clear_instance_stream_callback fails, a log message is written.
Or are you suggesting that q.put(None) should be wrapped in another try..except block?
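One way to guarantee the consumer always receives the end-of-stream sentinel, even if cleanup raises, is a try/finally around the cleanup. A sketch only: q and llm_session are the names used in this thread, and the helper function is hypothetical:

```python
import logging
import queue

logger = logging.getLogger(__name__)

def finish_stream(llm_session, q: queue.Queue) -> None:
    """End-of-stream cleanup: always deliver the sentinel to the consumer."""
    try:
        try:
            llm_session.clear_instance_stream_callback()
        except Exception:
            logger.debug('Failed to clear instance stream callback')
    finally:
        q.put(None)  # unblock the consumer even if cleanup raised
```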
Pull Request Overview
Copilot reviewed 28 out of 30 changed files in this pull request and generated 17 comments.
Files not reviewed (1)
- contrib/copilot-plugin/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)
src/copilot-chat/src/copilot_agent/utils/dcw.py:123
- Call to function gen_dcw with too few arguments; should be no fewer than 3.
```python
full_dcw = gen_dcw(user_prompt, map_existing)
```
```js
};

// Create assistant placeholder and attach the same messageInfo (turnId) so feedback maps to this response
useChatStore.getState().addChat({ role: "assistant", message: "", timestamp: new Date(), messageInfo });
```
Copilot AI (Nov 11, 2025)
A placeholder assistant message is created before the server response arrives. If the fetch fails early (line 78), the empty placeholder message remains in the chat history, creating a confusing user experience. The catch block at line 166 doesn't remove this placeholder. Consider removing the placeholder on error or marking it as failed.
```python
try:
    cb = self._instance_stream_callback or LLMSession._global_stream_callback
    if cb:
        cb(full)
```
Copilot AI (Nov 11, 2025)
In the stream_chat method, when accumulating chunks, the code yields individual chunks but then calls the callback with the full accumulated text (cb(full)). This means the callback receives increasingly larger strings containing all previous content, which could cause performance issues with large responses. The callback should either receive only the new chunk or this accumulation pattern should be clearly documented as intended behavior for replace-style streaming.
Suggested change:
```diff
- cb(full)
+ cb(chunk)
```
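For illustration, the two callback contracts the comment distinguishes, with made-up names. The quadratic total work of the replace-style contract is what the comment warns about:

```python
def stream_chunks(chunks, cb):
    """Append-style contract: cb receives only the new delta."""
    full = ""
    for chunk in chunks:
        full += chunk
        cb(chunk)  # O(len(chunk)) work per call
    return full

def stream_snapshots(chunks, cb):
    """Replace-style contract: cb receives the whole text so far,
    so total callback work grows quadratically with output length."""
    full = ""
    for chunk in chunks:
        full += chunk
        cb(full)  # O(len(full)) work per call
    return full

stream_chunks(["Hel", "lo"], print)     # prints 'Hel' then 'lo'
stream_snapshots(["Hel", "lo"], print)  # prints 'Hel' then 'Hello'
```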
```python
user_id = data['data']['messageInfo']['userId']
conv_id = data['data']['messageInfo']['convId']
```
Copilot AI (Nov 11, 2025)
Direct access to nested dictionary keys without validation could raise KeyError if the client sends malformed JSON. The code should validate that 'data', 'messageInfo', 'userId', and 'convId' exist before accessing them, or use .get() with appropriate error handling.
Suggested change:
```diff
- user_id = data['data']['messageInfo']['userId']
- conv_id = data['data']['messageInfo']['convId']
+ # Validate required keys
+ if not data or 'data' not in data or 'messageInfo' not in data['data']:
+     return jsonify({"status": "error", "message": "Missing required fields: data.messageInfo"}), 400
+ message_info = data['data']['messageInfo']
+ user_id = message_info.get('userId')
+ conv_id = message_info.get('convId')
+ if not user_id or not conv_id:
+     return jsonify({"status": "error", "message": "Missing required fields: userId or convId"}), 400
```
```diff
  'userId': user_id,
  'convId': conv_id,
- 'turnId': str(uuid.uuid4()),
+ 'turnId': turn_id,
```
Copilot AI (Nov 11, 2025)
The turnId from question_msg_info is being assigned directly to the response without validation. If the frontend sends an invalid or missing turnId, this could cause issues. The code should validate or generate a fallback turnId if the incoming value is None or empty.
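A minimal fallback along the lines the comment suggests; question_msg_info and the sample values are hypothetical:

```python
import uuid

question_msg_info = {"userId": "u1", "convId": "c1", "turnId": ""}  # sample payload
# Fall back to a fresh id when the incoming turnId is missing or empty
turn_id = question_msg_info.get('turnId') or str(uuid.uuid4())
```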
```python
if not AGENT_MINIMAL_ON:
    # ingest kusto table
    self.collect_data_to_kusto(log_data)
```
Copilot AI (Nov 11, 2025)
The logging and Kusto data collection is now conditionally executed based on AGENT_MINIMAL_ON, but there's no corresponding change to handle cleanup or error scenarios when this flag is enabled. Consider documenting this behavior or ensuring that critical debugging information is still captured in minimal mode.
```python
for snapshot in self.stream_chat(system_prompt, user_prompt):
    if snapshot:
        last = snapshot
return last
```
Copilot AI (Nov 11, 2025)
The try_stream_fallback_chat method returns last which is initialized to an empty string but only updated inside the loop when snapshot is truthy. If the generator yields only empty strings or None values, the method will return an empty string instead of falling back to the non-streaming chat() method. The logic should check if any meaningful content was received and fall back if not.
Suggested change:
```diff
- for snapshot in self.stream_chat(system_prompt, user_prompt):
-     if snapshot:
-         last = snapshot
- return last
+ received_content = False
+ for snapshot in self.stream_chat(system_prompt, user_prompt):
+     if snapshot:
+         last = snapshot
+         received_content = True
+ if received_content:
+     return last
+ else:
+     logger.info('LLMSession: streaming yielded no content, falling back to non-streaming chat')
+     return self.chat(system_prompt, user_prompt)
```
```python
# push to frontend
push_frontend_event(reference)
```
Copilot AI (Nov 11, 2025)
The KQL query reference is being appended to the summary using push_frontend_event(reference) after the summary has already been generated and potentially streamed. This means the reference might be sent as a separate message or lost depending on timing. The reference should either be included in the summary generation or the streaming design should ensure proper ordering.
Suggested change:
```diff
- # push to frontend
- push_frontend_event(reference)
+ # Append reference to summary so it is returned together
+ summary = summary + reference
```
```js
const observer = new MutationObserver((mutations) => {
  try {
    const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
    if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
      requestAnimationFrame(() => {
        try {
          scrollToBottom(el)
        } catch (e) {
          // ignore
        }
      })
    }
  } catch (e) {
    // ignore
  }
})

observer.observe(el, { childList: true, subtree: true, characterData: true })
return () => observer.disconnect()
```
Copilot AI (Nov 11, 2025)
The MutationObserver watches for all mutations in the entire subtree with { childList: true, subtree: true, characterData: true }. This could cause significant performance overhead during rapid streaming updates as every character change triggers the observer callback. Consider throttling/debouncing the scroll trigger or observing only specific elements.
Suggested change:
```diff
- const observer = new MutationObserver((mutations) => {
-   try {
-     const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
-     if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
-       requestAnimationFrame(() => {
-         try {
-           scrollToBottom(el)
-         } catch (e) {
-           // ignore
-         }
-       })
-     }
-   } catch (e) {
-     // ignore
-   }
- })
- observer.observe(el, { childList: true, subtree: true, characterData: true })
- return () => observer.disconnect()
+ // Debounce scroll trigger to avoid excessive calls during rapid updates
+ let debounceTimer: ReturnType<typeof setTimeout> | null = null;
+ const DEBOUNCE_DELAY = 100; // ms
+ const debouncedScroll = () => {
+   if (debounceTimer) clearTimeout(debounceTimer);
+   debounceTimer = setTimeout(() => {
+     try {
+       const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight
+       if (distanceFromBottom < NEAR_BOTTOM_THRESHOLD) {
+         requestAnimationFrame(() => {
+           try {
+             scrollToBottom(el)
+           } catch (e) {
+             // ignore
+           }
+         })
+       }
+     } catch (e) {
+       // ignore
+     }
+   }, DEBOUNCE_DELAY);
+ };
+ const observer = new MutationObserver(() => {
+   debouncedScroll();
+ });
+ observer.observe(el, { childList: true, subtree: true, characterData: true })
+ return () => {
+   observer.disconnect();
+   if (debounceTimer) clearTimeout(debounceTimer);
+ }
```
```
@@ -0,0 +1,99 @@
Your task is to contextualize the user's question, and understand the user's intention from the question they asked, and assign a object and a concern as an indicator of the type of the question.
```
Copilot AI (Nov 11, 2025)
Spelling error: "Trainig" should be "Training".
```python
if cls._config_cache is None:
    with cls._config_lock:
        if cls._config_cache is None:  # Double-check pattern
            cls._config_cache = {
                'provider': os.environ.get("COPILOT_LLM_PROVIDER"),
                'azure_api_key': os.environ.get("AZURE_OPENAI_API_KEY"),
                'openai_api_key': os.environ.get("OPENAI_API_KEY"),
                'endpoint': os.environ.get("COPILOT_LLM_ENDPOINT"),
                'model_name': os.environ.get("COPILOT_LLM_MODEL"),
                'model_version': os.environ.get("COPILOT_LLM_VERSION"),
                'embedding_model_name': os.environ.get("COPILOT_EMBEDDING_MODEL")
            }
            logger.info(f'COPILOT LLM Endpoint Provider: {cls._config_cache["provider"]}')
return cls._config_cache
```
Copilot AI (Nov 11, 2025)
The double-checked locking pattern for _config_cache is implemented, but the cache dictionary itself is mutable and contains configuration values read from environment variables. If environment variables change during runtime (though rare), cached values will be stale. Consider documenting this assumption or adding a cache invalidation mechanism.
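One possible invalidation mechanism is an explicit reset hook that clears the snapshot under the same lock, so the next read re-reads the environment. The class and method names below are hypothetical, and only one config key is shown:

```python
import os
import threading

class LLMConfig:
    _config_cache = None
    _config_lock = threading.Lock()

    @classmethod
    def get_config(cls):
        if cls._config_cache is None:
            with cls._config_lock:
                if cls._config_cache is None:  # double-checked locking
                    cls._config_cache = {
                        'provider': os.environ.get("COPILOT_LLM_PROVIDER"),
                    }
        return cls._config_cache

    @classmethod
    def reset_config_cache(cls):
        """Call after mutating os.environ so the next read re-snapshots it."""
        with cls._config_lock:
            cls._config_cache = None
```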
This PR is mainly about improving the user experience.
Effectiveness of this PR: