[Fix](ai) Fix _exec_plan_fragment_impl meet unknown error when call AI_Functions #58363
zclllyybb
left a comment
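The key question is why _exec_plan_fragment_in_pthread does not hit this block:
catch (const Exception& e) {
st = e.to_status();
In theory, both an internal return and a throw are expected to propagate the complete error information correctly, so there is no need to force them into a single form.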
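One plausible reading of the question above is that a thrown object only reaches `catch (const Exception& e)` if its type is `Exception` (or derived from it). The sketch below illustrates that with hypothetical stand-ins: `Status`, `Exception`, `run_guarded`, and the catch-all fallback are assumptions made for illustration, not the real Doris classes or the actual body of `_exec_plan_fragment_in_pthread`.

```cpp
#include <exception>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; NOT the real doris::Status.
struct Status {
    std::string msg;
};

// Hypothetical stand-in for an exception type that carries a Status.
class Exception : public std::exception {
public:
    explicit Exception(Status st) : _st(std::move(st)) {}
    Status to_status() const { return _st; }
    const char* what() const noexcept override { return _st.msg.c_str(); }

private:
    Status _st;
};

// Assumed handler shape: a typed handler followed by a catch-all fallback.
template <typename Fn>
Status run_guarded(Fn&& fn) {
    try {
        fn();
        return Status{"OK"};
    } catch (const Exception& e) {
        // Reached only when the thrown object is an Exception.
        return e.to_status();
    } catch (...) {
        // A raw Status is not an Exception, so it lands here and its
        // original message is lost.
        return Status{"unknown error"};
    }
}

// Throws the status object directly.
Status throw_raw_status() {
    return run_guarded([] { throw Status{"AI resources not found"}; });
}

// Wraps the status in an Exception before throwing.
Status throw_wrapped_status() {
    return run_guarded(
            [] { throw Exception(Status{"AI resources not found"}); });
}
```

Under these assumptions, `throw_raw_status()` comes back with the generic message while `throw_wrapped_status()` keeps the original one, which would match the two warnings seen in the BE log.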
zclllyybb
left a comment
LGTM
…I_Functions (apache#58363)

Issue Number: close #xxx
Related PR: #xxx

Problem Summary:
When a query statement contains certain commands (e.g. `UPDATE`), the AI function call does not go through `NereidsCoordinator` and falls back to `Coordinator`. In this case, the FE does not send `AI_Resources` to the BE, which leads to errors in subsequent queries, and the error messages are not clear.
This PR also replaces every direct `throw Status` with `throw Exception(Status...)`, so the errors can be surfaced as `Exception`, not raw `Status`.

```text
I20251114 18:00:45.502351 59053 fragment_mgr.cpp:716] query_id: 5c963987bf8340bc-a56b019c8b0b3300, coord_addr: TNetworkAddress(hostname=172.17.6.136, port=9020), total fragment num on current host: 1, fe process uuid: 1763114220687, query type: SELECT, report audit fe:TNetworkAddress(hostname=172.17.6.136, port=9020), use wg:1763112792749,normal
W20251114 18:00:45.528087 59053 status.h:438] meet error status: [INTERNAL_ERROR]AI resources not found
    0# doris::vectorized::AIFunction<doris::vectorized::FunctionAITranslate>::_init_from_resource(doris::FunctionContext*, doris::vectorized::Block const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, doris::TAIResource&, std::shared_ptr<doris::vectorized::AIAdapter>&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/query_context.h:268
    1# doris::vectorized::AIFunction<doris::vectorized::FunctionAITranslate>::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long) const at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    2# non-virtual thunk to doris::vectorized::AIFunction<doris::vectorized::FunctionAITranslate>::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/functions/ai/ai_functions.h:0
    3# doris::vectorized::PreparedFunctionImpl::default_implementation_for_constant_arguments(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long, bool, bool*) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/common/cow.h:0
    4# doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long, bool) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/functions/function.cpp:0
    5# doris::vectorized::PreparedFunctionImpl::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long, bool) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/functions/function.cpp:249
    6# doris::vectorized::IFunctionBase::execute(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, unsigned long, bool) const at /home/zcp/repo_center/doris_release/doris/be/src/vec/functions/function.h:192
    7# doris::vectorized::VectorizedFnCall::_do_execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*, std::vector<unsigned int, std::allocator<unsigned int> >&) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exprs/vectorized_fn_call.cpp:238
    8# doris::vectorized::VectorizedFnCall::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_vector.h:375
    9# doris::vectorized::VExpr::get_const_col(doris::vectorized::VExprContext*, std::shared_ptr<doris::ColumnPtrWrapper>*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    10# doris::vectorized::VectorizedFnCall::open(doris::RuntimeState*, doris::vectorized::VExprContext*, doris::FunctionContext::FunctionStateScope) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    11# doris::vectorized::VExprContext::open(doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exprs/vexpr_context.cpp:0
    12# doris::vectorized::VExpr::open(std::vector<std::shared_ptr<doris::vectorized::VExprContext>, std::allocator<std::shared_ptr<doris::vectorized::VExprContext> > > const&, doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    13# doris::pipeline::UnionSourceOperatorX::prepare(doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    14# doris::pipeline::Pipeline::prepare(doris::RuntimeState*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    15# doris::pipeline::PipelineFragmentContext::prepare(doris::ThreadPool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    16# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, std::function<void (doris::RuntimeState*, doris::Status*)> const&, doris::TPipelineFragmentParamsList const&) at /home/zcp/repo_center/doris_release/doris/be/src/runtime/fragment_mgr.cpp:0
    17# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, doris::TPipelineFragmentParamsList const&) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/std_function.h:245
    18# doris::PInternalService::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:524
    19# doris::PInternalService::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
    20# doris::WorkThreadPool<false>::work_thread(int) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/atomic_base.h:641
    21# execute_native_thread_routine
    22# start_thread
    23# clone
I20251114 18:00:45.528275 59053 pipeline_fragment_context.cpp:139] PipelineFragmentContext::~PipelineFragmentContext|query_id=5c963987bf8340bc-a56b019c8b0b3300|fragment_id=0
I20251114 18:00:45.528398 59053 query_context.cpp:240] Query 5c963987bf8340bc-a56b019c8b0b3300 deconstructed, mem_tracker:
W20251114 18:00:45.531440 59053 status.h:456] meet error status: [INTERNAL_ERROR]_exec_plan_fragment_impl meet unknown error
    0# doris::PInternalService::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /home/zcp/repo_center/doris_release/doris/be/src/service/internal_service.cpp:0
    1# doris::WorkThreadPool<false>::work_thread(int) at /usr/local/ldb-toolchain-v0.26/bin/../lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/atomic_base.h:641
    2# execute_native_thread_routine
    3# start_thread
    4# clone
W20251114 18:00:45.531484 59053 internal_service.cpp:351] exec plan fragment failed, errmsg=[INTERNAL_ERROR]_exec_plan_fragment_impl meet unknown error
```

- Test
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [ ] Yes.
- Does this need documentation?
    - [ ] No.
    - [ ] Yes.

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
When a query statement contains certain commands (e.g. `UPDATE`), the AI function call does not go through `NereidsCoordinator` and falls back to `Coordinator`. In this case, the FE does not send `AI_Resources` to the BE, so subsequent queries fail and the error messages are not clear.
This PR also replaces every direct `throw Status` with `throw Exception(Status...)`, so that errors surface as `Exception` rather than as a raw `Status`.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)