Skip to content

Conversation

@Meo597
Copy link
Collaborator

@Meo597 Meo597 commented Oct 18, 2025

  • TUN / TProxy 且 realIp 环境下很有用,可显著降低延迟
  • 修复了 @patterniha 所导致的 TTL 永远 +1s 的问题 再也不能续一秒了
  • 修复了 @patterniha 所导致的 46 一起查时如果 TTL < 46 响应间隔,返回空 IP 问题
  • 我拉了一坨大的
{
  // 示例配置: 过期时间不超过 48 小时的记录,先返回缓存再刷新
  "dns": {
    "servers": [
      {
        "address": "8.8.8.8",
        "serveStale": true, // 开启乐观缓存
        "serveExpiredTTL": 172800 // 默认为 nil 以继承自全局
      },
      "1.1.1.1"
    ],
    "serveStale": false, // 默认为 false 不开启
    "serveExpiredTTL": 3600 // 默认为 0 永不过期
  }
}

缓存重构(#5248)、乐观缓存(#5237)、并行查询(#5239)
三个 PR 相互冲突,如果全部接受得按这个顺序 merge
我好 rebase

另外,如果准备接受此 PR,但不接受缓存重构(#5248)的话
无法处理楼下 @j2rong4cn 提到的 BUG
因为我仅在 rebase #5248 的基础之上修复
实在是不想顶着恶心写过度冗余的代码

此 PR rebase 后的 diff
https://github.com/Meo597/Xray-core/compare/fix-p-dns-CacheCleanup..rebased-new-dns/optimistic-caching

@Fangliding
Copy link
Member

Fangliding commented Oct 18, 2025

个人感觉过度折腾这玩意除了增加这玩意没什么实际用途
有人觉得mux每个连接省两三个rtt不重要有人觉得核心启动几天后再一次访问这个网址多个rtt去查dns要死了

@Meo597
Copy link
Collaborator Author

Meo597 commented Oct 18, 2025

确实可以减少几十到数百毫秒的延迟,特别是一些 ttl 贼低的 cdn 比如 Akamai

我以前都是硬编码魔改着用的,没什么副作用
至于用不用得到见仁见智,反正默认是关的

刚看到 @SukkaW 发文说 xray 和 sing-box 都没有乐观缓存
忍不了一点,又不是啥高大上的玩意,我这么菜一个小时就糊完了

要不是因为太菜实在看不懂 sing-box 的配置文件,高低也得给他糊一个

@patterniha
Copy link
Collaborator

patterniha commented Oct 20, 2025

I intentionally added +1 to TTL(if 0<untilExpire<1), because the returned-IP-list in routing (when using "IPOnDemand"/"IPIfNonMatch") should be same as returned-IP-list in outbound (when targetStrategy is "ForcrIP"/'UseIP"), the reasons have already been discussed.

Anyway, this is a bit of an obsession, because the probability of TTL expire in outbound-use-of-built-in-dns (but not expired in routing-use-of-built-in-dns) is very low and it is nearly impossible (even adding 1ms (if 0<untilExpire<1ms) is enough).

Also it is still possible we have different IPs in routing and outbound (for example DNS1 is not available in routing and it use DNS2, but DNS1 is available for outbound)

The correct way is to save router-result-IPs in ctx, and use it in outbound if we have targetStrategy, instead of using built-in-DNS again.

As a result, you can remove it, np.

///

Also, cache_controller>updateIP function need some changes after allowing TTL to be negative.

_, _, err := rec.A.getIPs()
if !go_errors.Is(err, errRecordNotFound) {

and

_, _, err := rec.AAAA.getIPs()
if !go_errors.Is(err, errRecordNotFound) {

should change to:

 _, ttl, _ := rec.A.getIPs() 
 if ttl > 0 {
     ...
} 

@Meo597
Copy link
Collaborator Author

Meo597 commented Oct 20, 2025

以前你的版本遇到 ttl 问题是因为:

  1. cache controller 取 ip 没有 ceil,所以永远少 1
  2. 请求 a+aaaa,一旦两个响应间的延迟 > ttl,因为你把两个 ttl 取最小值,导致 ttl = 0 并且早到的 ip 消失
  3. 你为了修正 2 强制给 ttl + 1,但延迟一旦大于 1s 仍有 bug

至于你上面提到的问题,实际上并不存在,因为你没有理解这块逻辑

@patterniha
Copy link
Collaborator

suppose we don't want to use optimistic-cache and suppose A-ttl is expired but AAAA-ttl is not expired (before calling QueryIP), in this situation when we have A+AAAA request, the current code logic is:

the code request for both A and AAAA ips (you can check the code)
but suppose v6-new-result returned before v4-new-result, so after using negative-ttl if we use !go_errors.Is(err, errRecordNotFound) in the code that i mentioned, we will return new-AAAA-ips with outdated-A-ips, but we set optimistic-cache to false, and we shouldn't use outdated-IPs.

as a result, we should wait for new-v4-ips, because old-v4-IPs was outdated before calling QueryIP not during returning another IP-type.

///

instead of changing the code to ttl > 0, it is better to send only A-query if only A-ttl is expired and AAAA-ttl is not expired, and then remove completely these codes:

if !c.disableCache {
_, _, err := rec.A.getIPs()
if !go_errors.Is(err, errRecordNotFound) {
c.pub.Publish(req.domain+"4", nil)
}
}

and

if !c.disableCache {
_, _, err := rec.AAAA.getIPs()
if !go_errors.Is(err, errRecordNotFound) {
c.pub.Publish(req.domain+"6", nil)
}
}

///

In short, it is not important A-ttl expired during returning-new-AAAA-result, we can still use it even if optimistic-cache is false.

but if a record expired before calling QueryIP, we should not use that if optimistic-cache is false.

@patterniha patterniha force-pushed the feat-dns-optimistic-caching branch from d4276fb to 61f2f8b Compare October 20, 2025 17:06
@Meo597
Copy link
Collaborator Author

Meo597 commented Oct 20, 2025

大哥你想玩能不能 fork 一份在自己仓库魔改
你这乱改的连测试都跑不通,真的没问题吗?

@Meo597 Meo597 force-pushed the feat-dns-optimistic-caching branch from 61f2f8b to cd4f1cd Compare October 20, 2025 17:17
@patterniha
Copy link
Collaborator

patterniha commented Oct 20, 2025

some of my changes is related to your code and fix your code.

if i open new PR, it would be many conflicts-codes.

@patterniha
Copy link
Collaborator

before revert please read my code carefully, and if you have any question i can answer.

@patterniha
Copy link
Collaborator

i also fix tests.
ok?

@Meo597
Copy link
Collaborator Author

Meo597 commented Oct 20, 2025

首先你至少本地测试跑通了再提交,更何况这是别人的 PR

其次这份 PR 在我这里工作得很好,已经好几周了没发现什么问题

你上面讲的那一长串 “BUG” 我没理解
因为我还在 review 你所有的 PR

@patterniha
Copy link
Collaborator

patterniha commented Oct 20, 2025

it is impossible to open new PR, there are many nested-codes.

I took the time to read your code, please take the time to read mine too.

this is yourPR+patch please read it first: https://github.com/XTLS/Xray-core/tree/optimistic-dns-with-patch

///

in the patch i add these optimization:

  1. for a+aaaa request, if A-ttl is not expired but AAAA-ttl is expired, we should only send AAAA-query and there is no need to update A-record (and vice versa)

  2. sendQuery send each query in new goroutine so there is no need to run it in new goroutine.

I had noticed number-1 before and wanted to fix it, but I had forgotten.

///

if you don't understand why i did these, please read #5237 (comment)

///

Anyway this PR alone has this: #5237 (comment) problem and should not be merged.

@Meo597
Copy link
Collaborator Author

Meo597 commented Oct 21, 2025

你说的问题已经修复了,ttl > 0 完全足够,没必要搞那么一大坨 patch 进来

因为你的 patch 会导致高命中率本应积极更新的域名得不到主动更新
导致大幅增加 dns 查询阻塞时间

接下来我会 review 你的每一个 PR

@Meo597 Meo597 changed the title feat(dns): Implement optimistic caching feat(dns): add optimistic caching Oct 21, 2025
@j2rong4cn
Copy link
Contributor

j2rong4cn commented Nov 11, 2025

乐观缓存提前返回导致上下文被取消,算bug吗?
j2rong4cn@444e70d
j2rong4cn@e0c4b39

@j2rong4cn
Copy link
Contributor

j2rong4cn@e0c4b39 效果
Screenshot_2025-11-11-19-25-02-91_84d3000e3f4017145260f7618db1d683

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 11, 2025

@j2rong4cn 你这是 fork 一份干了个啥,我手机不方便看

乐观缓存提前返回不应该取消并发查询的 ctx

因为这会影响到预热效果
而且如果没预热成功会反复请求这些服务器

可以看下我之前修复的 commit

在并行查询请求那直接包一层把 ctx.cancel 传播链截断,同时不能影响 ctx 携带的 attr

@j2rong4cn
Copy link
Contributor

j2rong4cn commented Nov 11, 2025

乐观缓存提前返回不应该取消并发查询的 ctx

不是手动取消,是响应结束,ctx被取消,导致乐观缓存有效期内无法更新缓存

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 11, 2025

是的,就是这个意思
我之前在并行查询那个 pr 修复了这个问题
可以看下我的仓库里有 rebase 且 merge 过的成品

@j2rong4cn
Copy link
Contributor

是的,就是这个意思 我之前在并行查询那个 pr 修复了这个问题 可以看下我的仓库里有 rebase 且 merge 过的成品

我没合并 并行查询的pr,原来修复了😂

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 11, 2025

不过这提醒了我
之前我是仅针对并行查询修复了

乐观缓存这块也得处理
当普通缓存到期,但乐观缓存没到期的时候触发的异步刷新缓存也需要包一下
我只用 udp 没发现问题

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 13, 2025

@j2rong4cn
关于 DoH DoQ DoT 乐观缓存异步刷新会被取消的问题已经在 rebased 分支修复了
Meo597@99cf021

@j2rong4cn
Copy link
Contributor

@j2rong4cn 关于 DoH DoQ DoT 乐观缓存异步刷新会被取消的问题已经在 rebased 分支修复了 Meo597@99cf021

我觉这样改比较好 j2rong4cn@2baa33e

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 13, 2025

@j2rong4cn 我 golang 是散装的

我不理解为啥你那样包个 valueCtx 能实现截断 cancel 的目的

我看了它没有 override Deadline() Done() Err() 这些
理论上应该还受控制

@j2rong4cn
Copy link
Contributor

j2rong4cn commented Nov 13, 2025

@j2rong4cn 我 golang 是散装的

我不理解为啥你那样包个 valueCtx 能实现截断 cancel 的目的

我看了它没有 override Deadline() Done() Err() 这些 理论上应该还受控制

这是新的一版j2rong4cn@2baa33e 用的context.WithoutCancel
我合并了 #5248 #5237,我看到除了fakeip和local 都会调用queryIP函数,把Deadline集中在doFetch函数是没问题的
都是继承自下面这里设置的Deadline

ctx, cancel := context.WithTimeout(ctx, c.timeoutMs)

@Meo597
Copy link
Collaborator Author

Meo597 commented Nov 13, 2025

这是新的一版j2rong4cn@2baa33e 用的context.WithoutCancel

看到了,你这样的确比较好
顺便还能把底下各自的实现中 deadline 冗余代码给删了

功能上两种修复是等价的,我都 rebase 完了就这样吧先

@j2rong4cn
Copy link
Contributor

合并了 #5248 #5237,我看到除了fakeip和local 都会调用queryIP函数,把Deadline集中在doFetch函数是没问题的 都是继承自下面这里设置的Deadline

ctx, cancel := context.WithTimeout(ctx, c.timeoutMs)

j2rong4cn@591674b 更激进一点,只在刷新乐观缓存时重新设置Deadline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants