new url cleaner feature #99
Conversation
Pull Request Overview
This PR adds a URL cleaner feature that removes tracking parameters from URLs shared in Telegram messages. The implementation includes a configurable rule-based system for removing tracking parameters (UTM, Facebook, Reddit, etc.) and automatically responds with cleaned URLs when users share links.
Key Changes
- New URL cleaning service with regex-based parameter removal for various platforms
- Extensive tracking parameter rules covering UTM, social media platforms, and analytics tools
- Integration with message handler to automatically clean URLs from messages
- Test coverage for various URL cleaning scenarios
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| BotNet.Services/CleanUrl/UrlCleaner.cs | Core URL cleaning logic using regex-based parameter removal |
| BotNet.Services/CleanUrl/Rule.cs | Comprehensive rule definitions for tracking parameters across platforms |
| BotNet.Tests/Services/CleanUrl/UrlCleanerTests.cs | Test coverage for URL cleaning with various real-world examples |
| BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs | Integration of URL cleaning into message processing flow |
Comments suppressed due to low confidence (3)
BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:128
- Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.
Task _ = Task.Run(async () => {
	try {
		await _telegramBotClient.SendTextMessageAsync(
			chatId: update.Message.Chat.Id,
			text: $"Mirror: https://libreddit.teknologiumum.com/{remainingUrl}",
			replyToMessageId: update.Message.MessageId,
			disableWebPagePreview: true,
			cancellationToken: cancellationToken
		);
	} catch (OperationCanceledException) {
		// Terminate gracefully
	}
});
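For context, the pattern the guideline asks for can be illustrated with a self-contained stand-in. The sketch below shows what a BackgroundTask.Run-style helper typically does (centralized exception handling for fire-and-forget work); it is not the project's actual implementation, whose signature may differ:

```csharp
using System;
using System.Threading.Tasks;

// Illustrative stand-in for a BackgroundTask.Run-style helper: it wraps a
// fire-and-forget task so that cancellations terminate gracefully and any
// other exception reaches a logging callback instead of being lost.
static class BackgroundTaskSketch {
	public static void Run(Func<Task> action, Action<Exception> logError) {
		_ = Task.Run(async () => {
			try {
				await action();
			} catch (OperationCanceledException) {
				// Terminate gracefully
			} catch (Exception exc) {
				// Centralized logging; the real helper may also record metrics
				logError(exc);
			}
		});
	}
}
```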
BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:148
- Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.
Task _ = Task.Run(async () => {
	try {
		await _telegramBotClient.SendTextMessageAsync(
			chatId: update.Message.Chat.Id,
			text: $"Mirror: https://libreddit.teknologiumum.com/{remainingTextUrl}",
			replyToMessageId: update.Message.MessageId,
			disableWebPagePreview: true,
			cancellationToken: cancellationToken
		);
	} catch (OperationCanceledException) {
		// Terminate gracefully
	}
});
BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:64
- This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
foreach (Uri url in possibleSocialUrls) {
	Uri cleanedUrl = UrlCleaner.Clean(url);
	Uri fixedUrl = SocialLinkEmbedFixer.Fix(cleanedUrl);
	await _telegramBotClient.SendTextMessageAsync(
		chatId: update.Message.Chat.Id,
		text: $"Preview: {fixedUrl.OriginalString}",
		replyToMessageId: update.Message.MessageId,
		cancellationToken: cancellationToken
	);
}
"piwik_keyword", "piwik_kwd", "piwik_content", "piwik_cid",
// Google Ads
"gclid", "ga_source", "ga_medium", "ga_term", "ga_content", "ga_campaign",
"ga_place", "gclid", "gclsrc",
Copilot AI · Nov 19, 2025
Duplicate tracking parameter: gclid appears twice in the list (lines 44 and 45). Remove the duplicate entry.
Suggested change:
- "ga_place", "gclid", "gclsrc",
+ "ga_place", "gclsrc",
new Rule
{
	Name = "Global",
	Match = new Regex("./*"),
Copilot AI · Nov 19, 2025
The regex pattern ./* is almost certainly not what was intended: it means "any single character followed by zero or more slashes". Because IsMatch is unanchored it still finds a match in most strings, so the Global rule happens to fire, but the pattern does not express the intent of matching all URLs so that the global tracking parameters (utm_*, fbclid, etc.) are always removed. Use .* (any character, zero or more times) to state that intent directly.
Suggested change:
- Match = new Regex("./*"),
+ Match = new Regex(".*"),
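A quick standalone check of the two patterns. Note that .NET's Regex.IsMatch is unanchored (it searches anywhere in the input), so even ./* finds a match in any non-empty string; the practical difference shows up in what each pattern actually matches. The URL below is only an illustrative value:

```csharp
using System;
using System.Text.RegularExpressions;

class PatternCheck {
	static void Main() {
		string url = "https://example.com/page?utm_source=x";

		// IsMatch is unanchored, so both patterns report a match...
		Console.WriteLine(new Regex("./*").IsMatch(url)); // True
		Console.WriteLine(new Regex(".*").IsMatch(url));  // True

		// ...but "./*" matches one character plus zero or more slashes,
		// while ".*" covers the whole URL.
		Console.WriteLine(new Regex("./*").Match(url).Value); // "h"
		Console.WriteLine(new Regex(".*").Match(url).Value);  // the whole URL
	}
}
```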
// Terminate gracefully
}
});
return;
Copilot AI · Nov 19, 2025
The new URL cleaning behavior (lines 72-102) causes the function to return early after processing URLs, preventing reddit mirroring logic (lines 106-150) from executing when both URLs and reddit links are present in the message. This changes existing behavior - if a message contains both a regular URL and a reddit link, only the URL cleaning will be triggered and reddit mirroring will be skipped. Consider removing the return statement on line 101 or restructuring the logic to allow both features to run.
Suggested change:
- return;
+ // (Removed return to allow reddit mirroring logic to run)
foreach (Rule rule in RuleData.Rules) {
	if (rule.Match.IsMatch(url.ToString())) {
		foreach (string r in rule.Rules) {
			url = new Uri(Regex.Replace(url.ToString(), $"[&?]({r})=?[^&]*", ""));
Copilot AI · Nov 19, 2025
Bug: The regex pattern $"[&?]({r})=?[^&]*" can create invalid URLs, because the replacement consumes the delimiter in front of the parameter. When the first parameter is the one removed, the leading ? goes with it: ?utm_source=x&param1=value becomes &param1=value, a URL whose query string starts with & instead of ?. The TrimEnd on line 21 only fixes trailing characters, not problems like this inside the URL. Consider using a proper URL parsing library or a more sophisticated regex that handles these edge cases.
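One way to avoid these splicing issues is to parse and rebuild the query string rather than cutting it with a regex. The sketch below uses the standard HttpUtility.ParseQueryString from System.Web (available in modern .NET); the method and parameter names are illustrative, not the PR's actual code:

```csharp
using System;
using System.Web; // HttpUtility.ParseQueryString

static class QueryStripSketch {
	// Remove the named parameters by parsing and re-serializing the query
	// string, so no orphaned '&' or missing '?' can result.
	public static Uri StripParams(Uri url, string[] paramsToRemove) {
		UriBuilder builder = new UriBuilder(url);
		var query = HttpUtility.ParseQueryString(builder.Query);
		foreach (string name in paramsToRemove) {
			query.Remove(name);
		}
		// ToString() re-serializes the remaining pairs; modern .NET adds
		// the leading '?' only when the query is non-empty.
		builder.Query = query.ToString() ?? "";
		return builder.Uri;
	}
}
```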
// Remove trailing '?' or '&' if present
string cleanedUrl = url.ToString().TrimEnd('?', '&').TrimEnd('/');
Copilot AI · Nov 19, 2025
Bug: The TrimEnd('/') on line 21 removes the trailing slash from all URLs, which can change the semantic meaning of the URL. For many web servers, /path/ and /path are different resources. This is particularly problematic for paths that intentionally end with a slash (e.g., directory listings). Consider only trimming the trailing slash if it's followed by an empty query string that was just removed, or make this behavior configurable.
Suggested change:
- // Remove trailing '?' or '&' if present
- string cleanedUrl = url.ToString().TrimEnd('?', '&').TrimEnd('/');
+ // Remove trailing '?' or '&' if present, and only remove trailing slash if it was after a '?' or '&'
+ string originalUrlString = url.ToString();
+ string cleanedUrl = originalUrlString.TrimEnd('?', '&');
+ // Only remove trailing slash if the original URL ended with "/?" or "/&"
+ if ((originalUrlString.EndsWith("/?") || originalUrlString.EndsWith("/&")) && cleanedUrl.EndsWith("/"))
+ {
+     cleanedUrl = cleanedUrl.TrimEnd('/');
+ }
Task _ = Task.Run(async () => {
	try {
		foreach (string url in urls) {
			Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

			// if the url is same, don't send the message
			if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
				continue;
			}

			await _telegramBotClient.SendTextMessageAsync(
				chatId: update.Message.Chat.Id,
				text: $"Cleaned URL: {cleanedUrl.OriginalString}",
				replyToMessageId: update.Message.MessageId,
				cancellationToken: cancellationToken
			);
		}
	} catch (OperationCanceledException) {
		// Terminate gracefully
	}
});
Copilot AI · Nov 19, 2025
Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
Copilot AI · Nov 19, 2025
Bug: Potential UriFormatException when URLs start with www. without protocol. The regex on line 73 matches URLs that start with www. but don't have http:// or https:// prefix. Line 83 then tries to create a Uri object with new Uri(url), which will throw a UriFormatException for URLs like www.example.com. Add protocol prefix if missing, or filter out such matches, or use new Uri(url, UriKind.RelativeOrAbsolute) with proper handling.
Suggested change:
- Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));
- // if the url is same, don't send the message
- if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
+ string urlToUse = url;
+ if (urlToUse.StartsWith("www.", StringComparison.OrdinalIgnoreCase)) {
+     urlToUse = "http://" + urlToUse;
+ }
+ Uri cleanedUrl = UrlCleaner.Clean(new Uri(urlToUse));
+ // if the url is same, don't send the message
+ if (cleanedUrl.OriginalString == new Uri(urlToUse).OriginalString) {
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
Copilot AI · Nov 19, 2025
Inefficiency: Creating new Uri(url) twice unnecessarily (lines 83 and 86). The Uri object is created once to clean the URL and again to compare the original string. Store the original Uri in a variable before cleaning and compare with that. This avoids redundant parsing and allocation.
Suggested change:
- Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));
- // if the url is same, don't send the message
- if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
+ Uri originalUri = new Uri(url);
+ Uri cleanedUrl = UrlCleaner.Clean(originalUri);
+ // if the url is same, don't send the message
+ if (cleanedUrl.OriginalString == originalUri.OriginalString) {
foreach (Rule rule in RuleData.Rules) {
	if (rule.Match.IsMatch(url.ToString())) {
		foreach (string r in rule.Rules) {
			url = new Uri(Regex.Replace(url.ToString(), $"[&?]({r})=?[^&]*", ""));
		}
	}
}
Copilot AI · Nov 19, 2025
This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.
Task _ = Task.Run(async () => {
	try {
		foreach (string url in urls) {
			Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

			// if the url is same, don't send the message
			if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
				continue;
			}

			await _telegramBotClient.SendTextMessageAsync(
				chatId: update.Message.Chat.Id,
				text: $"Cleaned URL: {cleanedUrl.OriginalString}",
				replyToMessageId: update.Message.MessageId,
				cancellationToken: cancellationToken
			);
		}
	} catch (OperationCanceledException) {
		// Terminate gracefully
	}
});
Copilot AI · Nov 19, 2025
This assignment to _ is useless, since its value is never read.
It will remove any tracking parameters from URLs. It is also customizable: just add rules in Rule.cs.
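Based on the Rule shape visible in the review diff (Name, Match, and a list of parameter names), adding a custom rule might look like the sketch below. The Rule type here is a minimal stand-in; the project's actual definition in Rule.cs may differ:

```csharp
using System.Text.RegularExpressions;

// Minimal stand-in for the project's Rule type (assumed shape).
record Rule {
	public required string Name { get; init; }
	public required Regex Match { get; init; }
	public required string[] Rules { get; init; }
}

class CustomRuleExample {
	// Hypothetical site-specific rule: strip "ref" and "affiliate_id"
	// from example.com links only.
	public static readonly Rule ExampleRule = new Rule {
		Name = "Example.com",
		Match = new Regex(@"^https?://(www\.)?example\.com/"),
		Rules = new[] { "ref", "affiliate_id" }
	};
}
```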