Skip to content

Conversation

@bgwastu
Copy link
Contributor

@bgwastu bgwastu commented Jan 13, 2024

It will remove any tracking from urls. It is also customizable, just add the rules from Rule.cs.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a URL cleaner feature that removes tracking parameters from URLs shared in Telegram messages. The implementation includes a configurable rule-based system for removing tracking parameters (UTM, Facebook, Reddit, etc.) and automatically responds with cleaned URLs when users share links.

Key Changes

  • New URL cleaning service with regex-based parameter removal for various platforms
  • Extensive tracking parameter rules covering UTM, social media platforms, and analytics tools
  • Integration with message handler to automatically clean URLs from messages
  • Test coverage for various URL cleaning scenarios

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.

File Description
BotNet.Services/CleanUrl/UrlCleaner.cs Core URL cleaning logic using regex-based parameter removal
BotNet.Services/CleanUrl/Rule.cs Comprehensive rule definitions for tracking parameters across platforms
BotNet.Tests/Services/CleanUrl/UrlCleanerTests.cs Test coverage for URL cleaning with various real-world examples
BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs Integration of URL cleaning into message processing flow
Comments suppressed due to low confidence (3)

BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:128

  • Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.
				Task _ = Task.Run(async () => {
					try {
						await _telegramBotClient.SendTextMessageAsync(
							chatId: update.Message.Chat.Id,
							text: $"Mirror: https://libreddit.teknologiumum.com/{remainingUrl}",
							replyToMessageId: update.Message.MessageId,
							disableWebPagePreview: true,
							cancellationToken: cancellationToken
						);
					} catch (OperationCanceledException) {
						// Terminate gracefully
					}
				});

BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:148

  • Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.
				Task _ = Task.Run(async () => {
					try {
						await _telegramBotClient.SendTextMessageAsync(
							chatId: update.Message.Chat.Id,
							text: $"Mirror: https://libreddit.teknologiumum.com/{remainingTextUrl}",
							replyToMessageId: update.Message.MessageId,
							disableWebPagePreview: true,
							cancellationToken: cancellationToken
						);
					} catch (OperationCanceledException) {
						// Terminate gracefully
					}
				});

BotNet.CommandHandlers/BotUpdate/Message/MessageUpdateHandler.cs:64

							foreach (Uri url in possibleSocialUrls) {
								Uri cleanedUrl = UrlCleaner.Clean(url);
								Uri fixedUrl = SocialLinkEmbedFixer.Fix(cleanedUrl);
								await _telegramBotClient.SendTextMessageAsync(
									chatId: update.Message.Chat.Id,
									text: $"Preview: {fixedUrl.OriginalString}",
									replyToMessageId: update.Message.MessageId,
									cancellationToken: cancellationToken
								);
							}

"piwik_keyword", "piwik_kwd", "piwik_content", "piwik_cid",
// Google Ads
"gclid", "ga_source", "ga_medium", "ga_term", "ga_content", "ga_campaign",
"ga_place", "gclid", "gclsrc",
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate tracking parameter: gclid appears twice in the list (lines 44 and 45). Remove the duplicate entry.

Suggested change
"ga_place", "gclid", "gclsrc",
"ga_place", "gclsrc",

Copilot uses AI. Check for mistakes.
new Rule
{
Name = "Global",
Match = new Regex("./*"),
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern ./* is incorrect and will match almost nothing. The / after . means "followed by zero or more slashes". You likely want .* (any character, zero or more times) to match all URLs. This would cause the Global rule to never match any URLs, making all global tracking parameters (utm_*, fbclid, etc.) not get removed.

Suggested change
Match = new Regex("./*"),
Match = new Regex(".*"),

Copilot uses AI. Check for mistakes.
// Terminate gracefully
}
});
return;
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new URL cleaning behavior (lines 72-102) causes the function to return early after processing URLs, preventing reddit mirroring logic (lines 106-150) from executing when both URLs and reddit links are present in the message. This changes existing behavior - if a message contains both a regular URL and a reddit link, only the URL cleaning will be triggered and reddit mirroring will be skipped. Consider removing the return statement on line 101 or restructuring the logic to allow both features to run.

Suggested change
return;
// (Removed return to allow reddit mirroring logic to run)

Copilot uses AI. Check for mistakes.
foreach (Rule rule in RuleData.Rules) {
if (rule.Match.IsMatch(url.ToString())) {
foreach (string r in rule.Rules) {
url = new Uri(Regex.Replace(url.ToString(), $"[&?]({r})=?[^&]*", ""));
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The regex pattern $"[&?]({r})=?[^&]*" can create invalid URLs. When a query parameter is removed, it can leave orphaned & characters or the URL can end up with ?& sequences. For example, if you have ?param1=value&utm_source=x&param2=value and remove utm_source, you'd get ?param1=value&&param2=value which is invalid. The TrimEnd on line 21 only fixes trailing characters, not mid-URL issues. Consider using a proper URL parsing library or a more sophisticated regex that handles these edge cases.

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
// Remove trailing '?' or '&' if present
string cleanedUrl = url.ToString().TrimEnd('?', '&').TrimEnd('/');
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The TrimEnd('/') on line 21 removes the trailing slash from all URLs, which can change the semantic meaning of the URL. For many web servers, /path/ and /path are different resources. This is particularly problematic for paths that intentionally end with a slash (e.g., directory listings). Consider only trimming the trailing slash if it's followed by an empty query string that was just removed, or make this behavior configurable.

Suggested change
// Remove trailing '?' or '&' if present
string cleanedUrl = url.ToString().TrimEnd('?', '&').TrimEnd('/');
// Remove trailing '?' or '&' if present, and only remove trailing slash if it was after a '?' or '&'
string originalUrlString = url.ToString();
string cleanedUrl = originalUrlString.TrimEnd('?', '&');
// Only remove trailing slash if the original URL ended with "/?" or "/&"
if ((originalUrlString.EndsWith("/?") || originalUrlString.EndsWith("/&")) && cleanedUrl.EndsWith("/"))
{
cleanedUrl = cleanedUrl.TrimEnd('/');
}

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +100
Task _ = Task.Run(async () => {
try {
foreach (string url in urls) {
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
continue;
}

await _telegramBotClient.SendTextMessageAsync(
chatId: update.Message.Chat.Id,
text: $"Cleaned URL: {cleanedUrl.OriginalString}",
replyToMessageId: update.Message.MessageId,
cancellationToken: cancellationToken
);
}
} catch (OperationCanceledException) {
// Terminate gracefully
}
});
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Violates the project's Background Task Pattern (guideline 1000000). The code uses Task.Run() for fire-and-forget operations, but according to the coding guidelines, all fire-and-forget operations should use BackgroundTask.Run() instead to ensure consistent exception handling, automatic logging, and metrics tracking. Replace Task _ = Task.Run(async () => { with BackgroundTask.Run(async () => { and inject/use the logger parameter.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +83 to +86
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Potential UriFormatException when URLs start with www. without protocol. The regex on line 73 matches URLs that start with www. but don't have http:// or https:// prefix. Line 83 then tries to create a Uri object with new Uri(url), which will throw a UriFormatException for URLs like www.example.com. Add protocol prefix if missing, or filter out such matches, or use new Uri(url, UriKind.RelativeOrAbsolute) with proper handling.

Suggested change
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));
// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
string urlToUse = url;
if (urlToUse.StartsWith("www.", StringComparison.OrdinalIgnoreCase)) {
urlToUse = "http://" + urlToUse;
}
Uri cleanedUrl = UrlCleaner.Clean(new Uri(urlToUse));
// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(urlToUse).OriginalString) {

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +86
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inefficiency: Creating new Uri(url) twice unnecessarily (lines 83 and 86). The Uri object is created once to clean the URL and again to compare the original string. Store the original Uri in a variable before cleaning and compare with that. This avoids redundant parsing and allocation.

Suggested change
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));
// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
Uri originalUri = new Uri(url);
Uri cleanedUrl = UrlCleaner.Clean(originalUri);
// if the url is same, don't send the message
if (cleanedUrl.OriginalString == originalUri.OriginalString) {

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +18
foreach (Rule rule in RuleData.Rules) {
if (rule.Match.IsMatch(url.ToString())) {
foreach (string r in rule.Rules) {
url = new Uri(Regex.Replace(url.ToString(), $"[&?]({r})=?[^&]*", ""));
}
}
}
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop implicitly filters its target sequence - consider filtering the sequence explicitly using '.Where(...)'.

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +100
Task _ = Task.Run(async () => {
try {
foreach (string url in urls) {
Uri cleanedUrl = UrlCleaner.Clean(new Uri(url));

// if the url is same, don't send the message
if (cleanedUrl.OriginalString == new Uri(url).OriginalString) {
continue;
}

await _telegramBotClient.SendTextMessageAsync(
chatId: update.Message.Chat.Id,
text: $"Cleaned URL: {cleanedUrl.OriginalString}",
replyToMessageId: update.Message.MessageId,
cancellationToken: cancellationToken
);
}
} catch (OperationCanceledException) {
// Terminate gracefully
}
});
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to _ is useless, since its value is never read.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant