README.md: 23 additions & 20 deletions
@@ -40,45 +40,48 @@ If you installed from source do:

 The urls file, by default `urls.csv` must have all the urls you want to check. You can use a text file with 1 url per line or a csv file with the urls on the first column and without headers.

-## Checking the urls
+You can use [ecounter](https://github.com/greenpeace/ecounter) to create a urls file from a sitemap.xml file.

-To check all urls in `urls.csv` with all the checks use the command:
+## Http info about a list of urls

+If you want to obtain information about http status codes, mime-types, file sizes and redirect urls of any urls, you can use `-http`.
+
+You must use this check in a separate command like:
 This repository includes a few testing urls in the file `urls.csv`. Please replace them by your own.

 It will create a couple of files, one per check the script is doing:
-*`httpResponses.csv` - Stores the **http response** codes for the URL. 200 means everything is OK.
 *`analytics.csv` - Reports **google analytics** tracking ID
 *`canonicals.csv` - Reports the **canonical url** for every url
-*`redirects.csv` - Reports the requested URL and the final URL. This will be useful to test the **redirects** in the main site.
 *`linkpattern.csv` - Reports on links that include a regular expression pattern. Useful to track **links** to specific **dead sites**. The default pattern can be set by the `-pattern` option.
 *`cssjspattern.csv` - Reports **css and js** urls that include a regular expression pattern. To detect dead css and js urls in large sites. The pattern can also be defined with the option `-pattern` (described bellow)
 *`mediapattern.csv` - Reports **media** links. Images, videos, audios, iframes and objects. Also use `-pattern` to define the urls pattern.

 ## Optional command line configurations

-`-miliseconds=100` - Sets a delay of 100 miliseconds between requests.
+`-miliseconds=100` - Sets a delay of 100 miliseconds between requests (the default value)

 `-pattern='https?://(\w|-)+.greenpeace.org/espana/.+'` - Changes the search link pattern to the regular expression.

-## Information about other urls
-
-If you want to obtain information about non-html files, like for example images, it's better to use `-fileinfo`.
-
-You must use this check in a separate command like:
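
The README text in this diff describes the urls file as either a text file with one url per line or a headerless csv with the urls in the first column. As a rough illustration of that format only, here is a minimal Python sketch that reads such a file. The function name `load_urls` is hypothetical and is not part of the tool; the tool's own parsing may differ.

```python
import csv


def load_urls(path):
    """Return the urls found in the first column of a headerless urls file.

    Hypothetical helper for illustration; not taken from the project.
    """
    urls = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # A text file with one url per line parses as a one-column csv.
            # Caveat: a url that itself contains a comma would be cut at the comma.
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls
```

Calling `load_urls("urls.csv")` would return the list of urls in file order, skipping blank lines.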
--http : Gets the http response code. If it's 200 it should be OK.
+-http : Gets the http response code, mime-type, file size and final url. It must be used separately from the other checks.

 -analytics : Gets the first Google Analytics account.

 -canonical : Gets the canonical URL for the url.

--redirects : Gets info about redirects and final URLs.
-
 -linkpattern : Gets links that match the regular expression pattern.

 -cssjspattern : Gets CSS and JS URLs that match the regular expression pattern.

 -mediapattern : Gets urls from images, videos, audios, iframes and objects that match the regular expression pattern

--fileinfo : Speciall check more suitable for non-html pages (for example images). It needs to be used alone as the example above, without other checks.
-

 OPTIONS:

@@ -48,6 +44,10 @@ OPTIONS:

 -miliseconds=100 : Sets a delay of 100 miliseconds between requests.

+OTHER:
+
+-clear : Deletes all the files with the reports
+

 FILES WITH THE REPORTS:

@@ -57,8 +57,6 @@ FILES WITH THE REPORTS:

 - canonicals.csv : Reports the canonical url for every url

-- redirects.csv : Reports the requested URL and the final URL. This will be useful to test the redirects in the main site.
-
 - linkpattern.csv : Reports on links that include a regular expression pattern. Useful to track links to specific dead sites. The default pattern can be set by the -pattern option.

 - cssjspattern.csv : Reports css and js urls that include a regular expression pattern. To detect dead css and js urls in large sites. The pattern can also be defined with the option -pattern (described bellow)
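
To make the `-pattern` option mentioned in these report descriptions more concrete, the following Python sketch applies the example pattern from the README to a few links. It assumes the pattern is matched anywhere in the link, which the diff does not confirm. The helper `matching_links` and the sample urls are illustrative only.

```python
import re

# Example pattern copied from the README's -pattern option.
PATTERN = re.compile(r"https?://(\w|-)+.greenpeace.org/espana/.+")


def matching_links(links):
    """Return the links that contain a match for PATTERN (hypothetical helper)."""
    return [link for link in links if PATTERN.search(link)]


sample_links = [
    "https://archivo-es.greenpeace.org/espana/es/",  # made-up sample url
    "https://www.greenpeace.org/international/",
    "http://example.com/",
]
print(matching_links(sample_links))
```

Note that the dots around `greenpeace` in the pattern are unescaped, so each one matches any single character rather than only a literal dot.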