There is a regular expression that pulls all URLs from HTML text:

r'''http[\:/a-za-zA-ZA-Z0-9\.\?\=&-]*'''

But it also pulls the following values:

https:
http-equiv=
http
http:
https
https?://

How can I fix the regular expression so that such values do not end up in the final list?

It also doesn't work for links that start without a protocol name, i.e. a link such as www.ria.ru/infografika/ or ria.ru/infografika/ will not be found.
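One possible direction, as a sketch rather than a definitive answer: the original pattern allows zero characters after `http`, so it accepts `http`, `https:` and `http-equiv=`. Requiring `://` followed by a non-empty host rules those out. Bare hosts are harder, because without a protocol almost any dotted word looks like a URL; the sketch below assumes a small hypothetical allowlist of top-level domains (`TLDS`) to recognize them. The names `TLDS`, `URL_RE` and `extract_urls` are made up for illustration.

```python
import re

# Hypothetical allowlist of top-level domains used to recognize bare hosts
# such as ria.ru; this is an assumption and must be extended for real data.
TLDS = r"(?:ru|com|org|net)"

URL_RE = re.compile(
    rf"""
    (?<![\w@.-])                      # do not start inside a word, host, or e-mail
    (?:
        https?://[\w-]+(?:\.[\w-]+)*  # scheme, "://", then a non-empty host
      |
        (?:[a-z0-9-]+\.)+{TLDS}\b     # or a bare host ending in an allowed TLD
    )
    (?::\d+)?                         # optional port
    (?:/[^\s"'<>]*)?                  # optional path and query string
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_urls(html):
    """Return every substring of html that URL_RE recognizes as a URL."""
    return URL_RE.findall(html)


sample = (
    '<meta http-equiv="refresh">'
    '<a href="https://ria.ru/infografika/">link</a> '
    'www.ria.ru/infografika/ https: http https?://'
)
print(extract_urls(sample))
# ['https://ria.ru/infografika/', 'www.ria.ru/infografika/']
```

The allowlist is the weak point: a host with a top-level domain that is not listed is found only when it is written with a protocol, and a file name that happens to end in a listed suffix can still slip through.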

Maybe forget about regular expressions and try some specialized module?

strawdog 2022-01-15 07:54:14

I know that specialized modules exist, but I need a regular expression.

Елена Сергеева 2022-01-15 07:54:14

A hammer is more convenient than a microscope for hammering nails

Kromster 2022-01-14 08:01:44

@strawdog, "some" module? Well, that is very helpful advice.

Qwertiy 2022-01-14 08:30:50