
I have the following string in Python:

message = '1.1.1.1 --- [19/Nov/2020:18:47:09 +0900] "HEAD http://s3-xxxxx-northeast-1.xxxxxxx.org/ HTTP/1.1" 503 342 "-" "curl/1.1.1" TCP_MISS:HIER_NONE'

I want to extract the following part and store it in a variable:

19/Nov/2020:18:47:09 +0900


What should I do?

What I tried
import re
time_pattern = '(0[1-9]|[12][0-9]|3[01]/[a-zA-Z]{3,4}/[0-9]{4}:([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\s[+|-][0-9]{4}'
time = re.search(time_pattern, message)
Supplementary information (FW/tool version, etc.)

Written in Python on AWS Lambda.

  • Answer # 1

    The closing parenthesis of the first group is missing.
    Also, because the pattern contains a backslash (\s), it is better to write it as a raw string.
    The time zone sign is only + or -, so | is also removed from the character class.

    time_pattern = r'(0[1-9]|[12][0-9]|3[01])/[a-zA-Z]{3,4}/[0-9]{4}:([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\s[+-][0-9]{4}'


    I think the following is enough.

    time_pattern = r'\d\d/\w{3}/\d{4}:\d\d:\d\d:\d\d\s[+-]\d{4}'

  • Answer # 2

    More simply, take whatever is between the square brackets:

    >>> re.search(r'\[(.+?)\]', message).group(1)
    '19/Nov/2020:18:47:09 +0900'
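
Putting the answers together, a minimal sketch of storing the timestamp in a variable might look like the following. It assumes the log line contains the timestamp as 19/Nov/2020:18:47:09 +0900 and uses the shorter pattern from Answer # 1. The helper name extract_timestamp and the None fallback are assumptions added for illustration and do not come from the original answers.

```python
import re

# Shorter pattern from Answer # 1: dd/Mon/yyyy:hh:mm:ss, whitespace, then +hhmm or -hhmm.
TIME_PATTERN = re.compile(r'\d\d/\w{3}/\d{4}:\d\d:\d\d:\d\d\s[+-]\d{4}')


def extract_timestamp(message):
    """Return the first timestamp found in message, or None if there is none.

    extract_timestamp is a hypothetical helper name used only for illustration.
    """
    match = TIME_PATTERN.search(message)
    # search() returns None when nothing matches, so guard before calling group().
    return match.group(0) if match else None


message = (
    '1.1.1.1 --- [19/Nov/2020:18:47:09 +0900] '
    '"HEAD http://s3-xxxxx-northeast-1.xxxxxxx.org/ HTTP/1.1" '
    '503 342 "-" "curl/1.1.1" TCP_MISS:HIER_NONE'
)

timestamp = extract_timestamp(message)
print(timestamp)
```

Because re.search returns None when nothing matches, calling .group() directly raises AttributeError on a line without a timestamp. Checking the match first, as above, may be the safer choice when the code processes many log lines.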