
I tried to generate a UUID with the following code.

import uuid
print(uuid.uuid4())

No matter how many times I run it, I get output like this:

71acnbfe-bcf4-4540-9eh6-0674a3fa6782


A UUID containing uppercase letters is never generated. Is it safe to assume that lowercase output is the default behavior?

  • Answer #1

    RFC 4122, which defines the UUID specification, states:

      "The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input."

    The Python uuid library simply follows that specification.
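    A quick way to confirm the output side of that rule in Python 3: the string form of a `uuid.UUID` comes out in lowercase. If some other system insists on uppercase, converting the string yourself (here with `str.upper()`) is one simple option; the UUID value itself does not change.

```python
import uuid

u = uuid.uuid4()
text = str(u)  # canonical text form produced by the library

# The hex digits a-f always come out in lowercase.
print(text)
print(text == text.lower())  # True

# If a consumer insists on uppercase, convert the string explicitly.
upper_text = text.upper()
print(upper_text)
```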


    [Addendum]

    A UUID is a 128-bit integer value. In memory it is handled as one large integer value*. Generating a UUID means producing this one large integer according to rules that differ for each UUID version. In other words, inside the program it is just an integer.

    * In practice, many languages and environments, including C, have no fixed-length integer type that can hold a 128-bit integer directly, so the value appears to be stored either split into several integers or as a 16-octet string.
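    In Python the "one large integer" view is easy to see, because Python's `int` has arbitrary precision and can hold all 128 bits at once. The sketch below, assuming only the standard `uuid` module, shows the same value as a single integer (`UUID.int`) and as the 16-octet form mentioned in the footnote (`UUID.bytes`, big-endian).

```python
import uuid

u = uuid.uuid4()

# The same 128-bit value, viewed two ways.
as_int = u.int        # one Python int (arbitrary precision, so 128 bits fit)
as_bytes = u.bytes    # 16 octets, most significant byte first

print(as_int.bit_length() <= 128)                 # True
print(len(as_bytes))                              # 16
print(int.from_bytes(as_bytes, "big") == as_int)  # True

# Rebuilding from the integer gives back an equal UUID.
print(uuid.UUID(int=as_int) == u)                 # True
```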

    Moreover, this large integer does not carry a single meaning as a whole; it is divided into bit ranges called fields. For example, one particular field indicates the version.
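    As an illustration of fields living in fixed bit ranges: in the RFC 4122 layout the version is the top 4 bits of the `time_hi_and_version` field, which are bits 76-79 of the 128-bit integer, and the variant is the top bits of `clock_seq_hi_and_reserved` (bits 62-63). The shifts below are a sketch based on that layout; `UUID.version`, `UUID.variant` and `UUID.fields` are the standard library's own accessors, used here to cross-check.

```python
import uuid

u = uuid.uuid4()
n = u.int

# Version: top 4 bits of time_hi_and_version = bits 76-79 of the integer.
version = (n >> 76) & 0xF

# Variant: top 2 bits of clock_seq_hi_and_reserved = bits 62-63;
# binary 10 marks the RFC 4122 layout.
variant_bits = (n >> 62) & 0b11

print(version, u.version)                                 # 4 4
print(variant_bits == 0b10, u.variant == uuid.RFC_4122)   # True True

# The library also exposes the six fields directly.
time_low, time_mid, time_hi_version, clock_seq_hi, clock_seq_low, node = u.fields
print(time_hi_version >> 12)                              # 4
```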

    That is fine as long as everything stays inside the program, but when exchanging data, especially as text, the value is too large to write conveniently in decimal, and the individual fields cannot be made out in a decimal number. Besides, UUIDs were designed from the start to be usable as URNs, so they must have a text representation. That representation is the familiar string of hexadecimal digits separated by hyphens (-). Placing hyphens at specific positions also makes the fields easy to pick out and tell apart by eye (although what each field means depends on the version, so this only goes so far).
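    To compare the candidate representations side by side, this sketch prints one UUID as a decimal integer, as bare hex (`UUID.hex`), in the hyphenated 8-4-4-4-12 form (`str()`), and as a URN (`UUID.urn`, which is the hyphenated text with a `urn:uuid:` prefix).

```python
import uuid

u = uuid.uuid4()

print(u.int)    # decimal: up to 39 digits, fields not visible
print(u.hex)    # 32 hex digits, no separators
print(str(u))   # 8-4-4-4-12 groups separated by hyphens
print(u.urn)    # the same text with the "urn:uuid:" prefix

groups = str(u).split("-")
print([len(g) for g in groups])  # [8, 4, 4, 4, 12]
```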

    This hyphen-separated hexadecimal string is also a UUID, but it is the text representation (the one used in URNs) for exchanging UUIDs as text data; the UUID itself is still a 128-bit integer. So a rule for converting between the two must be defined. The rule uses hexadecimal: lowercase letters are used when outputting text from the internal integer value, and case is ignored when converting input text into the internal integer value.
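    The two directions of that conversion rule can be checked directly with the standard `uuid` module: parsing ignores case, and formatting always produces lowercase, so an integer to text to integer round trip loses nothing.

```python
import uuid

original = uuid.uuid4()
lower_text = str(original)

# Input is case-insensitive: an uppercase spelling parses to the same integer.
parsed = uuid.UUID(lower_text.upper())
print(parsed.int == original.int)  # True

# Output is normalized: converting back to text always yields lowercase.
print(str(parsed) == lower_text)   # True

# The integer -> text -> integer round trip is lossless.
print(uuid.UUID(str(uuid.UUID(int=original.int))).int == original.int)  # True
```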

    In other words, when a program generates a UUID and obtains its text, it works as follows:

    1. Generate a 128-bit integer value according to the rules defined for each version.

    2. Convert the integer value to hexadecimal, using lowercase letters (a-f).

    3. Insert hyphens (-) at specific positions.
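    The three steps above can be sketched for version 4 (random) UUIDs as follows. This is a hypothetical helper, `make_uuid4_text`, written to mirror the steps; it is not CPython's actual implementation. It assumes `secrets.randbits` as the random source and sets the variant and version bits according to the RFC 4122 layout, then checks the result by parsing it with the standard `uuid.UUID`.

```python
import secrets
import uuid


def make_uuid4_text() -> str:
    """Hypothetical helper following the three steps; not CPython's actual code."""
    # Step 1: a 128-bit integer built by the version 4 rules:
    # random bits, then fixed variant and version fields.
    n = secrets.randbits(128)
    n &= ~(0xC000 << 48)   # clear the two variant bits (62-63)
    n |= 0x8000 << 48      # set variant to binary 10 (RFC 4122)
    n &= ~(0xF000 << 64)   # clear the version nibble (bits 76-79)
    n |= 4 << 76           # set version 4

    # Step 2: 32 lowercase hex digits, zero-padded.
    h = f"{n:032x}"

    # Step 3: insert hyphens to form the 8-4-4-4-12 layout.
    return f"{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}"


text = make_uuid4_text()
print(text)
parsed = uuid.UUID(text)
print(parsed.version, parsed.variant == uuid.RFC_4122)  # 4 True
```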

    Depending on the implementation, steps 2 and 3 may be combined (for example, each field is converted to hex separately and the pieces are joined with hyphens afterwards). However, unless the implementation is very unusual, it basically creates the integer value first and then converts it to a hexadecimal string, rather than producing the string directly.
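    One way steps 2 and 3 can be merged, sketched with a hypothetical helper `fields_to_text`: format each field with its fixed hex width and join the pieces with hyphens. The integer (here an existing `uuid.UUID`) still comes first; only the formatting is done field by field. The fourth group combines the two clock sequence bytes, so they are joined before formatting.

```python
import uuid


def fields_to_text(u: uuid.UUID) -> str:
    """Hypothetical helper: hex-format each field and join with hyphens,
    doing steps 2 and 3 together."""
    time_low, time_mid, time_hi_version, clock_hi, clock_low, node = u.fields
    clock_seq = (clock_hi << 8) | clock_low  # 4th group holds both clock_seq bytes
    return "-".join([
        f"{time_low:08x}",
        f"{time_mid:04x}",
        f"{time_hi_version:04x}",
        f"{clock_seq:04x}",
        f"{node:012x}",
    ])


u = uuid.uuid4()
print(fields_to_text(u) == str(u))  # True
```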

    IP addresses are another example where the internal representation and the text representation differ. Internally, an IPv4 address is just a sequence of 4 octets, and it appears in packets as those 4 octets. In configuration and written notation, however, it is expressed as four decimal numbers (0 to 255) separated by dots (.). When a program that handles IP addresses processes IPv4 text notation, it uses decimal rather than hexadecimal, and a dot as the delimiter between octets, simply because the RFC specifies that notation; the implementation just follows it. The same is true for UUIDs: each library simply follows the RFC.
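    The IPv4 analogy can be seen with Python's standard `ipaddress` module (the address 192.0.2.1 is just an arbitrary documentation example): the same value exists as 4 packed octets, as one integer, and as dotted-decimal text.

```python
import ipaddress

addr = ipaddress.IPv4Address("192.0.2.1")

print(addr.packed)  # b'\xc0\x00\x02\x01' : the 4-octet internal form
print(int(addr))    # 3221225985 : the same value as one integer
print(str(addr))    # 192.0.2.1 : dotted-decimal text notation

# Rebuilding from the integer yields the same text representation.
print(str(ipaddress.IPv4Address(3221225985)))  # 192.0.2.1
```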