What directory structure gives the fastest access to a large number of files on Linux?

A large number of product image files are stored on a CentOS web server.
I am trying to decide which directory structure to use so that the image files can be served as quickly as possible.

  • About 1 million image files will be stored.
  • Pattern A: create 10 directories named "0" to "9" and nest them 4 levels deep, giving 10 × 10 × 10 × 10 = 10,000 storage locations. Example: /img/1/2/3/4/A.jpg
  • Pattern B: create 100 directories named "00" to "99" and nest them 2 levels deep, giving 100 × 100 = 10,000 storage locations. Example: /img/01/02/B.jpg
  • Image files are written (added or updated) infrequently and read frequently.
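
The two layouts above can be sketched as follows. This is only an illustration: it assumes each image has a numeric ID and that the storage location is taken from the last four decimal digits of that ID. The function names and the ID-to-directory rule are hypothetical, since the question does not say how files are assigned to directories.

```python
def path_pattern_a(image_id: int, name: str, root: str = "/img") -> str:
    """Pattern A: four nested levels, one decimal digit per level."""
    # Assumption: the bucket is the last four decimal digits of the ID.
    bucket = f"{image_id % 10000:04d}"
    return f"{root}/{'/'.join(bucket)}/{name}"


def path_pattern_b(image_id: int, name: str, root: str = "/img") -> str:
    """Pattern B: two nested levels, two decimal digits per level."""
    # Same assumed bucket, split into two 2-digit directory names.
    bucket = f"{image_id % 10000:04d}"
    return f"{root}/{bucket[:2]}/{bucket[2:]}/{name}"


print(path_pattern_a(1234, "A.jpg"))  # /img/1/2/3/4/A.jpg
print(path_pattern_b(102, "B.jpg"))   # /img/01/02/B.jpg
```

Both patterns end in the same 10,000 leaf directories, so with an even spread 1 million files come to about 100 files per leaf and 100 million files to about 10,000 per leaf. The patterns differ only in how many path components must be resolved and how many entries each intermediate directory holds.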

On Linux file systems (such as ext4 or XFS on CentOS), which of the following is faster to read?
Pattern A: fewer directories per level, with a deeper hierarchy
Pattern B: more directories per level, with a shallower hierarchy

I expect to run into performance degradation if the number of image files grows to 100 million in the future.
I am asking this question to learn what approaches and improvements others are using.
I would appreciate any hints or reference information. Thank you.

  • Answer # 1


    If the requirements are "about 1 million image files will be stored" and "image files are written infrequently and read frequently", I think storing them in object storage such as Amazon S3 is the quickest solution. You will hardly have to think about the directory layout yourself.

  • Answer # 2

    Squid's cache uses two directory levels, and I think the default is roughly 16 first-level directories × 256 second-level directories.

    With 1 million files, that works out to about 250 files per directory. 10 million still seems fine, but 100 million is a number that makes you stop and think.
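
The arithmetic behind that estimate can be checked with a short sketch. It assumes the 16 × 256 layout recalled above and an even spread of files across the leaf directories; the constant and function names are made up for illustration.

```python
L1_DIRS = 16    # first-level directories (the default as recalled in this answer)
L2_DIRS = 256   # second-level directories under each first-level directory

LEAF_DIRS = L1_DIRS * L2_DIRS  # 4096 leaf directories in total


def files_per_dir(total_files: int) -> int:
    """Average files per leaf directory, assuming an even spread."""
    return total_files // LEAF_DIRS


for total in (1_000_000, 10_000_000, 100_000_000):
    print(f"{total:>11,} files -> about {files_per_dir(total):,} per directory")
```

This prints about 244, 2,441 and 24,414 files per directory for 1 million, 10 million and 100 million files respectively.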

    If you have to plan for 100 million files (and the number of requests will naturally grow as well), you need to consider whether a single server can handle the load. Relying on an external service is an option, but that does not mean you can stop thinking about the problem. The risk of building it yourself does not disappear; it is merely converted into a bill, so you need to weigh it carefully.