
Suppose you have the following users table in PostgreSQL.

 id |    name     |      email
----+-------------+-------------------
  1 | Test Taro 1 | [email protected]
  2 | Test Taro 2 | [email protected]
  3 | Test Taro 3 | [email protected]
  4 | Test Taro 4 | [email protected]
  5 | Test Taro 5 | [email protected]
  6 | Test Taro 6 | [email protected]

Some of the emails are duplicated, so I want to find the number of unique emails.
The simplest approach is:
SELECT COUNT(DISTINCT email) FROM users;
This works, but it is very slow.

Is there any way to speed it up?

What I tried

I tried GROUP BY, but it did not improve the speed.
SELECT email FROM users GROUP BY email;
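The GROUP BY query above returns one row per email rather than a count, so to compare it with the baseline it has to be wrapped in an outer COUNT(*). A rewrite often suggested for PostgreSQL is to count over a SELECT DISTINCT subquery: the subquery can be planned with hash aggregation, whereas COUNT(DISTINCT ...) sorts its input inside the aggregate. Whether that is actually faster depends on the data and the PostgreSQL version, so it should be checked with EXPLAIN ANALYZE. The sketch below uses Python's built-in sqlite3 module only as a stand-in so it runs without a PostgreSQL server; the email addresses are hypothetical because the ones in the table are redacted, and it checks only that the queries agree, not how fast they are.

```python
import sqlite3

# Hypothetical rows: the addresses in the question are redacted, so these only
# reproduce "some users share an email". Row 7 has no email, to show NULL handling.
SAMPLE_USERS = [
    (1, "Test Taro 1", "taro1@example.com"),
    (2, "Test Taro 2", "taro1@example.com"),
    (3, "Test Taro 3", "taro2@example.com"),
    (4, "Test Taro 4", "taro2@example.com"),
    (5, "Test Taro 5", "taro3@example.com"),
    (6, "Test Taro 6", "taro4@example.com"),
    (7, "Test Taro 7", None),
]

# Three ways to count unique emails, written in SQL that PostgreSQL also accepts.
# COUNT(DISTINCT email) skips NULL, so the subquery forms filter NULL explicitly
# in order to return the same number.
QUERIES = {
    # Baseline query from the question.
    "count_distinct": "SELECT COUNT(DISTINCT email) FROM users",
    # The GROUP BY attempt, wrapped in COUNT(*) so it returns a number, not a list.
    "group_by": (
        "SELECT COUNT(*) FROM "
        "(SELECT email FROM users WHERE email IS NOT NULL GROUP BY email) AS t"
    ),
    # DISTINCT moved into a subquery; in PostgreSQL this form may be planned
    # with HashAggregate instead of a sort.
    "distinct_subquery": (
        "SELECT COUNT(*) FROM "
        "(SELECT DISTINCT email FROM users WHERE email IS NOT NULL) AS t"
    ),
}


def build_users(conn: sqlite3.Connection) -> None:
    """Create and fill a users table shaped like the one in the question."""
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", SAMPLE_USERS)


def run_all(conn: sqlite3.Connection) -> dict:
    """Run every query and return {query name: unique-email count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in QUERIES.items()}


conn = sqlite3.connect(":memory:")
build_users(conn)
results = run_all(conn)
print(results)  # every query reports 4 unique emails
```

If the subquery form does help in PostgreSQL, EXPLAIN ANALYZE should show a HashAggregate node beneath the outer Aggregate; if the plan still sorts, the rewrite is unlikely to gain much.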

What I checked

Some articles say that an EXISTS clause can speed things up, but that applies to queries that involve other tables. This query works entirely within the users table, so it does not seem applicable here.

What I want to solve

I want to speed this up with something other than DISTINCT. Is there a good approach?
Or, for a query this simple, is it impossible to make it any faster?

Addendum

Sorry, the question was a bit vague, so let me add some details.

Ignoring indexes and the like, I want to know the fastest query that meets the requirement "count the number of unique emails in the table above". That is the question.
With many records, SELECT COUNT(DISTINCT email) FROM users is clearly slower than SELECT COUNT(email) FROM users. I want to know a query that finds the number of unique emails faster than SELECT COUNT(DISTINCT email) FROM users.
If no such query exists, I would like to take another approach (reviewing indexes, etc.).

If EXPLAIN output would still help, I will add it.

  • Answer # 1

      

    "This time indexes etc. are not considered (assume no column has an index)"

    Creating an appropriate index is often an effective way to improve performance. Of course, I'm not your boss, so I won't force you to add one.

  • Answer # 2

    You can create an index.

    See "11.2. Index Types" in the PostgreSQL documentation.
    When searching for or filtering on emails,
    you should design an appropriate index that takes into account the performance and capabilities of each index type.
    That seems to be what it is saying.

    I imagine this is a non-production development environment.
    Since there is no index yet,
    this is a great opportunity to compare execution plans with EXPLAIN.
    Without an index, I expect the query to take a long time because it has to do a full table scan.
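A before/after comparison of the plan makes the effect of an index on email visible. In PostgreSQL that would be EXPLAIN ANALYZE on the counting query, then CREATE INDEX idx_users_email ON users (email), then EXPLAIN ANALYZE again; the index may let the planner read the index instead of the table (an Index Only Scan), though whether it chooses that depends on statistics. The sketch below does the equivalent with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, so the plan wording differs from PostgreSQL's (SQLite prints something like "SCAN users" where PostgreSQL shows "Seq Scan on users"). The index name idx_users_email and the generated rows are hypothetical.

```python
import sqlite3

# Counting query under test (DISTINCT in a subquery, valid in PostgreSQL too).
QUERY = "SELECT COUNT(*) FROM (SELECT DISTINCT email FROM users) AS t"


def plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the EXPLAIN QUERY PLAN detail strings for sql."""
    # Each plan row has four columns; the human-readable text is the last one.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
# Hypothetical data: 1,000 users sharing 100 distinct addresses.
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [(f"Test Taro {i}", f"taro{i % 100}@example.com") for i in range(1000)],
)

before = plan(conn, QUERY)  # no index on email: the whole table is scanned
conn.execute("CREATE INDEX idx_users_email ON users (email)")  # hypothetical name
after = plan(conn, QUERY)  # the email index can now be scanned instead
count = conn.execute(QUERY).fetchone()[0]

print("before:", before)
print("after: ", after)
print("unique emails:", count)
```

The same comparison on the real table in PostgreSQL, with realistic row counts, is what shows whether the index is worth adding; a 1,000-row sample says nothing about timing.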