Suppose you have the following users table in PostgreSQL.
id | name | email |
---|---|---|
1 | Test Taro 1 | [email protected] |
2 | Test Taro 2 | [email protected] |
3 | Test Taro 3 | [email protected] |
4 | Test Taro 4 | [email protected] |
5 | Test Taro 5 | [email protected] |
6 | Test Taro 6 | [email protected] |
Some emails are duplicated, so I want to find the number of unique emails.
The simplest query is:
SELECT COUNT(DISTINCT email) FROM users;
This works, but it is very slow.
Is there some way to speed it up?
I tried using GROUP BY, but it was no faster.
SELECT email FROM users group by email;
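Note that the GROUP BY query returns one row per distinct email rather than a count. A minimal sketch of a version that returns the count directly, assuming the users table shown earlier (the alias unique_emails is only illustrative). The IS NOT NULL filter is there because COUNT(DISTINCT email) ignores NULLs, whereas GROUP BY would produce one extra group for them. Whether this form is any faster depends on the PostgreSQL version, the data, and the plan the planner picks, so it should be measured rather than assumed.

```sql
-- Count the groups produced by GROUP BY instead of using COUNT(DISTINCT ...).
-- "unique_emails" is an illustrative alias, not part of the original question.
SELECT COUNT(*)
FROM (
    SELECT email
    FROM users
    WHERE email IS NOT NULL   -- match COUNT(DISTINCT email), which skips NULLs
    GROUP BY email
) AS unique_emails;
```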
An EXISTS clause can sometimes be used to increase speed, but this query does not involve any other table. It is self-contained in the users table, so it seemed that EXISTS could not be used.
I would like a faster approach that does not rely on DISTINCT. Are there any good ideas?
Or can a query this simple not be made any faster?
Sorry, the question was a little vague, so let me add some detail.
I want to know the fastest query that satisfies the requirement "check the number of unique emails in the above table", without considering indexes and so on. That is the question.
When there are many records, SELECT COUNT(DISTINCT email) FROM users is clearly slower than SELECT COUNT(email) FROM users.
I want to know a query that satisfies the requirement "find the number of unique emails" faster than SELECT COUNT(DISTINCT email) FROM users does.
If no such query exists, I would like to take another approach (reviewing indexes, etc.).
If it would help to have the EXPLAIN output, I will add it.
Answer # 1
Answer # 2
You can create an index.
See "11.2. Index Types" in the PostgreSQL documentation.
If you search and filter by email, it comes down to designing an appropriate index, taking index performance and features into account.
I imagine this is a non-production development environment with no index on the table yet, so I think it is a great opportunity to compare execution plans with EXPLAIN.
Without an index, I expect the query takes time because it does a full table scan.
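As a rough sketch of that comparison, the two statements below print the execution plan and actual timing for the COUNT(DISTINCT) query and for a GROUP BY based rewrite. This assumes the users table from the question; the alias t is only illustrative. Note that EXPLAIN with ANALYZE really executes the query, so on a large table each statement takes as long as the query itself.

```sql
-- Plan and timing for the original query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT email) FROM users;

-- Plan and timing for a GROUP BY based rewrite ("t" is an illustrative alias).
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM (
    SELECT email
    FROM users
    WHERE email IS NOT NULL   -- COUNT(DISTINCT email) also ignores NULLs
    GROUP BY email
) AS t;
```

In the output, look at the node types (for example Seq Scan, Sort, or HashAggregate) and the reported execution time to see where the time is going.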
Setting an appropriate index is often effective in improving performance. Of course, I am not your boss, so I will not force you to add an index.
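If you do decide to try an index, a minimal sketch might look like the following. The index name users_email_idx is hypothetical, and whether the planner actually uses the index for this count (for example through an index-only scan) depends on the data and the PostgreSQL version, so check the plan afterwards.

```sql
-- Hypothetical index name; a plain B-tree index on the email column.
-- On a busy production table, CREATE INDEX CONCURRENTLY avoids blocking writes.
CREATE INDEX users_email_idx ON users (email);

-- Refresh planner statistics and the visibility map,
-- which index-only scans depend on.
VACUUM ANALYZE users;

-- Re-check the plan to see whether the index is used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(DISTINCT email) FROM users;
```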