I wrote a multithreaded check of whether two vectors of the same size are identical, but it is not as fast as I expected. If anyone knows what the problem is, please let me know.
The structure is very simple.
・ elapsed_time_dbl is a template function for measuring elapsed time.
・ is_eq_ob is the thread management class. It uses an atomic as a flag that stops the threads as soon as a mismatch (!=) is found; the same flag is also used to return the result.

Relevant source code
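#include <atomic>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
using namespace std;

// Prints the time elapsed since start in milliseconds and returns it.
template<class TimePoint>
double elapsed_time_dbl(const std::string& msg, const TimePoint& start) {
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> fp_ms = end - start;
    std::cout << msg << ": " << fp_ms.count() << " ms\n";
    return fp_ms.count();
}

class is_eq_ob {
    const vector<int>& m_a;
    const vector<int>& m_b;
    atomic<bool> m_loop_end;  // stop flag: set when a mismatch is found; also holds the result
    int size;
public:
    is_eq_ob(const vector<int>& a, const vector<int>& b)
        : m_a{a}, m_b{b}, m_loop_end{false}, size{(int)a.size()} {}
    // Compares count elements starting at index st.
    void comp_thread(int st, int count) {
        for (int i = 0; i < count; i++) {
            if (m_loop_end.load()) return;  // another thread already found a mismatch
            if (m_a[st + i] != m_b[st + i]) {
                m_loop_end.store(true);
                return;
            }
        }
    }
    bool go() {
        int div = 10;
        m_loop_end.store(false);
        // Create div threads, split size into div blocks, and let each thread process one block
        vector<thread> threads;
        int block = size / div;
        for (int d = 0; d < div; d++) {
            int st = d * block;
            int count = (d == div - 1) ? size - st : block;  // last block takes the remainder
            threads.emplace_back(&is_eq_ob::comp_thread, this, st, count);
        }
        for (auto& t : threads) t.join();
        return !m_loop_end.load();  // true when no mismatch was found
    }
};

int main() {
    vector<int> a(1000, 0);
    vector<int> b(1000, 0);
    // a[100] = 1;
    auto start = std::chrono::high_resolution_clock::now();
    cout << (a == b) << '\n';
    elapsed_time_dbl("a == b", start);
    is_eq_ob is_eq(a, b);
    start = std::chrono::high_resolution_clock::now();
    cout << is_eq.go() << '\n';
    elapsed_time_dbl("is_eq_ob", start);
    return 0;
}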
What I tried

The threaded version is always slower, as shown below:
a == b: 0.064569 ms
is_eq_ob: 0.37863 ms
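In the code above, each timed interval also includes the `cout` output, and each figure is a single run of an operation that takes well under a millisecond, so the numbers are noisy. A minimal sketch of a timing helper that keeps printing out of the timed region and averages over several runs; `average_ms` and its `sink` parameter are hypothetical names, and it uses `std::chrono::steady_clock` (a monotonic clock) instead of `high_resolution_clock`.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the original code): calls f() `reps` times and
// returns the average wall-clock time per call in milliseconds. Nothing is
// printed inside the timed region, and each result is added to `sink` so the
// calls have an observable effect.
template <class F>
double average_ms(F f, int reps, long long& sink) {
    assert(reps > 0);  // an average over zero runs is undefined
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        sink += f() ? 1 : 0;
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> total = end - start;
    return total.count() / reps;
}
```

For example, `average_ms([&] { return a == b; }, 1000, sink)` would time only the comparison itself; the result can be printed after the call returns.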

  • Answer # 1

    As others pointed out, the thread management overhead outweighs the work, so for something as light as a simple value comparison, a plain loop without threads seems to be enough. Even with only about 3 threads, the overhead did not go away.
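A minimal sketch of the plain single-threaded loop this answer settles on, assuming same-size vectors as in the question; `equal_simple` is a hypothetical name. It exits at the first mismatch and pays no thread start-up cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-threaded comparison of two same-size vectors.
// Plain loop with an early exit on the first mismatch.
bool equal_simple(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // same-size precondition, as in the question
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) return false;  // stop at the first difference
    }
    return true;
}
```

For same-size vectors this gives the same answer as `a == b`.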

  • Answer # 2

    With only about 1000 elements, the work seems too light, so the overhead of creating 10 threads outweighs it.
    How about increasing the number of elements?
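The idea that threads only pay off when each one gets enough work can be expressed as a size threshold. A sketch under stated assumptions: `equal_hybrid` and `kMinPerThread` are hypothetical names, and 2^20 elements per thread is an arbitrary placeholder to be tuned by measurement, not a value from this thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical tuning value: minimum elements per thread before threading is used.
constexpr std::size_t kMinPerThread = 1 << 20;

bool equal_hybrid(const std::vector<int>& a, const std::vector<int>& b,
                  unsigned max_threads) {
    assert(a.size() == b.size());  // same-size precondition, as in the question
    const std::size_t n = a.size();
    // Only as many threads as can each get at least kMinPerThread elements.
    const std::size_t div = std::min<std::size_t>(max_threads, n / kMinPerThread);
    if (div <= 1) {
        // Small input: compare on the calling thread, no thread creation.
        return std::equal(a.begin(), a.end(), b.begin());
    }
    std::atomic<bool> mismatch{false};  // shared stop flag, like m_loop_end
    auto worker = [&](std::size_t first, std::size_t last) {
        for (std::size_t i = first;
             i < last && !mismatch.load(std::memory_order_relaxed); ++i) {
            if (a[i] != b[i]) mismatch.store(true, std::memory_order_relaxed);
        }
    };
    std::vector<std::thread> threads;
    const std::size_t block = n / div;
    for (std::size_t d = 0; d < div; ++d) {
        std::size_t first = d * block;
        std::size_t last = (d + 1 == div) ? n : first + block;  // last block takes the remainder
        threads.emplace_back(worker, first, last);
    }
    for (auto& t : threads) t.join();  // join() makes the flag's final value visible here
    return !mismatch.load();
}
```

With the 1000-element vectors from the question, `n / kMinPerThread` is 0, so this always takes the single-threaded path and never pays for thread creation.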

  • Answer # 3

    To judge whether threads help, you have to compare the multithreaded and single-threaded versions using the same logic. When I tried that, multithreading was faster, but the vector comparison operator was still an order of magnitude faster than either. Perhaps the vector comparison operator is not implemented as a simple element-by-element loop.
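That guess is plausible: `operator==` for `std::vector` is specified in terms of `std::equal`, and library implementations may lower `std::equal` on `int` ranges to `memcmp` (libstdc++ does this, for example). In a debug build the hand-written loop runs unoptimized, while `memcmp` is already optimized library code or a compiler builtin, which could account for a gap of this size. The sketch below, with the hypothetical name `equal_memcmp`, illustrates such a byte-wise comparison; it is not the library's actual code.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Byte-wise equality check for same-size int vectors.
// Comparing raw bytes is valid here on the assumption that int has no
// padding bits, which holds on mainstream platforms.
bool equal_memcmp(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // same-size precondition, as in the question
    if (a.empty()) return true;    // data() of an empty vector may be null
    return std::memcmp(a.data(), b.data(), a.size() * sizeof(int)) == 0;
}
```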

    Results from a debug build, with a vector size of 100 million:

    vector comparison operator (a == b): 1047.04 ms
    Loop comparing a and b in main(): 17159.7 ms
    go() with div = 1: 23816.6 ms
    go() with div = 10: 12923 ms