
Consider an integer array a of length n. Call the distance from an index i to a set of indices S the value dist(i, S) = ∑_{j∈S} |a_i − a_j|. Fix an integer k. Consider the function f(i) = min dist(i, S), where the minimum is taken over all sets S of size k that do not contain the index i. Compute the value f(i) for every i from 1 to n.

Input example:
5 3
3 2 5 1 2

The first line contains two integers n and k (2 ≤ n ≤ 300000, 1 ≤ k < n) as described above. The second line contains n integers a_i (1 ≤ a_i ≤ 10^9) — the elements of the array a.

Output example:
4 2 8 4 2

The algorithm works roughly as follows:

For each element of the array, find the k elements closest to it in value (not counting the element itself), then sum the absolute differences between each found element and the original one.

def nearest(lst, target):
    # Element of lst closest in value to target.
    return min(lst, key=lambda x: abs(x - target))

def func(k, arr):
    result = ''
    for i in range(len(arr)):
        f = 0
        arr_copy = arr.copy()
        arr_copy.pop(i)           # exclude the element itself
        for _ in range(k):
            num = nearest(arr_copy, arr[i])
            f += abs(arr[i] - num)
            arr_copy.remove(num)  # do not pick the same element twice
        result += str(f) + ' '
    return result

def main():
    f = input().split()           # first line: n and k
    s = map(int, input().split())
    print(func(int(f[1]), list(s)))

main()

I tried a variant that skips the current element during the scan instead of copying the list, but it gave no noticeable reduction in running time.

The complexity of your algorithm is roughly cubic, O(n · k · 3n): for each of the n elements you make k passes, and each pass scans the list about three times (in nearest, in the membership check, and in remove). It can be done in at least O(n · k). Besides, copying the list on every iteration is itself a rather expensive operation.
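To see where the O(n · k) estimate comes from: after sorting, the k values nearest to a_i form a contiguous window around it, and the distance from one window element to the rest can be computed from running sums instead of element by element. A minimal sketch of that identity (the helper names are mine, not from the thread), checked against brute force on the sample:

```python
def window_dist(lst, i, p, k):
    # Brute-force distance from lst[i+p] to the other k elements of the
    # window lst[i:i+k+1], for checking.
    v = lst[i + p]
    return sum(abs(v - lst[i + q]) for q in range(k + 1) if q != p)

def window_dist_fast(lst, i, p, k):
    # Same value via the split "left part + right part" identity: with lst
    # sorted, elements before lst[i+p] are <= v and elements after are >= v, so
    #   sum_dist = (p*v - sum_before) + (sum_after - (k-p)*v).
    # In a real sweep sum_before/sum_after are maintained as running sums.
    v = lst[i + p]
    sum_before = sum(lst[i:i + p])
    sum_after = sum(lst[i + p + 1:i + k + 1])
    return (p * v - sum_before) + (sum_after - (k - p) * v)

lst = sorted([3, 2, 5, 1, 2])  # sample from the problem statement
assert all(window_dist(lst, i, p, 3) == window_dist_fast(lst, i, p, 3)
           for i in range(len(lst) - 3) for p in range(4))
```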

GrAnd 2022-02-15 00:42:13
  • Answer # 1

    If you slide a window of size k over the sorted list, it should be much faster.

    import random

    def func(k, arr):
        lst = sorted(arr)
        d = {}
        _se = sum(lst[:k])
        for i in range(len(lst) - k):
            _se += lst[i + k]    # _se is the sum of the window lst[i:i+k+1]
            sb, se = 0, _se      # sums before/after the current element
            for p in range(k + 1):
                v = lst[i + p]
                se -= v
                sum_dist = (p * v - sb) + (se - (k - p) * v)
                if v not in d or sum_dist < d[v]:
                    d[v] = sum_dist
                sb += v
            _se -= lst[i]
        return [str(d[v]) for v in arr]

    k = 300
    s = random.sample(range(1, 10**9 + 1), 30000)
    print(" ".join(func(k, s)))
    

    This sample (k = 300 on 30,000 elements) was processed in 5.11 seconds; the option from @DimaVinogradov took 274.14 seconds. The original function answered in ... (more than half an hour has passed and it is still thinking).


    Linear time variant:

    def func(k, arr):
        lst = sorted(arr)
        d = {}
        sb, se = 0, sum(lst[:k + 1])  # sums before/after the current element
        b, ll = 0, len(lst)           # b is the left edge of the window lst[b:b+k+1]
        for i, v in enumerate(lst):
            se -= v
            d[v] = (i - b) * v * 2 - sb + se - k * v
            # Shift the window right while that decreases the distance.
            while b < i and b + k + 1 < ll:
                sum_dist = (i - b - 1) * v * 2 - sb + lst[b] + se + lst[b + k + 1] - k * v
                if sum_dist > d[v]:
                    break
                d[v] = sum_dist
                sb -= lst[b]
                b += 1
                se += lst[b + k]
            sb += v
        return [str(d[v]) for v in arr]
    

    For any k, a list of 300,000 elements finishes in a couple of seconds. The principle is almost the same, except that the window only moves to the right, without going back and without a full enumeration of k distances for each element, so it essentially runs in O(n).
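    To hunt for the failing hidden test, a candidate function can be cross-checked against a brute force taken straight from the definition on small random inputs. This checker is my addition, not part of the answer:

```python
import random

def brute(k, arr):
    # f(i) computed directly from the definition:
    # the sum of the k smallest |a_i - a_j| over j != i.
    out = []
    for i, v in enumerate(arr):
        dists = sorted(abs(v - x) for j, x in enumerate(arr) if j != i)
        out.append(str(sum(dists[:k])))
    return out

# Sample from the problem statement: n=5, k=3.
assert brute(3, [3, 2, 5, 1, 2]) == ['4', '2', '8', '4', '2']

# To stress-test a candidate func(k, arr) against it, one could run e.g.:
# for _ in range(1000):
#     arr = [random.randint(1, 20) for _ in range(random.randint(2, 8))]
#     k = random.randint(1, len(arr) - 1)
#     assert func(k, arr) == brute(k, arr), (k, arr)
```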

    Unfortunately, one test still fails. I can't show it here: the test is hidden.

    matvuric 2022-02-15 00:42:13

    @matvuric I added another option that is a couple of orders of magnitude faster.

    GrAnd 2022-02-15 00:42:13
  • Answer # 3

    On big data this seems to work faster. I didn't run many tests, though.

    from heapq import nsmallest
    from copy import copy

    def func_of(k, list_of):
        result = []
        for i, m in enumerate(list_of):
            new_list = copy(list_of)
            new_list.pop(i)  # exclude the element itself
            nearest = nsmallest(k, new_list, key=lambda x: abs(x - m))
            sum_of = sum(abs(m - j) for j in nearest)
            result.append(sum_of)
        return ' '.join(map(str, result))
    

    Use it instead of both of your functions.
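    For reference, on the sample from the question this approach produces the expected output (the snippet repeats the answer's function so it is runnable on its own):

```python
from heapq import nsmallest
from copy import copy

def func_of(k, list_of):
    result = []
    for i, m in enumerate(list_of):
        new_list = copy(list_of)
        new_list.pop(i)  # exclude the element itself
        nearest = nsmallest(k, new_list, key=lambda x: abs(x - m))
        result.append(sum(abs(m - j) for j in nearest))
    return ' '.join(map(str, result))

print(func_of(3, [3, 2, 5, 1, 2]))  # prints: 4 2 8 4 2
```

    Note that nsmallest still scans the whole list for every element, so this stays roughly O(n² log k) overall; it only improves the constant over repeated min calls.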
