Home>

I don't know how to update multiple times in mongodb.
The following update_many method can perform multiple updates corresponding to the primary key of Index by inputting DataFrame,
If the number of rows in the DataFrame becomes huge, DB access will increase and the processing time will also be huge.
I also checked db.collection.update_many and found that I couldn't specify multiple ids, so I'm worried about how to implement it ...
How should I solve it to reduce the processing time and the number of DB accesses?
We look forward to your professor.

  def update_many (self, df, now):
        d = df.to_dict (orient ='index')
        for id in df.index:
            d [id] ["updatedAt"] = now
            db.collection.update_one (filter = {"id": id}, update = {"$set": d [id]})
  • Answer # 1

    Updating one by one has the overhead of waiting for MongoDB's response, which is time consuming. There are two ways to improve it, either by multiplexing the connection with MongoDB or by using MongoDB Bulk. The former is a difficult task because it requires multiple threads on the program side, so I think the latter is convenient.

    MongeDB Bulk is
    insertOne
    updateOne
    updateMany
    deleteOne
    deleteMany
    replaceOne
    It is an excellent thing that you can give instructions to MongoDB at once by giving a combination of. Unlike a simple updateMany, updateOne can be instructed collectively by changing the parameters, so I think that it can be used even in the situation of the questioner.

    For MongeDB Bulk, there is a performance measurement article below. It seems that the performance is improved by nearly 10 times compared to each case.
    (Furthermore, you can see that Bulk has method passing as well as array passing).
    Use MongoDB Bulk Method

    You can see the pymongo methods below.
    Bulk Write Operations

    The corresponding methods on the MongeDB side can be found below.
    db.collection.bulkWrite ()