
MSCK REPAIR TABLE

I don't understand what the above command does.
I want to run queries on data in S3, but the above command (if "command" is the right word) runs every time data is added to S3.
It is described as a way to load partitions, but it appears to fully scan a huge amount of S3 data every minute, and it takes a considerable amount of time.
Before I look for a countermeasure, can you tell me what MSCK actually does?
My vague understanding is that it reads the data directories in S3 and partitions the data based on them.
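To make that understanding concrete, here is a minimal Python model of what MSCK REPAIR TABLE is generally described as doing: look at every object under the table's S3 location, derive Hive-style key=value partition directories from the object keys, and register any partitions the catalog does not already know about. This is a simplified sketch under stated assumptions, not Athena's implementation: the S3 listing is replaced by a plain list of keys, and the names discover_partitions and msck_repair are hypothetical. The loop over every key is the point: each run examines all objects, which is one plausible reason the command slows down as data accumulates.

```python
# Conceptual sketch only, NOT Athena's real implementation.
# A plain list of object keys stands in for listing the table's S3 location.

def discover_partitions(keys, table_prefix):
    """Return partition paths (e.g. 'dt=2020-01-01/hour=03') found under table_prefix."""
    partitions = set()
    for key in keys:  # every object is visited, so cost grows with object count
        if not key.startswith(table_prefix):
            continue  # object belongs to some other location
        rel = key[len(table_prefix):]
        dirs = rel.split("/")[:-1]  # keep the directory part, drop the file name
        if dirs and all("=" in d for d in dirs):
            partitions.add("/".join(dirs))  # Hive-style path: column=value levels
    return partitions


def msck_repair(keys, table_prefix, registered):
    """Hypothetical repair: add newly discovered partitions to `registered`, return them."""
    new = discover_partitions(keys, table_prefix) - registered
    registered |= new  # modeled as add-only: existing entries are never removed here
    return new
```

For example, with keys logs/dt=2020-01-01/a.gz and logs/dt=2020-01-02/c.gz and only dt=2020-01-01 already registered, msck_repair(keys, "logs/", registered) returns {"dt=2020-01-02"}; a key such as logs/_tmp/x is skipped because its directory is not in key=value form.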

aws
  • Answer # 1

      

    I'm not sure, but my understanding is that it reads the data directories in S3 and partitions the data based on them.

    I think that is correct.
    If the directories are laid out in Hive format, I believe the partitions specified by PARTITIONED BY when the table was created will be picked up automatically if you wait (apologies if I'm misremembering). However, I think MSCK REPAIR TABLE is being run deliberately because you always want your queries to see the latest data.
    I can't tell offhand what is being used as the trigger for MSCK REPAIR TABLE, but could that be why MSCK REPAIR TABLE is being run?
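The Hive-format condition can also be sketched. In a Hive-style layout each directory level is named column=value, nested in the order the PARTITIONED BY columns were declared, so a path can be mapped onto partition values mechanically; a layout like 2020/01/02/ carries no column names and cannot be. The helper below, partition_values, is hypothetical and only illustrates that mapping; it ignores details such as type conversion and escaping that a real catalog would handle.

```python
# Hypothetical helper, for illustration only: maps a Hive-style relative key such as
# "dt=2020-01-02/hour=05/part-0000.gz" onto the table's PARTITIONED BY columns.

def partition_values(rel_key, partition_columns):
    """Return {column: value} if the directories match partition_columns in order, else None."""
    dirs = rel_key.split("/")[:-1]  # directories only, drop the file name
    if len(dirs) != len(partition_columns):
        return None  # wrong nesting depth for this table
    values = {}
    for column, segment in zip(partition_columns, dirs):
        name, sep, value = segment.partition("=")
        if not sep or name != column:
            return None  # not column=value for the expected column, so not usable
        values[column] = value
    return values
```

With partition columns ["dt", "hour"], the key dt=2020-01-02/hour=05/part-0000.gz maps to {"dt": "2020-01-02", "hour": "05"}, while hour=05/dt=2020-01-02/part-0000.gz (wrong order) and 2020/x (no column name) both yield None.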

    Reference
    Data partitioning
    https://docs.aws.amazon.com/en_us/athena/latest/ug/partitions.html