Of course, the right salt value depends on cluster size, input size, and configuration parameters, but here is a very simple example of how to pick it. Suppose a dataset has a join column in which every row holds the same value. Joining on this column will put the entire result into a single partition. In this case you can choose a salt range equal to spark.sql.shuffle.partitions of your Spark application, so that the single hot key is spread across all shuffle partitions.
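Below is a minimal Scala sketch of that idea. The dataset names (skewed, other), the column names, and the helper logic are all hypothetical and only illustrate the pattern: the skewed side gets a random salt in [0, saltRange), the small side is exploded with every salt value, and saltRange is taken from spark.sql.shuffle.partitions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltedJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("salted-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // One salt value per shuffle partition.
    val saltRange = spark.conf.get("spark.sql.shuffle.partitions").toInt

    // Hypothetical skewed input: every row carries the same join key "k".
    val skewed = spark.range(1000000).select(lit("k").as("key"), $"id".as("payload"))
    val other  = Seq(("k", "some value")).toDF("key", "value")

    // Add a random salt to the skewed side so its rows spread across partitions.
    val skewedSalted = skewed.withColumn("salt", (rand() * saltRange).cast("int"))

    // Explode the other side with every possible salt so the join still matches.
    val otherSalted = other.withColumn("salt",
      explode(array((0 until saltRange).map(lit): _*)))

    // Join on (key, salt): the single hot key now shuffles into saltRange partitions.
    val joined = skewedSalted.join(otherSalted, Seq("key", "salt")).drop("salt")

    println(joined.rdd.getNumPartitions)
    spark.stop()
  }
}
```

With the default of 200 shuffle partitions, the salted join above spreads the work of the one hot key over roughly 200 partitions instead of one.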