1. Softmax is applied to the OUTPUTS so that they form a probability distribution. Once the network has computed its K outputs, applying a softmax function maps each one into the range 0-1 and makes all K of them sum to 1.
2. Normalisation is applied to the INPUTS so that the features sit on a similar scale, which lessens the load on the optimiser. One way to normalise (thinking about an image) is to subtract the mean from the image, then divide the result by the standard deviation.
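
To make the two points concrete, here is a minimal sketch in plain Python. The function names `softmax` and `standardise`, the small `eps` guard against division by zero, and the sample values are my own assumptions for illustration, and the "image" is just a flat list of pixel intensities. Real frameworks ship their own versions of both operations.

```python
import math
from statistics import fmean, pstdev


def softmax(logits):
    """Map K raw network outputs to K probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def standardise(pixels, eps=1e-8):
    """Subtract the mean, then divide by the standard deviation."""
    mean = fmean(pixels)
    std = pstdev(pixels)  # population standard deviation
    # eps (an assumption here) guards against a constant image with std == 0.
    return [(p - mean) / (std + eps) for p in pixels]


# Hypothetical values: three raw outputs and a four-pixel "image".
probs = softmax([2.0, 1.0, 0.1])
normed = standardise([0, 64, 128, 255])
```

After this runs, every entry of `probs` lies between 0 and 1 and the entries sum to 1, while `normed` has a mean of roughly 0 and a standard deviation of roughly 1.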