The North American Multi-Model Ensemble (NMME) is a multi-model seasonal forecasting system that combines models from US and Canadian modelling centres. The NMME is expected to produce better rainfall predictions than any single model. However, raw NMME forecasts tend to be underdispersive or overdispersive, so calibration is needed to obtain reliable probabilistic forecasts. This research examined monthly rainfall forecasts for Surabaya generated by nine NMME models and calibrated them with Bayesian model averaging (BMA). The purpose was to compare the performance of calibration using the four best models against calibration using the full ensemble. The four models, CanCM3, CanCM4, CCSM3, and CCSM4, were selected based on their skill. Both calibration results were evaluated using the continuous ranked probability score (CRPS), for which lower values indicate better forecasts, and the percentage of observations captured by the forecast intervals (coverage). Calibration with four models produced an average CRPS of 6.27 with 88.16% coverage, whereas calibration with nine models produced an average CRPS of 5.23 with 92.11% coverage. These results suggest using the full ensemble to generate more accurate probabilistic forecasts.
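
The two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes the forecast is available as a finite set of sampled values (the study evaluates a BMA predictive distribution, whose form is not given here), and the function names `crps_ensemble` and `coverage_percent`, as well as all numbers in the demo, are hypothetical.

```python
import numpy as np


def crps_ensemble(members, obs):
    """Empirical CRPS of one forecast given as samples (lower is better).

    Uses the sample identity CRPS = E|X - y| - 0.5 * E|X - X'|,
    where X and X' are independent draws from the forecast.
    """
    members = np.asarray(members, dtype=float)
    # Mean absolute distance between forecast samples and the observation.
    term_obs = np.mean(np.abs(members - obs))
    # Mean absolute distance between all pairs of forecast samples.
    term_spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - 0.5 * term_spread


def coverage_percent(lower, upper, obs):
    """Percentage of observations that fall inside [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    obs = np.asarray(obs, dtype=float)
    inside = (obs >= lower) & (obs <= upper)
    return 100.0 * np.mean(inside)


if __name__ == "__main__":
    # Made-up monthly rainfall samples and observation, for illustration only.
    samples = [120.0, 150.0, 180.0, 210.0]
    observed = 160.0
    print("CRPS:", crps_ensemble(samples, observed))

    # Made-up interval bounds and observations for four months.
    print("Coverage (%):", coverage_percent(
        [50, 60, 70, 80], [200, 220, 240, 260], [100, 230, 90, 70]))
```

With a single sample the CRPS reduces to the absolute error, which is why it is often read as a probabilistic generalisation of the mean absolute error. Averaging `crps_ensemble` over all forecast months would give a figure comparable in kind to the average CRPS values reported above.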