Evaluating the performance of demographic inference methods