Overall, current VLMs show limited and highly variable progress estimation ability under direct prediction. Even strong models reach only moderate performance and remain highly sensitive to the demonstration setting. Several models show abnormally low or even negative PRC, which indicates distorted temporal ordering rather than meaningful ordinal progress reasoning. In addition, some models behave overly conservatively: their elevated AFRR reflects rejecting answerable cases instead of producing calibrated progress scores.
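To make the two failure modes concrete, the following minimal sketch shows how a negative rank correlation and an elevated rejection rate can arise. It is illustrative only. It assumes that PRC behaves like a Spearman-style rank correlation between predicted and ground-truth progress, and that AFRR is the fraction of answerable cases for which the model declines to give a score. Both definitions, the function names, and the toy values are assumptions made for this sketch and may differ from the exact metrics used in the evaluation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_correlation(pred, truth):
    """Spearman-style correlation (assumed stand-in for PRC)."""
    rp, rt = average_ranks(pred), average_ranks(truth)
    mp, mt = mean(rp), mean(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    if var_p == 0 or var_t == 0:
        return 0.0  # constant predictions carry no ordering information
    return cov / (var_p * var_t) ** 0.5


def false_rejection_rate(predictions, answerable):
    """Fraction of answerable cases rejected (assumed stand-in for AFRR).

    A prediction of None denotes a rejection.
    """
    flags = [p is None for p, a in zip(predictions, answerable) if a]
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical toy data: ground-truth progress (%) along one trajectory.
truth = [0, 25, 50, 100]
ordered = [10, 30, 60, 90]    # preserves temporal order
reversed_ = [90, 60, 30, 10]  # inverts temporal order

prc_ordered = rank_correlation(ordered, truth)
prc_reversed = rank_correlation(reversed_, truth)

# Hypothetical toy data: the model rejects one of three answerable cases.
afrr = false_rejection_rate([50, None, None, 20], [True, True, False, True])
```

Under these assumptions, a model whose scores preserve the temporal order of the frames obtains a correlation near 1, whereas a model that inverts the order obtains a value near -1, even if its individual scores look plausible in isolation. The rejection rate counts only answerable cases, so declining an unanswerable case does not raise it.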