PXR challenge #0: A brief update

In my previous post, I mentioned a few inconsistencies between the numbers reported in the OpenADMET blog post and what I observed in the data. These and a few other issues were reported and discussed in the challenge Discord, and the organizers promised to look into them.

Yesterday they announced that they had made small changes to the datasets on Hugging Face. They also released a lengthy document detailing the reports they had received, how the datasets changed in the new version, and what steps they plan to take to prevent similar problems in future challenges.

I think the organizers handled the feedback well. Data curation is hard, especially across several interconnected datasets, and promptly updating pipelines in response to participants' comments is the best way to improve the process. Kudos to the team at OpenADMET.