Randomness related to Buyer
The PPDS uses the following rule:
A notice N is attributed to a buyer B if:
- B is the only buyer, or
- B is the Group Leader / Lead Buyer, or
- there are several buyers, none of which is the group leader, and B is the first buyer instance.
This rule is intended to identify a single buyer for each notice in cases where a notice has more than one buyer.
It is clear that there are circumstances in which there is no way to select the best buyer, so a random guess is as good as any. However, I'd advise creating a decision tree that always picks the same buyer (e.g. in the last step, order by name or id and then take the first one). That is the only way to ensure that the same Buyer is chosen if the data is ingested again in the future. This consistency is a desired property for the PPDS dataset.
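A minimal sketch of such a decision tree is given below, for illustration only. The `Buyer` class, its fields (`buyer_id`, `name`, `is_lead`) and the function `select_buyer` are hypothetical names and do not reflect the actual PPDS schema or code. The sketch assumes each buyer has a stable identifier and that ordering by name, then by id, is an acceptable final tie-break.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Buyer:
    # Hypothetical fields, not the actual PPDS schema.
    buyer_id: str
    name: str
    is_lead: bool = False


def select_buyer(buyers: Sequence[Buyer]) -> Buyer:
    """Pick one buyer for a notice, independent of input order."""
    if not buyers:
        raise ValueError("notice has no buyers")
    # Step 1: a single buyer is selected trivially.
    if len(buyers) == 1:
        return buyers[0]
    # Step 2: prefer buyers flagged as Group Leader / Lead Buyer.
    leads = [b for b in buyers if b.is_lead]
    candidates = leads or list(buyers)
    # Step 3: final tie-break by name, then id, so that re-ingesting
    # the same data always yields the same buyer.
    return min(candidates, key=lambda b: (b.name.casefold(), b.buyer_id))
```

Because the last step orders the candidates on their own attributes rather than on their position in the source, the result does not change if the buyers arrive in a different order on a later ingestion. The same tie-break also covers the case where more than one buyer is flagged as lead.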