Role for JSON or a similar format, with links relating anonymized and non-anonymized data?

Please forgive this shot from the hip, but I was thinking about the text unpacking of DICOM that @chafey showed us a while back.

I am wondering if that would be a useful way of dealing with anonymized and de-anonymized data.

In particular - I’m thinking of this pattern for archiving and retrieval:

  • Raw DICOM - goes to tape, difficult to get back.
  • Text / tree version of Raw DICOM, further split into:
    • Fields known to be without identifying data - the “anonymous tree”. This goes into the main data store with more general access, and links to the corresponding:
    • Fields with potentially de-anonymizing data - the “identifying tree”. This is behind more protection, and needs more work / permissions to access.
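
To make the split concrete, here is a minimal sketch in Python. It assumes the text version of the DICOM header has already been unpacked into a flat dict of field names and values (real headers are nested, so this would need to recurse). The `SAFE_FIELDS` allowlist, the `_link` key and the `split_tree` function are all hypothetical names for illustration. Anything not on the allowlist is treated as potentially identifying.

```python
import uuid

# Hypothetical allowlist of fields known to carry no identifying data.
# A real list would have to come from a reviewed de-identification policy.
SAFE_FIELDS = {"Modality", "RepetitionTime", "EchoTime"}


def split_tree(tree, safe_fields=SAFE_FIELDS):
    """Split a flat dict of header fields into (anonymous, identifying) trees."""
    # Random key shared by both trees; it carries no patient information.
    link = str(uuid.uuid4())
    anonymous = {"_link": link}
    identifying = {"_link": link}
    for name, value in tree.items():
        # Only fields on the allowlist go to the general-access tree.
        target = anonymous if name in safe_fields else identifying
        target[name] = value
    return anonymous, identifying


raw = {
    "PatientName": "Doe^Jane",
    "PatientID": "12345",
    "Modality": "MR",
    "RepetitionTime": 2000.0,
}
anonymous, private = split_tree(raw)
```

Each field lands in exactly one of the two trees, and the shared `_link` value is the only thing they have in common.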

The workflow would then be that a researcher who needs particular de-anonymizing data, or who believes the withheld data is not de-anonymizing, can ask for access to the “identifying tree” or parts of it. Because this tree is in text format, the administrator can easily see what information it contains, and can easily release parts of the tree.
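
A partial release could be as simple as the following sketch, again assuming a flat dict with a hypothetical `_link` key pointing back to the anonymous tree. The `release` function and the example values are made up for illustration.

```python
import json


def release(identifying_tree, approved_fields):
    """Return only the approved fields, keeping the link to the anonymous tree."""
    released = {"_link": identifying_tree["_link"]}
    for name in approved_fields:
        # Fields that were requested but are absent are silently skipped.
        if name in identifying_tree:
            released[name] = identifying_tree[name]
    return released


# Made-up example of an identifying tree.
identifying_tree = {
    "_link": "a1b2c3",
    "PatientName": "Doe^Jane",
    "PatientBirthDate": "19800101",
    "InstitutionName": "Example Hospital",
}

# The administrator approves a single field for this request.
partial = release(identifying_tree, ["InstitutionName"])

# The released subset is still plain text, so it is easy to review.
print(json.dumps(partial, indent=2))
```

The researcher can then rejoin the released fields to the anonymous tree through the shared link, without ever seeing the fields that were not approved.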

The storage would be efficient because the “identifying tree” does not replicate the information in the “anonymous tree”.

Would this be a reasonable and efficient way to keep usable versions of the underlying DICOM information, with less risk of leaking data you did not want to leak?