Question 1

How can metadata get lost when collecting AI data sources?

Accepted Answer

Metadata is data about data, such as a digital tag showing who owns a file or what its security classification is. When you extract data from its original secure system to use it in an AI project, the security tags, access controls, and data masking applied at the source often get left behind. It is like pouring purified bottled water into an unmarked bucket: afterward, nobody can tell whether it is still safe to drink.
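A minimal sketch of the difference, assuming a hypothetical `Record` type with `owner` and `classification` tags (these names are illustrative and not taken from any particular tool). The first extraction copies only the content, so the tags are lost; the second keeps them attached to each item.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical source record: content plus its metadata tags."""
    content: str
    owner: str
    classification: str  # e.g. "public" or "restricted"


def extract_lossy(records):
    """Copy only the content; owner and classification are dropped."""
    return [r.content for r in records]


def extract_with_metadata(records):
    """Copy the content together with the tags that describe it."""
    return [
        {"content": r.content, "owner": r.owner, "classification": r.classification}
        for r in records
    ]


source = [
    Record("Q3 salary bands", owner="hr", classification="restricted"),
    Record("Office opening hours", owner="facilities", classification="public"),
]

lossy = extract_lossy(source)            # plain strings, no way to tell which is restricted
tagged = extract_with_metadata(source)   # each item still carries its classification
```

With `lossy`, a downstream AI pipeline has no way to tell the restricted item from the public one. With `tagged`, it can still filter or mask by classification.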

Question 2

What is commingling in a data lake?

Accepted Answer

Commingling is the dangerous practice of storing highly restricted, top secret files in the same storage area of a data lake as everyday public information. If the strict access controls from the original sources are not preserved in the new pool, data that was once restricted suddenly becomes available to every user of the lake.
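The effect can be sketched with a toy data lake held in a Python list. The classification levels, object names, and helper functions below are assumptions made for illustration, not part of any real data lake product.

```python
# Hypothetical ordering of sensitivity levels, lowest to highest.
CLEARANCE = {"public": 0, "internal": 1, "restricted": 2}

# A commingled lake: public and restricted objects share one storage area.
lake = [
    {"name": "press_release.txt", "classification": "public"},
    {"name": "merger_plan.docx", "classification": "restricted"},
]


def readable_objects(objects, user_clearance):
    """Return the names a user may read when source labels are preserved."""
    limit = CLEARANCE[user_clearance]
    return [o["name"] for o in objects if CLEARANCE[o["classification"]] <= limit]


def readable_without_labels(objects):
    """What happens when labels are ignored or lost: everything is readable."""
    return [o["name"] for o in objects]
```

A user cleared only for public data gets `press_release.txt` from `readable_objects(lake, "public")`, but gets both files from `readable_without_labels(lake)`, which is the exposure the answer describes.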

Question 3

Why do vector databases need new access control models?

Accepted Answer

A vector database stores documents, pictures, and audio as vectors, which are long lists of numbers that capture the meaning of each file, along with its metadata. Because searches run against those numbers rather than the original raw text, traditional file-level security locks no longer work, so organizations must control access through new models based on, for example, a user attribute, a search index, or the exact query being asked.
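One way to picture attribute-based control is a similarity search that filters on metadata before ranking. The sketch below uses a tiny in-memory index and hand-written three-number vectors. The `allowed_roles` field, the `search` function, and the sample entries are hypothetical and do not reflect a specific vector database API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Each entry keeps access metadata next to its vector (illustrative values).
index = [
    {"id": "handbook", "vector": [0.9, 0.1, 0.0], "allowed_roles": {"employee", "hr"}},
    {"id": "salaries", "vector": [0.8, 0.2, 0.1], "allowed_roles": {"hr"}},
    {"id": "menu", "vector": [0.0, 0.1, 0.9], "allowed_roles": {"employee", "hr"}},
]


def search(query_vector, user_roles, top_k=2):
    """Rank only the entries the user's roles permit, then return the top ids."""
    permitted = [e for e in index if e["allowed_roles"] & user_roles]
    ranked = sorted(
        permitted,
        key=lambda e: cosine(query_vector, e["vector"]),
        reverse=True,
    )
    return [e["id"] for e in ranked[:top_k]]
```

Filtering before ranking means a restricted entry never appears in the results for a user without the right role, even when it is one of the closest matches to the query.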

Question 4

Why must confidentiality controls extend into AI production?

Accepted Answer

In production, a live system feeds live data through automated data prep pipelines, and the model returns inference results. The strict data classification and handling protocols that apply to the data sources behind those results must be implemented in the live system too. A busy restaurant kitchen must follow food safety rules at every step during the dinner rush, not only when food sits in the refrigerator.
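As a rough sketch, a production pipeline can enforce classification rules inside the data prep step itself, so the check runs on every live request. The approved labels, record fields, and the stand-in `infer` function below are assumptions for illustration only.

```python
# Hypothetical policy: only these classifications may reach the live model.
ALLOWED_IN_PRODUCTION = {"public", "internal"}


def prepare(records):
    """Data prep step: reject records with a missing or unapproved label."""
    prepared = []
    for record in records:
        label = record.get("classification")
        if label not in ALLOWED_IN_PRODUCTION:
            raise PermissionError(
                f"record {record['id']} has classification {label!r}"
            )
        prepared.append(record["text"].strip().lower())
    return prepared


def infer(features):
    """Stand-in for a real model call; returns the length of each input."""
    return [len(f) for f in features]


def serve(records):
    """Live path: data prep with policy checks, then inference."""
    return infer(prepare(records))
```

Because the check sits in `prepare`, a restricted or unlabeled record raises `PermissionError` before it ever reaches the model, instead of relying on controls applied only where the data is stored.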

Data Confidentiality in AI: Encryption, Access, and Need-to-Know

