Should we worry about current AI taking over the world?

A perspective from causal machine learning

Main learning materials and references:
[1] Little, Max A., and Reham Badawy. “Causal bootstrapping.” arXiv preprint arXiv:1910.09648 (2019).
[2] Pearl, Judea. Causality. Cambridge University Press, 2009.
[3] Pearl, Judea, and Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
[4] Harari, Yuval Noah. Sapiens: A Brief History of Humankind. Random House, 2014.
[4] Harari, Yuval Noah. Sapiens: A brief history of humankind. Random House, 2014.

Controlled Experiments

One of the most fundamental devices of modern scientific research, especially in the quantitative sciences, is the controlled experiment, used to measure the association between two variables. This setting is vital in helping researchers draw robust and precise conclusions about observed phenomena and test their hypotheses. A famous (possibly fictional) story that every middle school student hears in their first physics lessons is Galileo’s free-fall experiment on the Tower of Pisa. Setting aside the authenticity of this tale, the story embodies an important philosophy of scientific research that became mainstream from the Age of Enlightenment onward: controlled experiments are one of the essentials of the scientific method.

With this powerful tool, pioneers have made many breakthroughs over the past centuries. For instance, James Lind’s controlled experiments on treatments for scurvy pioneered modern clinical trials and helped usher in an era of scientific rationality; Louis Pasteur’s swan-neck flask experiments against spontaneous generation showed that liquid nutrients are spoiled by particles in the air (microorganisms) rather than by the air itself, which helped us understand the underlying cause of many diseases; Albert Einstein’s explanation of the experimentally observed photoelectric effect pushed our understanding toward quantum mechanics… All of these discoveries have profoundly changed our understanding of the universe, pushing the civilizations of Homo sapiens to a historically unprecedented level.

The data produced by controlled experiments help researchers capture the relationship between an effect and its cause. They avoid confounding, in which a third factor influences both the supposed cause and the effect, along with related problems such as selection bias. Relationships obtained this way are, to some extent, stable and robust enough to be reused and independently verified.
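The confounding argument can be made concrete with a minimal Python sketch. It rests on an assumed toy model that is not from the source: a binary confounder `Z` raises both the chance of receiving a treatment `X` and the outcome `Y`, and the true effect of `X` on `Y` is set to 2.0. The names `simulate` and `naive_effect` and all the numbers are hypothetical. When a coin flip assigns treatment, as in a controlled experiment, the simple difference in means recovers the effect. When units with `Z = 1` choose treatment more often, the same estimate is biased upward.

```python
import random

TRUE_EFFECT = 2.0  # hypothetical causal effect of X on Y in this toy model


def simulate(n, randomized, seed=0):
    """Toy data-generating process (an assumption, not from the source).

    A binary confounder Z raises both the chance of treatment X and the outcome Y.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        if randomized:
            # controlled experiment: a coin flip assigns treatment, independent of Z
            x = 1 if rng.random() < 0.5 else 0
        else:
            # observational regime: units with Z = 1 "choose" treatment more often
            x = 1 if rng.random() < (0.8 if z == 1 else 0.2) else 0
        y = TRUE_EFFECT * x + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, x, y))
    return rows


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated units."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rct = naive_effect(simulate(100_000, randomized=True))
obs = naive_effect(simulate(100_000, randomized=False))
print(f"controlled experiment estimate: {rct:.2f}")
print(f"observational data estimate:    {obs:.2f}")
```

With these settings the controlled experiment gives roughly 2.0. The observational estimate comes out near 3.8, because units with Z = 1 make up 80% of the treated group but only 20% of the untreated one. The extra 3.0 × (0.8 − 0.2) = 1.8 is pure confounding.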

In some fields, however, controlled experiments are not always easy to conduct. Taking the epidemiological study of COVID-19 as an example, epidemiologists may not have sufficient time and resources to run well-designed Randomized Controlled Trials (RCTs) that keep pace with extremely fast viral variation. As a result, the effectiveness of vaccines against a newly emerging COVID-19 variant has to be inferred empirically from observational data collected outside controlled settings (epidemiological investigations). Meanwhile, in the Big Data era, enormous amounts of observational data are produced and recorded every second across many domains. These data rarely come from a well-controlled experimental environment; instead, they may be contaminated by countless measurable and unmeasurable factors. A key question then arises: is observational data collected in uncontrolled settings still valuable? Or, equivalently: can we use such data to draw useful, reliable and repeatable experience or knowledge?

The answer may be “Yes”, but sometimes it is “No”.
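One standard way to see why the answer can go either way is Pearl’s backdoor adjustment from Causality [2]. It is sketched here as an illustration, not as the author’s own method. The underlying formula is P(y | do(x)) = Σ_z P(y | x, z) P(z). If the confounder Z is recorded, averaging the within-stratum effect over P(z) removes the bias. If Z was never measured, no amount of observational data fixes it. The data-generating process is an assumption: binary Z influences both X and Y, and X’s true effect on Y is 2.0. The function names are hypothetical.

```python
import random


def observational_data(n, seed=1):
    """Hypothetical toy process: Z -> X, Z -> Y and X -> Y with a true effect of 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < (0.8 if z == 1 else 0.2) else 0
        y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, x, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def backdoor_adjusted_effect(rows):
    """Backdoor adjustment: sum over z of P(z) * (E[Y | X=1, Z=z] - E[Y | X=0, Z=z]).

    Valid here only because Z is, by construction, the sole confounder and is recorded.
    """
    n = len(rows)
    effect = 0.0
    for z in (0, 1):
        stratum = [(x, y) for zz, x, y in rows if zz == z]
        treated = mean([y for x, y in stratum if x == 1])
        control = mean([y for x, y in stratum if x == 0])
        effect += len(stratum) / n * (treated - control)  # weight by empirical P(z)
    return effect


def unadjusted_effect(rows):
    """All we can compute if Z was never recorded: the raw difference in means."""
    treated = mean([y for _, x, y in rows if x == 1])
    control = mean([y for _, x, y in rows if x == 0])
    return treated - control


rows = observational_data(100_000)
print(f"Z recorded, adjusted estimate:  {backdoor_adjusted_effect(rows):.2f}")
print(f"Z unrecorded, naive estimate:   {unadjusted_effect(rows):.2f}")
```

Under these assumptions the adjusted estimate lands close to 2.0, the “Yes” case, while the unadjusted one stays near 3.8, the “No” case. The adjustment works only because the model guarantees that Z is the only confounder, which real observational data can never promise by itself.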

“Causal-blind” Machine Learning

The recent spotlight on large language models (LLMs, such as ChatGPT) seems to have divided the public into two camps. One side expects these models to take over human governance and control the world in the coming era of artificial intelligence. The other sees them simply as tools to boost our productivity.

I am firmly in the latter camp. The reason is simple: most current AI models are blind to the underlying causal knowledge (mainly because observational data by itself does not carry causal information). They behave more like a “library” that associates our existing knowledge and reorganises it for easier access and querying. The so-called “learning” that AI models do is really a process of finding regularities in the training data. The key point is that most of the data we use to train AI come from observational studies rather than from controlled experiments. This naturally produces “causal-blind” AI: lacking any appropriate controls, such models may pick up spurious associations from data contaminated by unexpected and unobserved factors.
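A “causal-blind” learner can be illustrated with a hypothetical ice-cream example that is not from the source. Temperature T drives both ice-cream sales S and drownings D, while S has no effect on D. A least-squares fit of D on S learns a clear positive slope, which is exactly the kind of regularity-finding described above. Yet forcing sales to a new level leaves drownings unchanged. Pearl writes that intervention as do(S = 400). All function names and numbers below are illustrative assumptions.

```python
import random


def observe(n, rng):
    """Hypothetical SCM: temperature T drives ice-cream sales S and drownings D.

    S has no causal effect on D.
    """
    pairs = []
    for _ in range(n):
        t = rng.gauss(25.0, 5.0)
        s = 10.0 * t + rng.gauss(0.0, 20.0)
        d = 0.5 * t + rng.gauss(0.0, 1.0)
        pairs.append((s, d))
    return pairs


def fit_line(pairs):
    """Ordinary least squares D ~ a + b*S: pure pattern-finding on observational data."""
    n = len(pairs)
    ms = sum(s for s, _ in pairs) / n
    md = sum(d for _, d in pairs) / n
    cov = sum((s - ms) * (d - md) for s, d in pairs)
    var = sum((s - ms) ** 2 for s, _ in pairs)
    b = cov / var
    return md - b * ms, b


def mean_drownings_under_do(n, s_value, rng):
    """E[D | do(S = s_value)]: S is set from outside, all other mechanisms unchanged."""
    total = 0.0
    for _ in range(n):
        t = rng.gauss(25.0, 5.0)
        s = s_value  # do(S = s_value) replaces the usual mechanism s = 10*t + noise
        d = 0.5 * t + rng.gauss(0.0, 1.0)  # D's structural equation never reads s
        total += d
    return total / n


rng = random.Random(2)
a, b = fit_line(observe(50_000, rng))
predicted = a + b * 400.0
actual = mean_drownings_under_do(50_000, 400.0, rng)
print(f"learned slope: {b:.3f}")
print(f"model's prediction at S = 400: {predicted:.1f}")
print(f"outcome after do(S = 400):     {actual:.1f}")
```

The fitted model predicts about 19 drownings at S = 400, while the intervention leaves the average near 12.5. The association in the data is real, but it is not a lever one can pull.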

In his book Sapiens: A Brief History of Humankind, Yuval Noah Harari offers an interesting and plausible explanation of the unique power that lets us dominate the world. He argues that its source is our ability to imagine, and believe in, “virtual entities”, which allows humans to cooperate flexibly and on a large scale. For example, nations, money and even “human rights” exist only in our shared imagination and beliefs. Yet they repay us far more than expected: they let us break magnificent projects into pieces so that each member can specialise in a particular set of professional skills. Undeniably, this fine-grained social division of labour has greatly promoted the development of productive forces. In his view, all of this stems from a vital “Cognitive Revolution”.

“Homo sapiens rules the world because it is the only animal that can believe in things that exist purely in its own imagination, such as gods, states, money, and human rights.”

(Yuval Noah Harari, Sapiens: A Brief History of Humankind)

The ensuing question is: how did we acquire the ability to imagine such non-existent entities? Pearl believes it comes from our special brain structure. That structure lets us stand at the top of the causal ladder and think counterfactually, answering questions such as “What if I had done something else?” and “Why?”. Fig. 1 below shows Pearl’s causal ladder, which classifies the abilities of understanding into three levels. At the bottom is association (seeing), which extracts regularities from what we actually observe. In the middle is intervention (doing): inferring what would happen if we acted on something, which answers questions such as “What would happen if I did this?”. At the top is counterfactual imagination (imagining), which lets us understand and abstract the true mechanism from the underlying causal relationships between factors.

Fig. 1 Pearl’s Causal Ladder
(Source: The Book of Why, by Judea Pearl and Dana Mackenzie)
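The three rungs can be made concrete with a tiny structural causal model, assumed purely for illustration. In it, Z := U_z, X is more likely when Z = 1, and Y := 2X + 3Z + U_y, with zero-mean noise U_y independent of X and Z. The sketch computes each rung in turn:

- Rung 1 by conditioning with Bayes’ rule.
- Rung 2 by cutting the arrow into X, so that Z keeps its prior.
- Rung 3 for one observed unit, via Pearl’s abduction, action and prediction steps.

The function names and numbers are hypothetical. Because the model is linear with zero-mean noise, expectations are computed by plugging in U_y = 0.

```python
# Hypothetical structural causal model (an assumption for illustration):
#   Z := U_z                  with P(Z = 1) = 0.5
#   X := 1 with prob. p(Z)    with p(1) = 0.8 and p(0) = 0.2
#   Y := 2*X + 3*Z + U_y      with U_y zero-mean and independent of X, Z

P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}


def f_y(x, z, u_y):
    """Structural equation for Y."""
    return 2 * x + 3 * z + u_y


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def rung1_association(x):
    """Seeing: E[Y | X = x]. Observing X changes our belief about Z (Bayes' rule)."""
    joint = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    total = sum(joint.values())
    # plugging in u_y = 0 gives the mean because U_y is zero-mean and independent
    return sum(joint[z] / total * f_y(x, z, 0.0) for z in (0, 1))


def rung2_intervention(x):
    """Doing: E[Y | do(X = x)]. Setting X cuts the Z -> X arrow, so Z keeps P(z)."""
    return sum(p_z(z) * f_y(x, z, 0.0) for z in (0, 1))


def rung3_counterfactual(x_obs, z_obs, y_obs, x_cf):
    """Imagining: what would Y have been for THIS unit had X been x_cf?"""
    u_y = y_obs - f_y(x_obs, z_obs, 0.0)  # abduction: recover this unit's noise
    return f_y(x_cf, z_obs, u_y)          # action (set X = x_cf), then prediction


print(f"seeing:    {rung1_association(1):.2f} vs {rung1_association(0):.2f}")
print(f"doing:     {rung2_intervention(1):.2f} vs {rung2_intervention(0):.2f}")
print(f"imagining: {rung3_counterfactual(1, 0, 2.7, 0):.2f}")
```

The three rungs give different answers in this model:

- Seeing X = 1 versus X = 0 suggests a gap of 3.8 (4.4 against 0.6).
- Doing X = 1 versus X = 0 gives the true effect of 2.0 (3.5 against 1.5).
- The counterfactual answers a question neither can. A unit observed with X = 1, Z = 0 and Y = 2.7 would have had Y = 0.7 had X been 0.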

Fortunately or unfortunately, current AIs have not gone through this kind of “Cognitive Revolution”. They may have enormous computing power (billions of dollars and vast amounts of energy are spent maintaining their computation and storage infrastructure). What they possess, however, is still a very basic, “animal-like” ability to detect regularities in a given environment (observational data). So we need not worry too much about a scene like the one in “The Terminator” arriving overnight. We should, however, be concerned that most current AI researchers and resources are being invested in the wrong direction. Even though this generation of AI is flourishing, most recent efforts are more like polishing Stone Age tools than inventing the steam engine. What we really need, under proper control and regulation, is an AI that can stand at the top of the causal ladder like a true Homo sapiens, reasoning causally to answer interventional and even counterfactual queries. In this way, we may finally dispel worries about unexplainable and risky “black-box” models. We would understand how and why the AI we create makes its decisions, and so build reliable, trustworthy and safe productivity tools.

Ending Words

In fact, my supervisor and our research group, myself included, are now working on this important and promising topic of causal machine learning. We aim to propose a new framework that introduces causal knowledge into AI. We truly believe that this goal is reachable, and that it will not come too late.
