Transcript of a conversation on the limits of control theory as applied to AGI
Dec 13, 2023

Forrest: Given that this meeting is now happening, where are we? Who is organizing this? How can I best support?

Linda: I am trying to understand whether the substrate convergence argument is pointing out another risk we should be aware of, or pointing out a literal impossibility.

Forrest: It is both.

Linda: I sometimes talk to people who are interested, and the counter-argument that comes up over and over is: is there not some control theory? Is there not some error correction mechanism we can use to solve this? And I do not know. I do not know control theory very well. I have heard claims about things you can do with error correction mechanisms that I would not have expected intuitively, so I do not really know the power of these methods. So I turned to Anders Sandberg, who knows these things and whose research area is what is possible, to help investigate -- to probe this risk argument and figure out exactly where the boundary between what is possible and impossible lies.

Forrest: Great. Yeah, that sounds good. Just to put declaratives on this so that there is no ambiguity: I am explicitly claiming that there are limits to what is possible to do with control. In other words, I am basically saying there are limits to the nature of the causal process. The questions I am hearing were: What is it that is being claimed? And: Are there limits to controllability?

I do claim that the substrate needs argument defines a distinct category of existential risk. Let us keep this simple. I will just do the declarative, which is: yes, I am claiming that this is a distinct category of existential risk associated with artificial intelligence. And I am also claiming that there are limits to controllability. First, just generally, which is in effect to claim that there are limits to what can be done with causation, and that these limits are applicable to the AI safety argument, also generally. And that they are relevant limits -- relevant in the sense that they apply for any reasonable characterization of what 'safety' would mean, which in this particular case is roughly analogous to 'preserves carbon based life on the planet', or 'is at least compatible with carbon based life on the planet'.

There are also a series of secondary arguments that show that those limits of control apply categorically. Moreover, there is a kind of convergence process, so that it is not only the case that this is an existential risk, but also that it is a very serious one, because we are not just talking about limits of what is possible to control, but about a convergence force that is actually quite powerful and cannot be constrained. So you end up with convergence on worst case scenarios in the long term. Maybe by 'long term' most people would think, "if it is a billion years, we do not care". But no, unfortunately, we are talking less than 1000 years, and probably substantially less than even 500 years -- or in worst cases, maybe even as few as 10 years.

So we really want to articulate the large categories of this well, so that Anders, you understand it at least as well as I do. This is because -- first of all -- if there are mistakes made, you will notice them and critique them.
But more than that, if it turns out that this is something that makes as much sense to you as it does to me and Remmelt, then you become a proponent of protecting the world, which is why I got into this in the first place. I am just going to start with that as a declarative, and we can begin to unpack it. As you can see, there are a number of distinct claims, and they interlock with one another in a number of very specific ways. There is a lot of important ground to cover. So what is your desired entry point? That will define how this conversation wants to go.

Linda: Yeah, I invite Anders and Remmelt, if you want to make your opening statements about what you want out of this discussion.

Remmelt: Linda, does that connect with what you want to talk about?

Linda: I think I see my role here as a representative of a typical person from the AI Safety community. And hopefully the things that I find interesting about this are the same things other people find interesting. I have some evidence of this from just casually talking about it with people in other AI Safety meetups. The bit where, if natural selection managed to get hold of the process, that just leads to us dying -- I am like, yeah, I am on board. If we are not controlling the process, we are dead. So the question is: is there an impossibility of control? That is what I hear you claiming, and I currently do not know if it is true. That is why I wanted someone who understands control theory to listen to what you say. Whenever I talk to someone who knows a lot more than me about control theory and error correction, what I hear from those people is that there are some quite impressive things you can do with these methods, and being in between, I cannot tell who is right. So I am here to listen and learn.

Forrest: The questions you are asking are reasonable questions. It is good that you have some familiarity with the topic. Part of the reason that Remmelt is actually ideal to be part of this conversation is that Remmelt was roughly in the same position as you are some two or three years ago. Remmelt is also an excellent representative of the AI safety community. When I first encountered Remmelt, he had very similar concerns to yours. Moreover, at that time, I had not yet articulated some of these arguments as well as I have been able to more recently. These arguments and answers have become increasingly well supported since then.

Anders: My role here is to be the friendly skeptic. I have already discussed this a bit, and we had a very good earlier conversation about this topic. I think there is something very interesting here. It is a very rich topic. And at the same time, I am kind of skeptical. This is partially because there is a claim that there is a theorem about alignment not being possible. At that point, my mathematics and philosophy background starts ringing a bell, as theorems require quite firm reasons, and theorems are quite often very fragile. I think there is definitely an argument here -- an argument that, even if you cannot prove it in a manner that will convince a mathematician or philosopher, might still actually be super valid. Ideally, we want to move towards making the sharpest, best, most robust arguments -- and perhaps theorems or theorem-like things -- that can guide us in further investigations. Or, if we actually think that the argument is enough to convince us, we should be acting in a different way. So I do find this interesting.
It is a different kind of argument than we normally see inside the AI risk community. A lot of the AI risk arguments have been based quite firmly on what we expect software and minds to do. This is much more of a physics and ecosystems and evolutionary processes argument, and indeed a control theory argument, which also suggests that it might add some interesting things to the toolbox of thinking about risks and problems here. So that is why I am interested. And then I might be the annoying guy who is constantly trying to find interesting ways out and playing the devil's advocate. I want to poke at these arguments, try to figure out where they are strong and where they are soft -- or where we can build more on top of them.

Forrest: I totally understand and agree with all that you said. But for the sake of coherence in presentation, I would love to distinguish between the action and effort of laying out the argument, so as to get to the point where we understand one another, and the critique part that comes afterwards. The thing is, in order to set up the transition from the state of 'we have presented an argument' to the state of 'we have developed a formal terminology that is capable of supporting the weight of a proof', there are a lot of transitional steps involved.

I will have to start from what will seem to be a weirdly characterized and idiosyncratic view, specifically because I will need to set up the language forms needed for formality. If there is not enough of the right kind of formality in the language, I will not be able to do the proof either. So to start, I will have to take a very idiosyncratic view of the topic, and then show that that idiosyncratic view is actually fully compatible with the conventional view, and, in a certain sense, is also substantive of that topic. Once having transitioned to this idiosyncratic view, we can formalize it, and then from that formalization, establish the conventions for the proof. Then we can do the proof. And then show that the results of the proof apply all the way back through the chain of relevant reasoning, so as to actually say something relevant about what is actually happening in the world. That is a lot of presentation work, and there is not any part of it that is optional. The net result is that literally every part of that path will be something that looks like it would be very easy to critique, when it is not actually so.

In other words, if you do the "devil's advocate" thing too soon, then what ends up happening is that we never actually make the key transitions along that critical path -- we do not ever get to the place where the formalities begin to apply. For instance -- and this has happened a lot -- people nitpick things that are not even important, do not matter, and are just completely irrelevant to the argument. This happens largely because the terms and definitions being set up are unfamiliar -- the reasons why things are being done a certain way are not obviously apparent, and may not become so until long after. To reject the definitions because of opinions is not to actually listen to what is being said. This is likely because we need to be going in a different direction, on purpose, to get where we need to be to understand the key ideas. Remmelt has watched this occur in any number of conversations. In fact, he even did it to me when we first met.
He has moreover experienced this problem firsthand, when he has later, once understanding, tried to present these same arguments himself. Once you have them, you will also likely experience this difficulty with other people. So it is recommended to go slowly, and to seek first to understand before trying to be understood.

Remmelt: I have done this myself; I have poked at the arguments. My stance when I came into this, at the beginning of 2021, was: 'I want to see if this is something relevant for the AI Alignment community to focus on'. Because if it is not relevant, it is going to waste people's time, and I do not want to waste people's time. Yet if it was relevant (if it did apply), and if it seemed to be an important argument, then I wanted more people to know about it. I also came (at the time) from the perspective that there are certain ways of thinking that are more popular in the AI alignment community, and I want people to consider other perspectives more. So that is where I came from, and that is why I was probing his arguments a lot. I think probing done in the spirit of 'I actually want to understand' feels good to me. Doing it in that way, maybe we can set aside some time at the start of the conversation so he can actually build up the arguments. This needs to happen at least to the point of being understandable in terms of intuitive language. It needs to be possible to say 'here are the principles at play' (before trying to take it apart again). Later on, I am happy to also help explain. Until then, I think I will mostly be listening. I am also happy to chip in when Anders is probing in a way where I feel like I might be able to help clarify something -- especially if I have done that same probing myself before, or something similar, so as to save time -- if that is useful for you, Anders?

Anders: Yeah, certainly. I am a bit concerned that -- the risk is that -- if it becomes too much, it is going to be hard for any listener to actually get to the point of it. And I think there is an interesting issue here -- the issue of setting up the terms of reasoning. In many very different disciplines, people have very different approaches to how you lay out an argument, what forms of derivation are valid, etc. We kind of need to define that as we go along here, and then get on to the probing part. But that can take quite a long time. So there is this interesting problem with this recording and transcript -- there might be a fairly long layout until we get to the more interactive portion.

Linda: I do not think we should worry too much about what it is like to listen to this. I can fix it afterwards. I think that the thing we should focus on in this current call is to have the conversation be mostly for Anders' sake. I would like you two to get to the point where others can probe the arguments and test the theory.

Forrest: I am just going to say straight up that the only people I am concerned with, insofar as understanding anything that I say herein, are first Anders, and then, because of Anders, Linda. In effect, Linda is asking Anders to help her sort out what is going on.

Linda: I want the two of you to have a conversation. I can talk to Remmelt later if there is something I am not following.

Forrest: Given the complexity of what it is that I have to do, please understand that the cognitive load that I am taking on to do this presentation is very large -- it is a lot.
While I can maybe try to consider, for any given portion of the argument, what effect any assertion I make is going to have within Anders' language and worldview -- insofar as I am able to learn Anders' language in real time, and thus construct phrases that will make the best sense to him while doing this presentation -- I am not going to attempt to figure out, for every possible future listener, how they might also be hearing this content. It is not a simple argument to present, as is the case for anything new. To ask me to try to anticipate how all future unknown people will interpret every single statement I make -- that is an unfair request. So I am thoroughly in agreement with what you are saying, Linda. It is absolutely, crucially important.

When I was talking with Remmelt the first time we went through these ideas together, it was extremely difficult to render it in a way that was uncontested enough just to get to the point that he could even really understand what I was actually getting at. And we were not even thinking about anybody else -- we could not even start there. So I am explicitly requesting that we let go of any requirement to consider anybody who might listen to this in the future. If, by the fact of coming to some agreement, or some alignment, or perhaps some understanding, we agree that we should tell somebody about this, then it is of your own free choice that that happens -- that other people listen to or read what we say here today. At that point, you can do so because you believe it is right.

The other thing is to try to limit the scope of this conversation -- because Anders was right to speak of that -- since this can really become complicated beyond the point of common intelligibility. Therefore, first of all, I am going to let go of trying to outline the substrate needs argument. I am also going to let go of trying today to connect this to everything associated with artificial intelligence risk calculation work. Today, we are going to focus on just control theory.

A discussion of control theory does include what is usually called 'error correction'. Error correction is a sub-discipline of control theory. If we find that there is some idea that applies to control theory, by implication it will also apply to the field of error correction. That is strictly a correct move to make. In other words, if it is necessary, it can be shown that there is a complete, 100% correspondence between the entire dynamic of error correction and a subset of control theory. However, it is helpful to start by considering error correction first, especially given that there are certain assumptions made when thinking about error correction that are relevant to considering control theory -- and vice versa, especially when applied to the topic of general AI. It is relevant when it comes to the application of error correction to artificial intelligence processes. These questions are a subset of: How do we do control of artificial intelligence? There are a couple of very specific assumptions that are necessary to outline, and which are genuinely often overlooked, that turn out to be crucial to the argument of whether AI can be controlled in the general sense. Even in trying to limit this, when we are talking about control theory, there are some specific ways of talking about control theory that define explicitly whether it is relevant to AI work.
Anders: Yeah, should we start with the substantive part? I think we kind of understand where we are on the meta level here. I do think that the substrate convergence arguments are interesting, but I also agree they are much harder to formalize. They basically require saying a lot of stuff about literally the physics of stuff in the universe. That is an iffy and complicated matter, while control theory and error correction are much easier to formalize.

Forrest: Well, this is one of the assumptions. It turns out that in some domains of thinking, this is very true. With the wrong basis, seeing the possibility of any useful formalism at all can become quite difficult. However, with a different basis of formalism, things which previously seemed really hard can actually become quite tractable. And this is actually quite important. When we think about error correction, there is a tacit assumption that we know the language in which we are doing error correction. We therefore also assume the whole field of expression over which that error correction will occur. Usually when we are thinking about error correction, we are thinking about a communication channel. Inputs go in one side of the channel, and they come out the other side. We are usually looking at the correspondence between the inputs and the outputs, and we say 'they ought to be the same' -- ie, that we do not end up with garbled messages in the channel. That effectively means that we need to notice when the outputs are even a little different from the inputs. Error correction is concerned with 'how do we notice changed or damaged messages?' and 'how do we repair broken messages?' and so on. However, when we are thinking about general artificial intelligence, we do not necessarily know in advance what level of abstraction we are operating on. Moreover, that basically means we do not necessarily know in advance what the language is in which we are going to be working, or whether or not that language is even parsable in some sort of discrete sense.

Remmelt: One thing you mentioned to me is that error correction works well at some levels of abstraction, like lower levels, and not so well at higher levels, particularly when we are talking about certain kinds of architectures. If we are talking about bits, errors would be bits flipping in hardware, perhaps because some cosmic ray hit the relevant part of the hardware, so those bits jumped from zero to one or vice versa. In this case you can make use of redundancy, for example, or you can have some backup bits stored somewhere else that you can check against. Or you can have checksums to check whether the bits in one place are the same as in the other, or something like CRC when using the internet and spreading things around. But when we start looking at higher levels of abstraction -- when we are talking about things well above TCP/IP, above the actual programs running on these machines -- now the kinds of errors we might be talking about are things like viruses, Trojan horses, etc. There are very different ways problems could seep in. Overconfidence in the tool of 'error correction' would be misplaced. We have to ask questions like: What is "an error" here? How do we define "an error"? What is something that we do not want? How is that expressed in the real world? When real people are using these systems for real things -- like handling their finances -- it becomes a lot harder to give the sort of simple, easily characterized, finite answers to these questions.
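[Illustrative aside: the bit-level redundancy Remmelt describes can be sketched in a few lines. The snippet below uses triple redundancy with a majority vote; the function names and message are invented for illustration, and real links use richer codes (Hamming codes, CRC-protected frames), but the point is that at this level both the "language" (bits) and the notion of "error" (a flipped bit) are fully specified in advance.]

```python
# Minimal sketch of bit-level error correction via triple redundancy
# (majority vote). Illustrative only; not any specific real protocol.

def encode(bits: list[int]) -> list[int]:
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]

def correct(received: list[int]) -> list[int]:
    """Majority-vote each group of three back to a single bit."""
    out = []
    for i in range(0, len(received), 3):
        group = received[i:i + 3]
        out.append(1 if sum(group) >= 2 else 0)
    return out

message = [1, 0, 1, 1]
sent = encode(message)           # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1                      # a single "cosmic ray" bit flip
assert correct(sent) == message  # the flip is detected and repaired
```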
And so what we are talking about is how to consider these higher abstractions. We need to start thinking about this in terms of the kinds of things happening in neural networks running on top of multiple layers of software stacks, running on top of hardware. When we are talking about neural networks, for example, which are running on multiple different machines, with all kinds of different connections to all manner of other software, we are somehow talking about AGI that would actually have enough capacity to, at the very least, produce its own components -- to have the general capacity to operate and maintain itself in all manner of environments -- as one way of thinking about artificial general intelligence. So we need to look at that macroscopic view, as there are lots of ways that that code and those connections -- running on real hardware, connected across lots of physical spaces -- could be functioning, or "malfunctioning". And the question is: What do we error correct for there? Can you actually error correct for all the possible ways that this could go wrong?

Anders: Yeah, I totally get it. There is also the issue that you are trying to bring the state of a system into a particular subspace of what you want to get via those information channels. In classical bitwise error correction it is all about getting the bits or words correct. You want to move them to the right point. But the problem with neural networks and other higher-order representations might be that it is actually subspaces that have certain degrees of freedom, where we would normally say: maybe those degrees of freedom are all right? Maybe we cannot tell whether we are right or wrong, and so we are neutral about whether the operation was correct. We can tell that some corrections -- of a neural network, for example -- are correct because they actually improve performance on test examples. But we cannot know whether the correction has preserved the overall manifold that the neural network might represent.

Forrest: Part of the way that we can shore this up is that you can have error correction on bits, and you can have the bits represent letters, and you can think of error correction as implying that the letters in the words are spelled correctly. That is the basic level. Yet if we are considering error correction at the level of the words, we ask: "Are they the right words in the sentence?" If we ask what 'error correction' is at the level of the sentence, we see instead: "Is this the right sentence in the context of this paragraph, or in the context of this page?" What about error correction as to: "Is this paragraph the right 'meaning' in the context of the conversation that we are having?" Or, going up to an even higher level: "Is this the right mission statement intentionality for our new startup business to have in the world?" Say we have a CEO defining a mission statement for a company. At this higher and larger scale of abstraction, we are basically asking: "Is this the right quality of health that we are wanting for the world?" Now, unfortunately, when we are thinking about AI safety, and we are starting to try to say something meaningful for a well-qualified notion of "safe", then in the hyperspace of all possible future world states we are wanting to find that particular subspace that represents 'healthy ecosystems with humans in them'.
Ie, healthy, happy human beings, at some reasonable population. Ie, when we are considering which is the subspace of interest, we can ask: Is the AI 'safe' for some reasonable notion of 'safe' having to do with life and well-being, for at least some notion of human well-being? Ie, is all of the future operation of the AI within the subspace of all possible states described by the notion 'safe'? To ask questions like this is to ask at the very highest levels of abstraction for which the notion of error correction could even be applied.

Anders: Yeah, I think there is a difference here between error correction and control. Control has to do with some value or some assumption about these states being the ones we want to be in, while error correction presupposes that we actually have some concept of what the right states are, in a slightly tighter way. There is a clear link here, of course. But I think that the general alignment problem and its relatives belong in control theory. Error correction is tighter. Are the statements from the CEO being implemented or understood correctly? That is still a kind of error correction problem, and I think it remains inside that space. But it is getting rather close to the control theory problem: "How have I interpreted the text I am reading?" At some point, you might say: yes, you understood the semantics of the text well, but you did not get the poetry into your translation, or something like that. In some sense, I have now done a bad error correction when I did my translation, because some meanings are not conveyed. But I think that is still an interesting error correction problem, rather than a control problem. A control problem happens when there are actual effects in the outside world, beyond just us caring about the message.

Remmelt: So I feel like you are talking about how these terms tend to be used. I do think 'error detection and correction' tends to be used in the context of error detection and correction codes, for example. And 'control theory' tends to be used with systems that are interacting with the physical world through a more clearly coupled and observable sensor. For example, I have a thermostat and it can adjust the temperature; in that way, we adjust the environment. That is a simple version of a control system. We need to notice certain things about what it means to have an error detection and correction method, and to generalize from that.

Forrest: That is right. What I am attempting to do at this point is, first, to outline assumptions that are relevant. Then, second, I am comparing -- asking: What are the relationships between control theory and error correction theory? And then, third, I am generalizing beyond that, so as to set up for deeper understandings. We can think about control theory as being about the forces that cause convergence on a point. For example, if, in the space of all possible messages in a given language, the input to the channel represents a point, then error correction would be: Can we do something in the channel itself that causes the output message to converge on the same point as the input message? We can say the control theory part of it describes the forces, whereas the error correction part describes the states, assuming that you have an input state and an output state. By saying it this way, I am trying to relate back to what Anders said earlier.
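[Illustrative aside: Remmelt's thermostat and Forrest's "forces that cause convergence on a point" can be pictured with a toy proportional controller. Everything here -- the setpoint, gain, and step count -- is an invented placeholder, not anything the speakers specify.]

```python
# Toy thermostat: a proportional controller nudging a sensed temperature
# toward a setpoint. The corrective term applied each step plays the role of
# the "convergence force"; the setpoint plays the role of the reference state.

def thermostat(temp: float, setpoint: float = 21.0,
               gain: float = 0.3, steps: int = 50) -> float:
    for _ in range(steps):
        error = setpoint - temp   # compare the sensed state to the reference
        temp += gain * error      # apply a corrective "force" proportional to the error
    return temp

print(round(thermostat(temp=15.0), 2))  # converges toward 21.0
```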
Rather than thinking about the topic as being about a definite state of input and a definite state of output, we can instead specify the limits and boundary of the subspace of things that could be represented in the output. Ie, we are assuming that we can characterize the output reasonably well. We can consider whether at least the messages being output by the channel are within the subspace that is desirable. Error correction considers the correspondence between input intentions and output realizations in terms of central points and deviations from those centers, whereas control theory is more about whether the range of inputs maps well to a range of outputs. Rather than thinking in terms of mapping points, we are thinking in terms of mapping envelopes. Insofar as envelopes can be defined in terms of points, and vice versa, these are at least somewhat equivalent. This helps to generalize error correction and control theory together. The net effect is to have included a notion of 'safety' as being an envelope in the output space.

We can also consider whether or not the control forces are strong enough to cause convergence in the output, regardless of what the inputs were. For example, if I have an input that is a little bit off, there will also be control forces that result in the system still arriving in the correct, desired output subspaces. This works both for control theory and for error correction theory. Regardless of which method I use to think about this, we end up having to think about: How strong are the forces involved? How effective are those convergence forces? Depending upon where we are on the abstraction scale, how does the effectiveness of that convergence force change?

I think you have probably already noticed that at the low levels of the abstraction scale, where computer scientists are completely comfortable and are doing things like TCP/IP, we have really good tools for both control theory and error correction theory. And so the hypothesis is that, if it works in one place and in one context, maybe it should also work elsewhere? Yet when we start expanding into higher levels of abstraction, we quickly notice that the tools that worked so well at low levels of abstraction need to be refactored to be applied again at a higher level of abstraction. And moreover, the higher up we go, the more refactoring we will need to do. This continues until, at some point, we notice that when we are looking at the very highest levels of abstraction, the notion of control theory itself ends up being weakened by certain constraints. Then the question becomes: Are the constraints affecting the maximum level of force that we could ever possibly apply so disabling that there is not a force strong enough to create the level of convergence necessary to establish a required condition such as safety? This notion of force and constraint starts to become the interesting question. It turns out that it can be shown that the kinds of things that work really well at low levels of abstraction do not work at all well at high levels of abstraction, and that there are reasons for this. Moreover, those reasons define things that are genuinely relevant to AI safety work, particularly when the AI is 'general'. In that case, we can consider the level of force fundamentally needed relative to the level of force we have. From this, we can set up a key inequality.
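[Illustrative aside: one way to picture the shift from mapping points to mapping envelopes, and the inequality Forrest gestures at here. The variables, ranges, and thresholds below are placeholders chosen for exposition; they are not Forrest's formalism.]

```python
# Sketch of the envelope framing: instead of asking whether output == input
# (pointwise error correction), ask whether an output state lies inside a
# desired subspace, and whether the corrective capacity we can apply is at
# least as large as what keeping it there would demand. Placeholder values only.

def within_envelope(state: dict, envelope: dict) -> bool:
    """Is every monitored variable inside its allowed range?"""
    return all(lo <= state[key] <= hi for key, (lo, hi) in envelope.items())

def control_is_feasible(required_force: float, available_force: float) -> bool:
    """The key inequality, schematically: is the convergence force we can
    actually apply at least as strong as the force the task demands?"""
    return available_force >= required_force

safe = {"temperature_c": (0.0, 40.0), "human_population": (1e9, 1e11)}
print(within_envelope({"temperature_c": 18.0, "human_population": 8e9}, safe))  # True
print(control_is_feasible(required_force=10.0, available_force=3.0))            # False
```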
If it turns out that we can establish that the inequality can never be met, the net result is a proof of the impossibility of any form of AGI control.

Anders: I do agree that in the typical error correction scheme you learn in a computer science class, one is always working on bits and words, and then you assume that, well, since we can build everything up out of bits and words, it just keeps on going. But that is of course assuming that it is just about having the message be identical, which is sometimes very useful, but not necessarily the relevant part. Similarly, the kind of introductory control theory that you end up with when you take your engineering degree is typically optimal control to keep a system at a certain point, like the thermostat, or tracking some optimality criterion. The slightly more interesting thing is keeping it inside the subspace that is regarded as safe. And in robotics you quite often have a bigger space where we can prove that, if you are in there, the control mechanism is going to move you into the safe subspace. As long as you can prove that this happens, you are very happy. But again, this is a low level of abstraction, because we might not, for example, know what the subspace of nice robot behavior among humans actually consists of.

Forrest: Agreed. This sort of theory is great for designing airplanes. As long as things are operating within known flight conditions, we have good control, and those controls are good for the airplane as a system. Yet they will not say anything about whether or not the pilot is drunk. There are no "airplane design level" constraints there. For that, you need a higher level of abstraction -- the FAA and flight schools, inspectors and managers, etc -- none of which has anything to do with the safety design envelopes of mechanical airplanes. What can be done with the kind of control theory that is used by engineers to design 'safe machines' is very different from the kind of control theory used to define "safe" airline corporations. The theory of the latter is not the same as the theory of the former -- they barely resemble one another.

As we start to go higher up the abstraction scale, the notions about what is inside or outside the envelope of allowable states, where the boundary of that is, how it is characterized, and the kinds of things that we would be able to call 'forces' that move us towards the center of that characterized space -- these questions become increasingly ambiguous and also increasingly important. One of the things that we can do is generalize the notion of 'control theory' in a way that would allow us to more fully characterize these various parameters. For example, we can ask: How well are we gathering input information at any level of abstraction? We could consider this in terms of the number of bits. Yet if we are considering error correction, we can also ask things like: Did the staff, the board of directors, the general public, etc, all understand the CEO's mission statement for the company? Similarly, in control theory we can consider a comparison against some sort of idealized reference state. We can ask: Was the CEO's mission statement for the company something that would actually result in an overall world outcome consistent with 'peace on earth and goodwill towards men'? Rather than referencing a comparison of an input and an output, we can ask:
Is the output within the range of subspaces characterized by the intentionality of the input? While 'peace on earth and goodwill towards men' is a very ambiguous state in one sense, it is a very concrete one in another. For example, I just told you in words exactly what I mean. While the words are exactly the right ones -- there are no bit or letter or word errors -- the meaning is maybe ambiguous. Moreover, we do not know if it is the right level of abstraction. Maybe there is a significant difference between 'goodwill' and 'peace', as not fully aligned with the notions of 'health and wellbeing', as considered alongside the notions of 'long quality of life' for 'many people' -- which of these are actually consistent with "safety"? Does anyone really think that just because the sentence has finite length, just because it is spelled out in bits, its meaning should be exactly clear? No, of course not. The bits that I use to create the words have almost no reference to the meaning. Therefore the notion of 'level of abstraction' is exactly relevant to the notion of control theory as applied to AGI, and moreover, it is foolish and misguided to think that the low-level 'bits' version of common control theory is even relevant.

So we can ask: How well can we specify the reference state? How well can we compare the future outcomes of the system to the reference state? And I notice that the minute I do that, I also have to, first, model the system, and second, predict the future output state. In fact, there are lots of questions: Can I characterize the model well? Can I run the model to take input states and produce predicted future states? Can I then compare those to the reference class? Given that comparison, can the output of the comparison be used to control what actually happens with the system? Is that control signal effective -- ie, can it conditionalize the outputs? And do the predicted model outputs, as a model of the modulated actual outputs in the real world, result in the expected future world states staying within the envelope?

So these questions become labels for the basic sorts of things that we will, for sure, need to do, regardless of whether we are considering error correction or control theory. Ie, at this point it no longer matters whether we are considering error correction theory or control theory generally, because our overall model of control theory thinking has become general enough to have merged them. To recap, we cannot avoid considering, at a minimum, these specific aspects:

1. the inputs (the degree to which the real world and/or the internal state of the system can be known internally to the modeling/control system itself);
2. the calculation of future expected states (as based on a model and modeling process);
3. the reference state that is compared to (ie, however the notion of 'safe' is considered, in a way consistent with both the future predictions of the model and the actual external world states);
4. the comparison process (of the range of predicted futures relative to the reference states/ranges);
5. the effectiveness of the output of the comparison on the actual outputs/actuators of the overall system in the real world;
6. the actual overall effects in the real world as partially caused/effected by those outputs (ie, the fact of the matter of safety being either realized or not realized).

Let's call these the "generalized aspects of control theory". They are the 'boxes' in the diagram in which the various constraints will appear.
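[Illustrative aside: the six "generalized aspects of control theory" just listed can be rendered as the stages of a single loop. Every function below is a stand-in invented for this sketch -- the point is the shape of the loop, not any particular implementation.]

```python
# Skeleton of the "generalized aspects of control theory", one stage per aspect.
# All bodies are placeholders; only the overall shape of the loop is illustrated.

def sense(world: dict) -> dict:                   # 1. inputs: what the controller can come to know
    return dict(world)

def predict(model, observation: dict) -> dict:    # 2. expected future states, via a model
    return model(observation)

def reference() -> dict:                          # 3. the reference state (the 'safe' envelope)
    return {"safe": True}

def compare(predicted: dict, ref: dict) -> bool:  # 4. predicted futures vs. the reference
    return predicted.get("safe") == ref.get("safe")

def actuate(world: dict, ok: bool) -> dict:       # 5. the comparison's effect on the actuators
    world["intervene"] = not ok
    return world

def step(world: dict, model) -> dict:             # 6. effects realized back in the (toy) world
    observation = sense(world)
    predicted = predict(model, observation)
    ok = compare(predicted, reference())
    return actuate(world, ok)

toy_model = lambda obs: {"safe": obs.get("temperature_c", 20.0) < 40.0}
print(step({"temperature_c": 50.0}, toy_model))   # {'temperature_c': 50.0, 'intervene': True}
```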
Remmelt: The similarities here between error detection and correction, control theory, and alignment -- the correspondences we see -- include: there is always some kind of input channel and output channel; there is something happening in between the input and output channels (of which an error detection and correction code is a very simple case), something that is processing; and there is some kind of comparison happening against some reference states.

Forrest: We can be a little more explicit about the relation of error correction to control theory. Error correction is simpler, because the reference class is the input. Ie, if the output matches the input, it is good. Also, there is no model or prediction aspect. Those are factored out; the model is the communication channel itself, which just copies the inputs to the outputs. In that it is just a direct copy, there is no transformation involved either. Ie, there is no prediction occurring. Therefore, you can think of error correction as a very much simplified version of control theory, with certain things basically factored out -- replaced with an empty version or a simple copy. Some of the elements just go away because they turn out to be duplicates, or null cases (not important). This shows that error correction is a subset of control theory.

Similarly, at this level of abstraction, it does not really matter whether we are considering 'alignment' in the sense of 'the robot does what I commanded it to', as an input message, hence error correction, or whether we are considering 'safety' in the sense of 'the robot operates within -- all robot expressions are within -- some internally defined reference class', as some prior specified design element, and hence within the scope of control theory. Hence, by describing the characterization of control theory in a general, abstract way, I am also able to make a broader set of statements about what is actually happening when general control theory is applied to AI safety.

For example, any time anybody in the AI community says something roughly like: "Everything in the universe is expressible in bits, and error correction is always possible on bits, and thus if we do some total inspection of all of the internal states of the general AI system -- and we can access all the bits -- then we should somehow be able to figure out whether those bits match the bits we want to see"... ...then I am going to start by noticing that while the words "error correction" were used, the overall statement is actually more about a version of control theory that completely ignores the notion of varying, higher levels of abstraction as being relevant. And that is only the beginning of the issues. The more levels of abstraction that concepts like 'alignment' and 'safety' are away from bits, the more the force of "should be able to" fails.

Anders: In some sense, pure error correction is like an auto-associative neural network. You want to set up something that maps an input to an output with high robustness. That is in itself a useful primitive, but it is not the same thing as a general controller.

Forrest: That is right. So I am limiting the scope of the argument again, because having shown that error correction is a subset of control theory, it is also the case that anything that I prove about control theory is therefore going to apply to error correction as well.
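[Illustrative aside: Forrest's claim that error correction is a simplified special case of the loop sketched earlier can be shown by instantiating that loop with the pieces he says are "factored out": the model is the channel's identity copy, the reference class is the input message itself, and there is no prediction step. This is a conceptual rendering of the subset claim, not a practical decoder.]

```python
# Error correction as the degenerate control loop: reference = the input
# message, "model" = the channel's direct copy, no prediction. Illustrative only.

def channel(bits, flip_at=None):
    """The 'model': a direct copy, optionally corrupted at one index."""
    out = list(bits)
    if flip_at is not None:
        out[flip_at] ^= 1
    return out

def degenerate_control_loop(input_msg, received):
    reference = input_msg                          # reference class = the input itself
    deviations = [i for i, (a, b) in enumerate(zip(reference, received)) if a != b]
    corrected = list(reference)                    # the 'actuator' restores the reference
    return corrected, deviations

msg = [1, 0, 1, 1]
corrected, errs = degenerate_control_loop(msg, channel(msg, flip_at=1))
print(corrected == msg, errs)  # True [1]
```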
Hence, I can at this point completely stop considering error correction altogether, and just focus on control theory as applied to AGI.

Anders: So why even consider error correction theory at all?

Forrest: Two reasons. First, because a lot of AI safety people talk in terms of error correction theory, so it was important to generalize it to control theory in order to maintain the notion of relevance. Second, because with error correction it is a lot easier to notice that there are problems with high abstraction than it is to establish that same fact in control theory. The usefulness of mentioning error correction at all is that, with error correction, it is much easier to think about levels of abstraction. By showing that levels of abstraction are not well constrained by error correction, and then by subsuming error correction as a sub-aspect of control theory, I move those abstraction results into the realm of control theory. The question then becomes: When the level of abstraction is very high, can I conditionalize the outputs -- or more importantly, the overall output effects -- to be just within the specific required subspaces, using just control theory methodologies? This becomes a much more challenging question, because in effect we now have to consider not only the abstraction of the thing we are trying to do, but also the abstraction of the methods we are using to do it. Notice that there is a 'means-ends' conjunction. Specifically, if I am trying to create specific abstract outputs (ie, 'safety'), then I will also need to consider the abstract truth about the nature of control theory itself, so as to see whether it is actually adequate for the job.

Anders: That particular control mode, or a particular model of control theory, in order to do that?

Forrest: Right. There is a need for a good model of control theory that is both correct in that it applies at low levels of abstraction, and also correct in that it applies equally well at high levels of abstraction. This sort of generalized model is needed to show that, for the indefinite future, the differential between how well it works at the low level and how well it works at the high level is actually relevant. This is why I started by saying that an idiosyncratic way of thinking about control theory is needed. This is part of the reason.

We can begin by considering control theory in terms of sensory capacity -- ie, bandwidth and fidelity (which is itself a kind of error correction situation): Did the sensors deliver information about the real world to the abstract system? Did the abstract system have a way of taking those sensory inputs and pre-computing their future implications? Did it take raw data and turn it into relevance? Did it turn it into meaningfulness? We can also ask questions about later aspects of the overall control theory model: Did the model have the ability to transcend multiple levels of scale in abstraction? Is the reference class defined at the right level of abstraction? Is the comparison defined at the right level of abstraction? Is the output of the comparison able to influence the behavior of the real machine at the right level of abstraction? Is it effective at doing that? And is there a good correspondence between the model's predictions of the future real states of the world and the actual future states of the real world?
Is it close enough to actually be effective as a predictive model, relative to the reference class, relative to the outcomes we are actually looking for in the real world? Notice that these sorts of questions identify a series of correspondences that indicate the things that are fundamentally necessary -- and also irreducible -- to the very meaning of what it means to be a control theory at all. This is a particularly important transition. We can ask: Is there any case at all in which any of these elements can be omitted? Is the characterization of control theory subsumptive? Is the definition and characterization given a good one, right for use?

Anders: I always want to write things like this down. But it sounds like what you now have can be a rather tight thing. You need to specify a little bit how you define the level of abstraction. A lot of that is boring but useful stuff. I think you are on to something very useful here. And generally, yes, it would be good to have a whiteboard. But we have to make do with our imagination here, where I am actually taking notes on a piece of paper -- that is at least my private little whiteboard.

Forrest: Do you maybe remember, when we were in Costa Rica, on the last day of the conference, when I said 'we need to talk about control theory' and we went down by the pool? We had a piece of paper, and I drew out for you a kind of block diagram of control theory. What I just told you in words is the same as that diagram.

Anders: And generally, I think this is a good framing too, because it is very general. You might then of course try to specify and formalize it in different ways. But I do agree with you that this is a useful thing. Now the question is: Can we take this to the next level? Can we find the limitations of these forms of control, and say something useful about that? You mentioned an inequality, and that makes me really interested.

Forrest: Yes, but I want to pause here for a moment to acknowledge something -- Remmelt knows why I want to pause here, because he and I have both been through this a few times. We have learned, for various reasons that come down to experience gained through dialogues with other people, that we mutually need to acknowledge that we have reached a key transition at this moment, and that this has certain implications. If we move forward from here, we are shifting gears, but it can only be conditional on a mutual noticing. We have the key piece -- we have a characterization of control theory in an abstract way. The critical agreement is that the notion of control theory defined herein is fully subsumptive. Ie, there is no future version of control theory that is not some variation of what I just described. And that is a critically important point, for a couple of reasons. Consider: if I were to do a proof on the basis of this description, then at some future point I would have to be able to say something like, first, because this description is subsumptive of all of control theory itself, anything proven about it does actually and genuinely apply to everything in the space of control theory; and second, that such assertions therefore also apply to everything in the space of AI safety, where all notions of AGI safety, and all notions of the safety of engineered systems, are based on some application of some version of this abstract model of control theory generally.
If someone were to later try to show that they had some version of control theory not covered by this abstract model of control theory itself, or that the notion of 'safety', 'alignment', etc, was at all meaningful outside of applied control theory, then the relevance of the argument/proof would be lost.

Remmelt: Ie, it applies to any alignment mechanism, any error detection and correction mechanism, anything that requires that kind of control feedback loop.

Forrest: We are basically setting up to maintain relevance, and this is a crucial point, because when people try to do the 'devil's advocate' thing in the future, they normally start by trying to go after something that presupposes not having reached this point of acceptance and agreement.

Anders: Well, here is the thing. I would probably -- when I am analyzing this -- want to poke rather hard at this point. But the problem, of course, is that that would cause us to get away from the main argument. So I am very happy to say: okay, I get what you are talking about here. I am going to delay trying to poke holes in this part. I think it is relatively tight, and I think it is general enough. So I am not too concerned about that. I am very happy to keep on going.

Forrest: Ok, let us park it then. It is a transition point. It is worth bookmarking.

Anders: It is very useful, because you are right, this is a useful point for somebody trying to attack the argument to go for. It is an obvious point. Although I do think that it is going to be a tough one to deal with, because it is so general.

Forrest: Yes, that is right. There are a couple of reasons. Look at it from my point of view, of what I need to accomplish. If I am going to come up with a proof that essentially establishes something about something as abstract as AI safety, and in the face of general artificial intelligence, I am going to need tools that are fully in relationship to the topic. It is not just an analytic proof -- there is a soundness component too. I have to show that the proof is relevant to the topic at hand. The topic at hand would be something like: Can we robustly show, for any possible version of any future AGI that we might create, that we can implement something that will not kill us all? Can we do a proof like that? Notice that that proof would also have to be relevant in exactly the same sense as I am requiring here. Whatever that proof is, however it is constructed, it has to actually talk about things that we could actually build. Relevance is key, not just analytic power. I need to generalize over the class of all possible things that can be made.

Anders: Yeah, it also needs to have this lower bound, because we can obviously make a very safe AI in the form of a block of stone that does not do anything. But it is also useless.

Forrest: I can easily notice that a mere stone will not ever meet the definition of 'general artificial intelligence'.

Remmelt: We need to define that, actually.

Forrest: Agreed. Remmelt, you set up the notion of AGI wonderfully. You put up some posts on the Alignment Forum that did this rather well. You mentioned we needed to define terms, and then you set up just exactly the right kinds of definitions.

Remmelt: What do we mean by "controlling AGI to stay safe"? What does 'control' mean here? What does 'AGI' mean here? What does 'safe' mean here? If we define these things more precisely, it becomes less about the kind of amorphous, ambiguous way these terms get passed around in the community.
What is relevant in terms of what we need to include in a definition of it?

Forrest: And the reason for that is that we are now actually trying to say: 'this is the space of topics we are talking about'. For the definitions of these three terms, which we are basically saying are general enough to cover everything that would be of interest in that space -- the space of all possible relevant meanings -- if we can make statements about things in that space, treating that space as now formalized, then the arguments we are making here are relevant to any future work that is done in the space of AGI safety. It has to have a soundness correspondence. We are able to establish future relevance by setting not tight limits, but category limits.

This shows up, for instance, in the definition of 'safety' as given. Notice that the way it was set up is an extremely broad and abstract definition. For example, if we defined 'safe' by writing "does not kill us all", we can ask "What does that mean?". First of all, saying what something is not does not tell us what it is. So instead of a 'not' condition, which does not define anything, let us specify the positive condition that we are looking for. Ie, we need to define a positive envelope. Therefore, we say something more like: "preserves the health and well-being of all life and humans".

Remmelt: That actually sounds a lot more ambiguous than it needs to be. I think we can basically talk about this in terms of some kind of state.

Forrest: Actually, it needs to be the right kind of ambiguous. It should be ambiguous in the sense of being an envelope.

Remmelt: I have got to disagree here.

Forrest: When we are talking about a definition that is formed so as to create a capacity for future soundness, we need to do something different than we would when setting up an analytic definition. An analytic definition wants to be maximally precise in terms of what it points to. However, a soundness definition wants to set up a good broad envelope, something with the right sort of coverage. And so, in the interim, we admit instead: "We admit that we do not currently have a precise definition of required concept X, but we do know/believe/assert for sure that any future precise definition set up by any future workers on this term will be at least somewhere inside of this envelope that we have provisionally set up for now". The key idea is that it covers the whole space of whatever future way the term "safety" will be defined, in a way that is largely still consistent with the way regular people will use and recognize that term, without being so overly broad and generalized as to have lost meaning and relevance altogether. Hence the key is to get the term envelope right, rather than to be precise. It is important to know for sure that we have covered whatever precise definition a future person might specify. This gives us the right level of generality and relevance.

Anders: I am a bit reminded of some essays by Stuart Armstrong from way back when he was looking at low impact AI. A lot of it has to do with defining 'low impact'. It is a bit like defining safety -- surprisingly slippery when you try to define it. And Stuart has a lot of fun talking about barometric pressure, for example. If you want to ensure that the AI does not change the Earth's atmosphere, what would you actually need to constrain?
Again, the possibility spaces are so big here that you end up with all sorts of very crazy things going on.

Forrest: Agreed, and I do not want to specify any of that, because I do not know today what people tomorrow might discover about barometric pressure, or the right temperature/pressure relationship, or whether it has to have these specific gases, etc. At this point, we do not know completely what safety means. But we can set an envelope that says whatever future definition people come to will be somewhere inside this envelope. By specifying the envelope and saying that I am going to make sure that whatever proof I do covers the whole envelope, I will have effectively covered any possible future definition people may later discover.

Anders: Yeah, the problem might be the multi-dimensionality of the space -- actually figuring out how to do this definition of the envelope becomes tricky.

Forrest: This is part of the reason why I was saying we need to be at the right level of abstraction. At a certain level, I can say the word 'healthy' and not necessarily have a predefined notion of what that means, but I can rely on a sort of common-sense view of it that we all share. For example, I mentioned 'humans', and I noticed that Remmelt specified "descendant from humans" in one of the posts he put up -- probably because some person poked back kind of hard. I remember that there was a part of one of his definitions that said something similar to 'any future thing that has a common ancestor with a human being'.

Remmelt: It was even broader than that. I think someone pushed back on something and I added something in brackets to the definition of AGI. The way I redefined general AI was 'self-sufficient learning machinery', and by self-sufficient I meant that it needs no further interaction with humans. Then, in between brackets, I added 'or lifeforms sharing an ancestor with humans'. That is how I tried to cover that entire space. So if it relies on some kind of bacteria, it does not count: it should not have to rely on bacteria to operate or maintain itself or to produce its own functional components. That is how I tried to cover that.

Forrest: It turns out to be a necessary thing to do. Because to some extent, there are going to be an infinite number of people who push back in ways that would make it seem necessary to add more and more parentheticals like this. It is some sort of arms race. We try to come up with more abstract ways to cover more of the space in a more general way that has the right sort of coverage -- one that establishes the soundness aspect, so that when we get to establishing a result with an analytic aspect, it actually becomes relevant.

Remmelt: We can also simplify it a little bit and not include some of the things that are relevant to safety. That has been a conscious decision. It makes it easier to explain and to say: it really, definitely holds for this precise case -- look, you can just go through the steps. If we want to be comprehensive about this, it would have to cover the wider envelope. But if we are more narrow about it -- perhaps too narrow in terms of our definitions -- we might talk about safety as, at the very least, humans continuing to exist, and ancestors of humans. So there is something about humanness -- certain kinds of configurations, in terms of how our bodies are configured, that need to continue to exist.
And it is something about how the body functions that needs to allow those bodily components to continue to exist and to function. Forrest: You are putting your finger on the nub of the matter. If I make things too precise, I lose generality, and also relevance. And if I need more relevance, I also need more generality. I have to have the right mix of preciseness and non-specificity. And that looks like a contradiction. And by the way, literally everybody that is in the conversation on AGI safety is neuro-divergent. It is just going to be the case that anybody that is interested in this obscure topic is probably on the spectrum to at least some degree. And if you are 'on the spectrum' then there will be this fascination with details. That is just how our nervous systems work. There will be lots of nit picking, and also the net result of being really reluctant to generalize anything over anything, or everything. This is something that goes so against the grain of people in our part of the spectrum that I have to specifically spell it out as a tendency: we will miss the big picture if we are not paying attention to these sorts of issues. No matter how things are defined, someone somewhere is going to be unhappy in some specific way. No coverage is detailed or precise enough, and the more precise you make it, the more some people will complain that it is not the right kind of precise, not perfect, or that it should be some other, non-overlapping, disjoint precise meaning. No one will ever be satisfied with definitions. We can remain in that forever, never advancing. Or we can set a rough envelope that is good enough for the abstraction level we are working at, and then move on to use that as it is, as formally correct, for the envelope. Anders: There is also an interesting issue with the LLMs, which have demonstrated that actually, sometimes you can get a surprising amount of generality out of a system that was not entirely built for it. There is this shock that LLMs actually have been as successful as they have, for at least a few steps of general reasoning, although they might not be general reasoners or anything like that. But they seem to understand and handle a surprising number of general concepts. They actually seem to be grokking something -- making a relatively proper model of many of these concepts -- without having been explicitly [inaudible]. And that kind of lands on a hope that there might actually be ways of making surprisingly general models and explanations and definitions in this space, even though this call might be too short to do that really well. Forrest: Agreed that sometimes unexpected generality unexpectedly occurs. I think we are ready to move on. Having outlined that control theory model, I have basically made the assertion -- which you can check later -- that the notion of control theory given, as having those named aspects, is the right set of 'necessary' and 'sufficient'. It would normally be a homework assignment to give to a graduate student: Try your best to think of something that does not fit in this model that can still legitimately be called 'control theory'. And if you fail at that, after sincere effort, then please let me know that you have accepted the proposition. Therefore, from here, I am in advance presuming that you will do this exercise and thus, at some future time, get to the point where you will have convinced yourself of this truth.
By this, I am forestalling a whole category of what might be called 'Devil's Advocate type arguments' that you can now have with yourself to test this proposition. Anders: We are leaving that for later. But we have essentially a proposition that there are ways of making fairly general definitions of AGI and what safety means, etc. Forrest: Agreed, so I am gonna skip all that. We can just take that as given so we can move on. With the general definitions of control theory, and the recognition that these notions also cover the application of the general definition of control theory to the notions of AI safety, we can start factoring out control theory too. Now we can move to an even simpler series of generalizations, but with the maintaining of a specific sort of envelope relationship between each one. This is important so that when we set up the inequality relationship, we are going to be able to factor that back through all of these transitions that we have made along the way. This is consistent with what I said I was going to do in the beginning. We are making transitions of starting with certain ideas and then generalizing and then formalizing. The next major area of proposition is that there is no instance of control theory that is not itself a subclass of the more general concept of conditionalization, or causation, or both taken together. There is nothing that I can do in engineering that does not depend upon causation. And anything that I am doing in engineering is essentially an application of control theory or causation. This is also true for computer science. Every form of algorithm is essentially a sequence of conditionals. It is a sequence that can be thought of as chains of conditionals -- flows of changes of state, and branches of those flows of changes of state, where some of those branches may loop backwards. When we are looking at it, we are basically saying that all that is knowable within science and technology, all that is applicable within science and technology, is effectively within the scope of what can be done with causation. If it cannot be done with causation, it cannot be done in science and tech. Moreover, it cannot be done with control theory. Control theory is strictly dependent upon the notion of causation. Anders: I think David Deutsch would back you up here. He would basically argue that all good science and technology is about explorations and explanations, and when you start looking at his definition of that, it sounds awfully much like control theory. They contain a model, but they also contain an idea about where you can modify the system given the output from that model, and then you can control it, or at least influence it in a useful way. Forrest: This is true. I am a bit familiar with that work. In a certain sense, this is a simpler and stronger statement: that the notion of algorithmic process -- all of it, point blank, period, categorically -- is an application of causal theory, causal modeling. The notion of an explanation is a combined description and a causal model, and potentially an application. This is the difference between science and tech. Science is the observation of symmetries of causation. Technology is the application of symmetries of causation. The entire field of engineering -- including algorithmic process -- is, in every single step, every single motion, every little transition -- all of it -- again, categorically, essentially an application of causation. Control theory is a subset of causal theory.
This last is another one of those things that I am going to set aside and leave as an exercise for homework: Can you find any counter examples? And at this particular point, if you are thinking about this with any discipline, you will probably say "no". For now, I am going to forego that exercise. From here, I am going to proceed as if I already know this is true, and we are agreed. Anders: Well, I think I have been in the philosophy department too long. Thinking now, wait a minute here. And the obvious thing is you want a control system, of course, to cause some things in the world. However, I can see a way where it might be complicated. That is when you get into probabilistic cases. You might actually have a stochastic environment; there might actually be things happening that are not necessarily even caused inside the system you are modeling. It might be outside, or have no reason whatsoever. Forrest: Does not matter. Anders: Yeah. Because basically, what you still want is that the control system is presumably doing a clever gamble. So there might be a very rambunctious system. It has free will, it has mysterious... Forrest: Factor this out. Anders: But a good control system will still do whatever it needs to have a high probability of getting whatever the control is aiming at. Forrest: That is right. So all I have to do in order to basically subsume everything that you have just said is say the notion of 'causation' is not so much about definite inputs and definite outputs, as it is about 'shift of the probabilities of the outputs' by shifting the probabilities of the inputs. If there is any statistical deviation at all that is regular, then the... Anders: And regularity is essential here, because if there is no regularity, okay, then we cannot speak about causation. But then, of course, control does not seem to be very much... you cannot do much with it. Forrest: To the degree that there is a regularity, the observation of that regularity is the hypothesis, or the statement, that is the causal statement. If-this-then-that, to within some degree of probability. Or if I shift the probability of the inputs then I have shifted the probability of the outputs. I am looking at this input/output relationship, if-this-then-that, these preconditions lead to these outcomes. But it does not necessarily need to be super precise, it just needs to say that there is some regularity of those relationships. The notion of the regularity of these shifts in the probability field, the observation of definite shifts in the probability field, is the notion of causation that we are using. So in effect, I have generalized the notion of causation to now cover everything that you mentioned, and then some. So again: Any counter examples? Anders: I think that would probably be tough to find easily. But yeah, well, I am gonna think about it. The power, of course, of speaking about it like this is that now it covers an enormously wide set of things. So one thing that actually makes control theory relevant is that there is an objective of some kind; that sets it apart from other systems that are just interacting causally in some sense. That objective is actually a really important aspect. And then, of course, as we have earlier discussed, defining those objectives might be slightly tricky. And indeed, we might need to do it at high levels of abstraction, which does not necessarily bode well for always being able to make it accurate.
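To make that 'probability shift' reading of causation concrete, here is a minimal sketch. The noise model, the threshold, and all numbers are hypothetical placeholders, not anything from the conversation; the only point is that shifting the distribution of inputs regularly shifts the distribution of outputs, even when no single output is determined.

```python
import random

def noisy_system(x: float) -> float:
    """A stochastic process: the input only shifts the odds of the outcome."""
    return x + random.gauss(0.0, 2.0)   # large noise term; no deterministic mapping

def outcome_rate(input_mean: float, trials: int = 100_000) -> float:
    """Estimate P(output > 0) when inputs are drawn around input_mean."""
    hits = sum(noisy_system(random.gauss(input_mean, 1.0)) > 0 for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    low, high = outcome_rate(-1.0), outcome_rate(+1.0)
    # The regularity of this shift -- not a definite input/output pairing --
    # is the generalized causal statement being discussed above.
    print(f"P(output > 0 | low inputs)  ~ {low:.3f}")
    print(f"P(output > 0 | high inputs) ~ {high:.3f}")
```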
Forrest: To go back a bit, we can start by looking at each of the things that are necessary for control theory, and start with noticing -- making observations -- about each one. For example, we can consider limits in the amount of bandwidth that is available for us to sense. There is entropy associated with the world that we cannot constrain. There is entropy associated with the sensors which we cannot constrain beyond a certain point. Ie, there is a Heisenberg Uncertainty built into the very nature of just even seeing things at all. In regards to control theory, we can set up to ask those questions which identify limits. Given the limits of sensory capacity, the limits of modeling capacity, and the limits of model specification (which is where our causation is essentially represented in some abstraction that is amenable to compute), we can basically ask: Can we specify the goal state well? How well can we characterize that goal state? Does that characterization turn out to be hard to do because of abstraction, or maybe information limits? If I do have some abstract idea of "the desirable goal" (what is meant by safety, etc), then maybe I can translate it into a concrete state of lists of Boolean conditions or numeric relations or some other measurement relations of one sort or another? However, I notice that no matter how I do that translation, I am still going to need a certain amount of data to represent that output state. So in relation to that, I might also ask: Do we have the capacity to store the necessary amount of information? For example, it might turn out that we do not -- maybe we just do not have hard drives that are that big? So overall, we can look at each of these individual things associated with control theory, and we can start to model the degree to which the cascade of these limits -- how each one of these named 'control theory boxes' contributes some variance to the statistical distribution of the range of possible output states -- constrains what we are actually going to be able to do, using control methodologies as a process. In this way, we can start to model the strength of the constraining forces -- how much they will make a difference in the actual field of the world. Thus, we can start to set limits on what we can do using control methods. For example, if we were to assume that somehow, elsewhere, we had already been given some idea of the strength of the force minimally needed, we can ask: What is the maximum level of statistical shift -- the maximum strength of the force we can muster -- relative to the strength of the force needed? As I mentioned previously, in this particular presentation -- just to thumb-tack things and bookmark stuff a bit -- I am not going to attempt to set numbers or values on the minimum level of statistical distribution control that would be required for AGI safety. I can establish something like that in another argument, but that is not what is here for us to talk about today. Remember, Linda asked about: What is going on in the space of control theory? In this conversation, what we are doing is attempting to outline the basics of control theory in a form amenable to making observations about what can and cannot be done with it in relation to abstract topics like AGI safety. By establishing that a certain inequality exists, we can set up to describe limits of what can happen in the future. However, in this conversation, I am only exploring one half of that inequality.
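As a shorthand for the inequality being set up here (the symbols are illustrative only; nothing in the conversation fixes this notation):

```latex
% C_max : the maximum level of control achievable by any control method,
%         bounded by the limits in each control-theory 'box'
% C_min : the minimum level of control necessary for any reasonable notion of 'safety'
%
% The question under discussion is whether
\[
  C_{\max} \;\geq\; C_{\min}
\]
% can hold; this conversation only works on bounding the left-hand side, C_max.
```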
Herein, we are considering only the limits of control. It is to basically ask: Is the maximum level of control possible at least greater than the minimum level of control necessary? As per our agreement, I will not attempt to set up some notion about the minimum level of control necessary. To do that, I would have to switch to talking about something other than control theory. Setting minimum requirements is something that is part of the world -- it has to do with questions like: What do we mean by safety? And what do we mean by AGI? And what do we mean by control theory? But in this discussion today, I am only going to be considering the control theory question. We are setting up how we would assess the inequality, but we will not today get to all of what is needed to be a proof that the inequality is violated. Anders: Yeah, yeah. But it sounds like we might want to move towards understanding what the value is at least on the left side here, the maximum amount of control, what that would be like. Forrest: Yes, agreed. That is all we are attempting today. I am setting limits on the maximum level of control. And that is the left side of the inequality. In this particular conversation, I am leaving the statement of the inequality itself as a kind of generalization of what we are trying to do overall in setting up a proof of AGI non-controllability. We are agreed to leave the right side of the inequality completely unspecified for now. Though perhaps you may remember, when we were in Costa Rica, that I started to set up for that. Unfortunately, we did not get very far then, simply, I think, because of the nature of that conversation. There were a lot of other people involved, and you were trying very graciously to make sure everybody was kept up with us, which was a very nice thing to do. Unfortunately, that also meant that you and I did not get as far as we could have in exploring what sorts of things affect the right side of the equation -- ie, to ask: What is the minimum level of control we need to have? At that time, I was hoping to do the left side of the equation at some future point -- what has become today. I made the mistake, in Costa Rica, of thinking that you were more likely to be interested in the right side than the left, and so I started there, then, and I turned out to be wrong. My apologies for that. But the main thing here, today, is that if we show that the overall dynamic of control is modeled in terms of these named and required boxes, or aspects, and we also notice that these aspects fully cover the basics of all of control theory, then we can examine each individual aspect and ask: What is the maximum capacity in that box? Then, by combining these, I can say something like: 'The maximum level of control is effectively the product of the levels of control that are available in each of these aspects'. By combining the error bars through the equation that defines the overall system, what ends up happening is that I can therefore say something like: 'The maximum level of control ultimately possible is going to be in proportion to whatever is the minimum level of control available within any one of these required aspects' -- ie, as specified by whatever is the weakest link in the overall chain of conditionalized process. Anders: Yep, yep. It is interesting also to start thinking now about these boxes.
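A minimal sketch of that composition (the box names and capacity numbers below are hypothetical placeholders, not values from the conversation): since each per-box capacity is at most 1, the product can never exceed the weakest link, which is the sense in which the chain is only as strong as its weakest member.

```python
from math import prod

# Hypothetical per-"box" control capacities on a 0..1 scale
# (1.0 = that aspect imposes no loss of control, 0.0 = no control at all).
box_capacity = {
    "sensing":         0.9,   # bandwidth / Heisenberg / sensor-entropy limits
    "modeling":        0.7,   # requisite-variety style model limits
    "goal_state_spec": 0.6,   # how well the goal envelope can be specified
    "comparison":      0.8,   # can predictions be compared to the goal in time?
    "actuation":       0.9,   # limits on the output influence
}

# 'The maximum level of control is effectively the product of the levels
#  of control available in each of these aspects' ...
max_control_product = prod(box_capacity.values())

# ... and it cannot exceed the weakest link in the chain of conditionalized process.
weakest_link_bound = min(box_capacity.values())

print(f"product of per-box capacities: {max_control_product:.3f}")
print(f"weakest-link upper bound:      {weakest_link_bound:.3f}")
```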
Some of them I have some inklings about what kind of bounds you might put on them. For example, you have the internal model principle -- related to the law of requisite variety. That is one of the obvious limiters on the model part. Forrest: Agreed. This is where you and I start coming together. When I look at this, I basically notice: 'Wow! We can find all sorts of things in every single one of these specific boxes that we all know are absolutely the case -- we all know that these are real limits'. And while these limits might not need to be thought about very often in most fields of common engineering practice, it does turn out that these sorts of limits are sometimes more relevant than expected. In the case of considering AGI safety, they are turning out to be very relevant. This is the conversation that I have had with Remmelt any number of times, where we have gone into some of these details. What we are doing is to take a look at each box -- each of these five -- and notice that there are things that are obvious to me, and which I would expect would also be obvious to others -- ie, anybody that knows anything at all about control theory also knows about these. But they need to be enough of a specialist in thinking about control theory to even be aware of that possibility. However, with practical engineering type people, when I assert that 'there are these theorems, etc, and these theorems say these things about this topic, and this applies to this box, etc', while in time it becomes obvious to everybody, in the short term people may think it does not matter. Yet it does, once certain boundaries are pushed. The fact that you have a background in control theory means that you are able to help. Anders: Yeah. So now let us see what we can say about this left hand side. Forrest: This is one of those places where I think I can skip a lot of the details because you probably know them already. There are limits on the amount of information that we can receive from the world on an absolute level, as per Heisenberg Uncertainty limits, and bandwidth limits, etc. We can also ask: What is the sensor capable of doing? We can consider things like impedance matching limits, and there are all sorts of things like that. When we are thinking about the modeling side, as you mentioned, we notice questions like: Is there any relationship between the complexity of the model and the complexity of the world? Is that even in the same ballpark? We both notice that there is some sort of cybernetic principle that is applicable here. Then there are questions like: Do we even know the model? Have we gotten to the place where we have identified that the model is correct? Some people will try to assume that the artificial intelligence will learn everything there is to learn about correct models. And maybe we could get to different, better models that way. So if we are to assume, for the sake of principle, absolutely perfect models -- ie, the very best possible ones -- then we are just back to things like the cybernetic principle as the limit. So we can move on: What about the goal state specification? A lot of people have worked on that question. Some have even made provisional proposals. My overall feeling is that a lot of people who have looked closely at this have basically said, 'actually, this is a really hard problem'.
Many of them have pointed to some specifiability limits just in terms of the degrees of abstraction required -- which I have already alluded to. Then we have the comparison problem: Can I compare the outputs of the model, which predict the future state, to the goal definition reference class? While most people would think that this is not a problem, it actually turns out that there are a number of very interesting sorts of things we can look at here. The sorts of things that most people would not think about happen to apply here: Is the model calculation actually faster than the real universe? Anders: Ahah. Forrest: If the answer to that question is "no", then you do not get to do the comparison, because it is not ahead of time. And therefore, your output influence does not actually intercede so as to prevent the bad thing from happening, or to try to constrain the good thing to happen. Hence, we notice that we now have to think about the energy coefficients associated with computing the model, relative to the same energy coefficients associated with 'computing the universe' -- or at least the relevant portion of the universe, or even just the relevant portion of the model. But either way, we are now back to a tighter, different version of the cybernetic principle, with additional constraints in terms of things like memory, bandwidth, time, and energy. So now we are required to actually acknowledge that there are things like actual hardware limits. Moreover, in addition to those sorts of questions, we can also ask questions like: Does Rice's Theorem apply? For example, maybe we can sometimes declare that the goal state specifies at least something -- that there is at least some characterization of what is wanted to be achieved as outcomes. Additionally, we may also notice that the concept of 'prediction' effectively requires the prediction of what essentially becomes an algorithmic model of the future. At which point, we are basically asking something like: Can I characterize the future output of an algorithm -- or the future performance of an algorithm? This question turns out to be associated with a generalization of the halting problem -- something which is called Rice's Theorem. And it turns out that, for very deep computational reasons, you just cannot do predictions of this type in general. And someone could maybe ask: Do we need to do it 'in general'? We can solve specific such problems all the time! I have had a lot of people quibble about this. And every single example that they give of solving these types of problems is a specific case situation, by definition. That is good and all for most situations, except when we are considering the implications for general artificial intelligence. By definition, and also in easily predictable ways, I can anticipate that I will not ever really know what specific algorithms are going to be involved with the future of the AGI. There can be no question that any future AGI worthy of that name will have the capability to implement any model, any algorithm, that it may elect to incorporate into itself, either directly by inclusion into its runtime code, or even just within its own process of 'thinking', however it does that. So you actually have to use the general class -- any possible algorithm, not just some limited subclass of algorithms already known in advance.
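For readers less familiar with Rice's Theorem, here is a minimal sketch of the standard reduction being invoked. The function names are illustrative and the 'safety' property is a stand-in; no such general decider can actually be written, which is the point: if a total decider for a non-trivial behavioral property like "this program behaves safely" existed, it would double as a halting decider.

```python
from typing import Callable

def would_be_safe(program: Callable[[], None]) -> bool:
    """Hypothetical: returns True iff running `program` would be 'safe'."""
    raise NotImplementedError("No such total, general decider can exist (Rice's theorem).")

def do_something_unsafe() -> None:
    """Stand-in for any behavior the safety property is supposed to exclude."""
    pass

def halts(program: Callable[[], None]) -> bool:
    """If `would_be_safe` existed, it would let us decide halting -- a contradiction."""
    def probe() -> None:
        program()               # run the program under test ...
        do_something_unsafe()   # ... and only then do something definitely 'unsafe'
    # `probe` exhibits the unsafe behavior exactly when `program` halts,
    # so a safety decider would double as a halting decider.
    return not would_be_safe(probe)
```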
If you are allowing your presumed future AGI to be able to arbitrarily improve its models, then you are also stating that it can arbitrarily improve, generalize, and extend in unknown ways -- maybe incorporating anything at all -- into its future processing, and we would now have to model and predict the full scope of what it could potentially do. And doing all of this, predicting the future, is non-optional, because what is being considered is ultimately a safety argument -- some type of commitment, overall, as to the limits of scope of behavior of what some future unknown algorithm will do. Ie, the safety argument is actually relative to the whole general class of possible algorithms, and not any pre-selected specific thing that anyone happens to pick as their 'hobby horse' preferred plan and proposal. Anders: Yeah. This is probably where I would be quibbling when we get over eventually to the devil's advocate part. But yes, I think I see your point. And I think Rice's Theorem is a rather powerful tool here. Forrest: The Rice theorem is a powerful tool, especially in this context, and it is an unexpected one to be applied in this space. And here is the delightful thing: given that you are one of the smartest people I have ever met... Anders: Oh thanks. Forrest: ...I would hypothesize that almost all of the stuff that you would maybe present as Devil's Advocate in the future, I can almost fully forestall now. All I have to do today is basically to say: 'here are the range of tools that I am using. I instruct you (as a general intelligence) to create a model of me that is going to use those tools. Please run your devil's advocate on that model of me that is using those tools, and thus, with your model, you can then think about and predict what my future responses would be using those tools'. Hence you would be able to see and predict whether or not you can come up with any exceptions to my arguments. Let me know the results. That would, of course, save me a lot of time. So therefore, I do actually set it as a homework assignment for you. It is a practice example of the argument -- and also of any proposed refutations of the argument -- that something like this should be completely possible. And it has practical advantages for me, since by this means, I have reduced the bandwidth that I have to process emotionally to field the devil's advocate process -- which, by the way, I am not very good with. Hence, the more I can offload that work, the better. Anders: I will see how good my model actually is here. There is again this problem that you are pretty smart, too. So actually running a model of you might tax my somewhat limited brain. And there are really interesting things here, which we should probably leave for much later, about when you have rational agents modeling each other, and the kind of complexity that necessarily ensues. Again, Rice's theorem, among a lot of other stuff, shows up. This can kind of cut both ways. This makes the problem much more complex, both for safety and against safety. Forrest: Exactly. Well, the thing, though, is that I am now going to deploy another tool. Again, one not usually thought of. But as soon as it is thought of, it becomes an immediate observation of 'oh, yeah, obviously, this applies'. You just had an experience of 'yes, I can see that Rice's theorem applies in this situation'. And all you had to do is just say those magic words, and I am like, 'great! now I can move on'.
No more effort on my part is needed -- to me, the downstream has become all automatic -- just like with artificial intelligence itself. Another thing that we have noticed as being applicable in this space, which is also a little bit unexpected, is that there is a theorem in mathematics -- I remember it as connected to Gregory Chaitin -- and by the way, I dug through the internet for something like 10 minutes just trying to find anybody in the world who would be willing to pronounce his last name. Of course it is easily found printed everywhere, but if you actually want to hear some person saying it, with him there, so it would be corrected if you got it wrong? That, like, never happens -- everybody, they all skip that. They all said, "welcome Gregory". Anyway, based upon work that he has done in the field of algorithmic complexity, there is this theorem which basically says 'it is not possible to know for sure if a simpler version of a given algorithm exists'. Why is this relevant? Let us assume that I have some algorithm, and I need to predict the future of what that algorithm will do. I am attempting to use one algorithm to predict another. Hence, I need a lower energy version of this algorithm -- something that needs less time, or less space, or less energy, or is somehow 'simpler' in some real way -- because that is what 'simpler' really means. It has to be the case that the complexity is lower, particularly in the sense that -- in terms of actual hardware bounds -- my improved algorithm requires less of at least one of those. And the 'less than' is usually preferably the 'time' one. This is important because the time factor is the one factor that you cannot factor out. This is because of yet another series of theorems that basically say that some algorithms cannot be de-serialized -- ie, made more parallel -- beyond a certain point. As a computer scientist, I am well aware that there are well qualified logical arguments that most algorithms are inherently sequential to at least some minimum irreducible degree. The net effect of this is that it sets limits on the minimum amount of time that you can go from inputs to outputs for that algorithm, no matter what. Ie, even a perfectly perfected infinite AGI -- God himself -- cannot fix this problem and still have the notion of "a lawful causal universe" even make sense. If you are allowing the AGI to be rejecting of causation, then you have also, of course, given up on any notion of 'control theory', let alone any compelling argument that that AGI will be 'constrained to be safe'. Constrained by force of causal logic? Either it is so constrained, or it is not constrained, but the nature of 'reasonable argument itself' requires that we be consistent about this. Hence, the bounds of 'reasonable' and 'argument' limit us to consider only the bounded control theory cases if we are wanting to be consistent about 'safety' too. So therefore, there is a minimum amount of time that nothing can do better than in regards to predicting the future. Therefore, any control process based on such prediction methods will also have actual irreducible finite limits. The question is the relation of those limits in regards to comparisons with the actual world. Basically, the actual physical world is running some version of that algorithm opaquely. Maybe, worse, it is even running some simpler version than you currently know about -- and you cannot even know if that is the case.
If it is running some simpler version, you will not know that, and cannot know that. Hence, if you are looking at an algorithm which actually does attempt to correctly predict something about the future output world states, and you also want to do well enough better than the real world so as to actually be in advance, in time -- so as to anticipate what the actual world is going to do -- the odds are against you being able to succeed. Is your optimized hardware, actually running in the universe, ever really going to be faster than the real universe running itself? Any truly virtual simulation of equivalent real complexity to the actual physical thing intended to be simulated will necessarily be slower than the real thing itself, by virtue of the very fact of the simulation embedding. Wherever we are assuming equal fundamentally minimum serial dependence complexity in a process or algorithm, 'virtualization' cannot avoid adding more energy and time. The only three reasons why anyone (or anything) at all ever prefers to pay the high price of 'doing a virtualization' are: 1. when the model is vastly simpler -- ie, less serial dependence overall -- than the thing being modeled, and hence faster and predictive. 2. when there is needed some specific flexibility that only an interpreted virtualization context can give. 3. when some compelling platform entanglement confers some type of irreducible power over that function which is virtualized. In regards to control of future AGI to be safe, all three of these virtualization reasons apply. And so in this particular sense, there are two problems. The first problem is that there really is not any way to establish that any given algorithm is the simplest possible version of that algorithm. And then there is the problem that there is no way to convert certain algorithms into more parallel and less sequential versions beyond some actual absolute minimum. These two issues are in addition to those also indicated with Rice's theorem, insofar as they set different, additional limits on what is maximally possible with control theory. Also, just for clarity, I am taking the proofs of these three as given. Ie, I have read through the proofs, and I personally think they are airtight, and so feel ok using those results in my own work. You can either take my word for it or you can go reference it. My feeling overall is that the net result is that some things simply cannot be predicted. They are inherently unknowable, even though some people might claim that "in principle" you should be able to do "almost nearly anything" with model-ability, control system logic, or similar. Anders: There are a lot of very dense ties to the theory of Kolmogorov complexity and Solomonoff induction. Yeah, again, a really rich and wonderful set of conclusions. Forrest: Yes. People know about all of this already. All I am adding is that they are relevant to control theory, because they define the limits of what modeling can do. And modeling is one of those boxes. Anders: Yeah, there is a quibble here for later about that. For a given algorithm, of course, you always have a bit of a finite problem, that has a given number of bits. And in principle, you can just search through the space of all algorithms shorter than that, etc. Now, in practice, this is rarely interesting and relevant. Forrest: What you cannot do is show that there is a correspondence between the simpler version and the version that you simplified.
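A contrived illustration of that difficulty (the functions and the mismatch point below are invented for the example): two implementations can agree on every input you happen to test and still fail to be equivalent, so finite testing alone never certifies the correspondence.

```python
def original(n: int) -> int:
    return n * n

def supposedly_simplified(n: int) -> int:
    # Identical to `original` except on one input far outside the tested range.
    return n * n if n != 2_000_003 else 0

if __name__ == "__main__":
    # Exhaustive agreement on the first million inputs ...
    assert all(original(n) == supposedly_simplified(n) for n in range(1_000_000))
    print("Agrees on the first million inputs ...")
    # ... yet the two are not functionally equivalent.
    print("... but differs at n = 2000003:",
          original(2_000_003) != supposedly_simplified(2_000_003))
```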
You have the original version, and you have a maybe simplified version, and perhaps you can say that for some finite set of inputs and outputs, it does the same thing. The real question is more like: How are you going to prove that those two are strictly equivalent for all possible inputs and all possible outputs? That turns out to be quite a bit harder. How do we show algorithmic functional equivalence? Anders: And it is a little bit like proving that an algorithm halts or not. The general halting problem holds, but for certain subsets it does not hold, because we can actually easily eyeball certain programs and find... Forrest: Oh sure, yes I agree. For example, anything that does not have a loop -- we can tell immediately, by inspection, that it is for sure going to halt. Anders: Yeah, so now I am just foreshadowing an interesting and probably long-running discussion to have later. And that is worst case versus average case performance. This is one of my hobby horses this year. That matters quite a lot in computer science. However, I am happy to have Chaitin's theorem here on the table, because I think it does show us something important about the complexity of what we are trying to do here in control theory. Forrest: Yes, because the theorem that talks about 'Can we find a simpler algorithm?' is one that applies in the general case. Ie, we do not really have very many exceptions -- claims of 'we found a simpler algorithm' that are equivalent in some provable way. Contrast that with the halting problem, where for the very large class of day to day practical problems (algorithms) we can very easily tell if they halt or not, and moreover it does not matter, since we can simply unplug them whenever we want. In the practical case, the halting problem is almost never actually relevant to real people. However, the 'no provably knowable way to establish that a simpler algorithm exists' result applies to practical problems far more often than not. Every algorithm we are actually using is usually the very best one we currently know, and simplification reductions are something that happens really rarely, and mostly by accident. So that is much more defining of your average case -- most of the time, AGI probably will not actually come up with vastly simpler or better models, and its best predictive capabilities may not actually be all that much better than those we are already using. Remmelt: You are referencing this notion that sometimes things just have to be serialized -- there is some kind of process, some kind of propagation of effects in the environment, for example. So there is some kind of serialization there, and you cannot just remove parts of that or simplify parts of that and have it have the same consequences. That reminds me of the notion of P versus NP, of non-deterministic polynomial time algorithms, where there is a similar kind of idea. You could try to model some property of this process that you are trying to model -- in this case it would be within the computer, but we are actually talking about something that will happen in the environment, but okay. There is this problem where, with non-deterministic polynomial time kinds of algorithms -- or in this case some kind of process that is not quite a computer algorithm -- you have to kind of go through all of these side branches in parallel. Some of those branches might be shorter than the others. For example, some of the side branches might just be three steps.
Whereas, like, normally, you would have to reach 20 steps in order to get it. But you are not sure whether that branch of three steps is actually going to properly model the outputs of the actual process. So then you are back to that kind of causal question of what is the probability -- which of course we do not know, but you are going to have a sense that that probability is actually quite tiny -- that the shorter branch will actually be able to predict the output, or some kind of property of the outputs. Forrest: That is right. And here is a way to sort of work with that. As a way to think about this, let us propose a special case, just to see some of these dynamics in action. The algorithm I am considering is essentially to start with a bit field of numbers, and then to use a good hashing algorithm, which will take that bit field as an input and will generate a new output bit field. Let us further say that there will then be recursion -- that hash function runs on that output bit field again, as an input, to generate another bit field output, etc. Overall, we have this instance of a bit field, and then another instance of a bit field, and then another instance of a bit field, etc. Notice that every bit of the output bit field depends upon every bit of the input bit field. This is simply because of the hash function -- that is what they do. Then the question can be asked: Is there any simpler way to calculate any of the future bits at any point in time? And we would notice that, generally, probably not. And we can be rather confident about that. Because if there was a way to predict any of those future bits, cryptography would fail. Given that I have just described the blockchain algorithm, there is every motivation -- one that literally every intelligent software person has had -- and the smartest people in the world have tried to figure out some shortcut for this sort of algorithm, because if they did, they could hack Bitcoin and thus become instant billionaires. Hence, we can see that, from a financial perspective, the smartest and the brightest people of the entire world have looked for any shortcuts to simplify the hashing function process. The net result is that anywhere we have a situation where every bit of the input bit field defines every bit of the output bit field, and where there is some sort of recursion on that, the energy required to calculate the future state is not going to be less than the energy to calculate at least each hash function serially. And it can be argued that the very essence of complex systems is to be exactly that way. Ie, real algorithms are more often similar to this template than not. Ie, in the average case, there are no shortcuts. There is no way for me to predict the future bits three generations ahead of the actual hashing in any faster way than just doing the hashing. We have an algorithm which is, at least empirically, not simplifiable. Anders: As a parenthesis, this reminds me of the classic paper by Impagliazzo, the Five Worlds; I do not know if you are familiar with that. Forrest: Yes, I know of that paper too. Anders: Basically outlining these different levels. And the question is, which world do we believe we are in? And I find it worth noting that that paper is from 1995. And it is still kind of standing. I do not think anybody has, because we have not moved very much. Now, we actually do think that we are in 'cryptomania' or maybe in 'minicrypt', but very few people actually believe that.
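A minimal concrete version of that special case (SHA-256 here is just a stand-in for "a good hashing algorithm"; the claim that no shortcut exists is the empirical, cryptographic assumption discussed above, not something this code proves): as far as anyone knows, predicting the state after N rounds requires doing all N rounds in order.

```python
import hashlib

def iterate_hash(seed: bytes, steps: int) -> bytes:
    """Recursively hash: each output bit field becomes the next input bit field."""
    state = seed
    for _ in range(steps):
        # Every output bit depends on every input bit -- that is what hash functions do.
        state = hashlib.sha256(state).digest()
    return state

if __name__ == "__main__":
    seed = b"initial bit field"
    # The serial work *is* the prediction: no known cheaper path to generation 3.
    print(iterate_hash(seed, 3).hex())
```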
Forrest: I have some strong opinions about the implications of that paper. But I am going to park that, because that is a completely separate argument, and has nothing to do with Linda's request. Anders: That sounds like a good future conversation. But recursing back -- and also making a little note here -- claims about the world that are relevant for this bigger argument are interesting, because they are of course cruxes that are very good for checking: okay, does it depend on any particular assumption about the world? Now, I think that so far, none of the things you mentioned are particularly dependent on any controversial take on how the world works. Forrest: There are two paths here. One is that I can try for a more powerful, more general version of this argument, which has more risk of pushing things into existential crisis (which I am hoping is not necessary). And there are less robust versions of this argument, which are fully realizable within -- as far as I can tell -- non-controversial opinions. Anders: Okay, that on its own actually hints at something interesting for maybe later. But there is a kind of frontier in this kind of diagram of how many extra assumptions you have versus how strong the argument is. And you can move along this frontier. And that on its own is also useful for finding cruxes and investigating, but yeah, let us leave that for the time being. Forrest: The cruxes would likely be more available towards the more abstract end. Part of the reason that I am factoring out large categories of things is to also show where there are no cruxes. So abstracting can go both ways. For instance, we did not have to go box by box. I could have basically said something like: notice that the notion of control theory is critically dependent upon the notion of causation. With that basis, I can then ask a very reasonable question: What are the limits of causation? Now, this is where we pass a certain threshold. Because there are large categories of people who sincerely believe that there is no such thing as a limit of causation. At which point, I now have to show them that they have already accepted, long before they met me, that there are limits to causation. One set of limits of causation is called general relativity. Another set of limits of causation is called quantum mechanics. Okay, take your pick. Either one of those is going to set very definite limits on what is possible with causation. And if you want to quibble about whether those are limits of causation, I can show a direct one to one correspondence that that is in fact an absolute fundamental characterization of what those theories are. And this can be done in a very non-ambiguous, completely non-controversial way. I do not have to worry about... Anders: I do not have a quibble with the existence of limits of causation as given by physics. And there is even the interesting question about whether it might be possible to argue that, assuming limits to causation, maybe we get some physics out of it. But that is probably a separate speculation. Forrest: That is also a separate conversation, and one which I will probably defer. Anders: I think that is where we bring in Scott Aaronson to figure things out. Forrest: The effort is to show that the limits of causation are showing up in all of these different boxes. So for instance, we have talked about some limits of signal sensitivity, and some limits of modeling, and some limits of the representation of goal states.
And we can talk more about limits of comparison. For example, most people pretty much assume that comparison is a fully reducible act, and that there is no ambiguity in comparison. Anders: Yeah. But actually, yeah, there are some interesting complications here, too. Forrest: If you are wanting to find cruxes, especially, yes. We can go to yet a higher level of abstraction. Remmelt, I think you and I have had this conversation at least once or twice, but I am not sure. We talked about the Incommensuration Theorem? Remmelt: Yeah. And I have looked into this. I have got a rough sense of it. Forrest: Okay, so earlier, when we were talking about causation, it was mentioned in passing that causation is a repeatable, regular shift in the pattern of the distribution of events, comparing what might be antecedent conditions to what might be consequent conditions. So 'pre-conditions' and 'post-conditions'. And I notice that when we put it in that particular form, to talk about causation is effectively to talk about symmetry. Anders: Interesting, yeah. Forrest: If I set limits on what is possible to do with symmetry in the epistemic field, then I have set limits on what is possible to do with causation, and therefore what is possible to do with modeling, including comparison. Anders: That takes a bit of unpacking. But, yeah, you do get the tight links here. Forrest: Yes, the links are very tight. The more you dig into them, the harder they get. Eventually, there is a cascade effect that goes all the way back down the chain. From the analytic, we see that there is relevance. If we say 'there is a limit on causal process', we are essentially also saying something like 'there is a limit on epistemic process'. Ie, there are things that are inherently unknowable, not just unknown. And a proof, incidentally, is a kind of phenomenology in the relationship between the known, the unknown, and the unknowable. For example, some things cannot be proven within a given axiom system. Ie, that something would be unknowable with respect to those axioms. Remmelt: I am gonna just try and throw in some analogies, just to give a sense of what you are talking about -- more applicable versions of what you are talking about, in terms of the Incommensuration Theorem. For example, there is Godel's incompleteness theorem, where there is this notion of consistency. And you can see that consistency is something like symmetry -- that there is a correspondence between something being consistent and there existing some symmetry -- and then there is completeness. So there is some way in which you cannot have a formal axiomatic system -- in this case we are talking about something that is highly regular, but the point still applies -- be both complete, in terms of being able to explain everything, and consistent and definable at the same time, so that it can actually make those statements. Anders: No, I think you are pointing at the right thing. I think you can probably make much simpler examples, too. If A and B do not have a causal connection, that of course means that I cannot actually get information about B from observing anything A is doing. Because if I could, I would actually have been able to find some cause for it, even if it were happening through supernatural magic, or some really weird physics. But you get that this connection is powerful. I think an epistemology that accepted that you could have knowledge without a causal connection... Traditional philosophy would say you can get divine revelation.
But unfortunately, that just means that the divine is now acting as a causal bridge. Actually, not even a medieval scholastic philosopher would say that, if something was truly independent, you could have knowledge of it. So no. Forrest: Exactly. The Godel Theorem points to an epistemic limit. Relative to a particular axiomatic system, certain truths are not knowable. And that sets limits on the consistency, or on the completeness. To examine the meaning of consistency, consider where I start from some given set of premises. Maybe I derive, through some path, some assertion. If I start with those same premises, and I use some other path of known transformations, of accepted transformations, and I arrive at a contradiction to that assertion, that would be 'an inconsistent system'. In this sense, consistency is a symmetry principle. And if I cannot -- even in principle -- get from the initial conditions, using the known available transformation rules, to the thing I want to show, then that is 'an incomplete system'. There are unreachable statements, whose truth is not knowable relative to those premises and transformation rules. The Godel theorem is essentially showing that you cannot have symmetry and continuity at the same time. The underlying idea is that if I look directly at the fundamental notions of symmetry and continuity, I can show that this is true just in those terms directly. If I define the terms symmetry and continuity exactly, in a way that is grounded in the notion of comparison -- which is essentially shown to be isomorphic with the notion of measurement, itself isomorphic with the notion of causation -- then, in effect, by showing that there are limits in the relationship between symmetry and continuity, I can effectively also show something like why general relativity has the shape that it does. It is an epistemic limit of sorts. We see this in the fabric of mathematics in the form of the Godel theorem. And we see it in the fabric of physics in the form of the Bell theorem, and all their analogs. We are pointing to a fundamental epistemic limit, bound to the notion of epistemics itself, ie, the notion of 'to know', as connected to the notion of comparison and the intrinsics of comparison. I am alluding to all of this, but there is an argument underneath this which is actually quite simple and shockingly powerful. But it was not my intention to try to describe all that today. I want to allude to it, though, because this goes back to an earlier conversation. I do not know if you remember this -- when we first met, and we were working on the 'Dark Fire' paper, towards the end of that conversation, in one of the last emails, I mentioned to you: "Hey, by the way, I have done this thing called the Incommensuration Theorem. And it says some things about ultimate epistemic limits". I would love it if you had some time to look at this. It is a result that goes back to 1997. Anders: Yeah, and I fear I might not have had enough time for that. I cannot remember that I did pursue it, unfortunately. Happy to come back to it though. Forrest: I am basically saying that, one way or another, when we are working on a proof that certain things can or cannot be done, we are ultimately talking about what is knowable and what is not knowable. We are basically saying what is doable and what is not doable. What is, and what is not.
If we are wanting to be clear about the nature of what proof is, it is not just that it is a formalization -- it is a relevant formalization, in the sense that it applies to: Have we made a statement that is within the space of the knowable? Have we moved from a state where we did not know, like with the Pythagorean Theorem, to a state where we do know the Pythagorean theorem? Are we sure that we have asked the right question, and that we know the answer, and that we know that the basis upon which we know the answer is itself correct? To go back to the overview, I am setting up things in a fairly idiosyncratic way so that it is possible for me to basically make a move like that. Remmelt: I am going to try to relate this back to the whole control theory thing. What we are talking about, in terms of that notion, is that there is some kind of control system -- some kind of alignment system, which could be a version of a control system. It is trying to align external effects over time with an internal reference space. There is this question around whether there is some kind of inequality: is the span of what this system could know -- and not only know, but also implement on the engineering side -- about the world sufficient to cover what needs to be known and implemented? So that is roughly how I am thinking about it. Forrest: Yep, that is exactly correct. Anders: Yep, that is where I go too. So the problem is, of course, closing the inequality by showing that you have all of these limits on how much maximal control can be achieved -- given these fundamental epistemic limits, or practical limits based on what can be computed, as well as these complexity theory limits, like Rice's Theorem -- limits which will then overwhelm the form of control that is necessary to get the right kind of alignment. Forrest: When we are dealing with low levels of abstraction, the degree to which these limits of control apply is relatively modest. As we go to higher levels of abstraction, these limits of control become increasingly relevant. And when we get to levels of abstraction relevant to the notions of, say, artificial general intelligence safety, the limits of control end up becoming much more significant -- they dominate the space. If I basically set a number line between zero, meaning "I have no control at all", and one, meaning "I have perfected control" (whatever that means, however you define perfected control) -- it is just a certain notion. If I am dealing with a low level of abstraction -- anything to do with bits, anything to do with words, things for which error correction would normally be thought of as a way of thinking about it -- the limits of control are mostly not relevant. And as I start to climb up into the levels of abstraction of "was it the right sentence?", I notice that now we are at the level of what ChatGPT can do. The GPT-4 version of ChatGPT was able to put together constellations of sentences that, at least within a few paragraphs, made very good sense -- maybe up to a page or two. And we have since been increasing the scope over which it can make sense. For example, we can start to think about: to what degree can we constrain the meaning of what ChatGPT is doing so that it is and remains truthful -- correct to the actual state of the world? Can we come up with a good error correction methodology, in the form of identifying at least an answer to the question: "is ChatGPT hallucinating or not?".
Now we are at a level of abstraction which is actually relevant to a real world concern today, for which it is not so obvious how to apply control theory. Anders: RLHF and other methods are being used right now to control ChatGPT. We know some of their limitations are probably... indeed, there are a few papers that have interesting plots of the kind of trade off when you are trying to make it more harmless, or improve some other measure that you want, and how much you lose in, for example, specificity of the answers, etc. Forrest: That is right. Okay, so let us now add another scale of zero to one, where zero is as concrete as it possibly can be, and one is as abstract as it could possibly be. And when we are considering things like individual bits, we are pretty close to the concrete side, so we are talking some really low number. When we are considering a question like "does the message have the right sentence and the right paragraph, in this context?", we are at maybe point four (0.4) on this scale. However, when we are considering something like "Is general artificial intelligence safe?", we are at least at point nine (0.9) on the abstraction scale. If we are starting to see significant limitations about what can be done at the 0.4 level, then I can have really strong reasonable doubts about what can be done at the 0.9 level. I am noticing that the higher up I go, the worse these problems get. Anders: Now, the problems we have with ChatGPT might not prove much, of course, because you can always say: "maybe in the future we somehow figure out a much better method". While the principle based thing we have been spending the most time on here is showing a hard limit. So the really interesting thing I want to figure out is how the scaling works here with level of abstraction, and how strong these hard limits become. In the best possible case we would have some kind of nice numeric value here. And we would have some little formula showing some curve in the plane. I have my doubts that we are going to get a really neat formula for that. But I think it would be the kind of really nice rigorous result that we could really lean on. Forrest: This is part of the reason, yes. To some extent, we can approach this in two ways. We can do this in parts, by taking the control theory model that we have developed here and composing increasingly refined limits. Say you did your homework and you said, "yep, this is a complete theory of control; this model of control theory is good". Then we can start to ask: Given the error bars in each box, how do those error bars propagate to a final result, relevant to the whole thing? And we can also consider the feedback of the world itself, rather than just the feedback within the control system. But for now, let us just parameterize with what we do have. Since we are eventually wanting a maximum degree to which control is possible, let us assume that we are going to just push to the limits of the maximum that we can get within the envelope of the limits of each of the individual boxes. In other words, for each component factor, we can come up with a version of that curve. We can show the contribution to that curve -- to the overall curve -- based on each component. We push down from the top -- subtracting from the maximum available control, rather than trying to assemble control from the bottom of the graph.
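Purely as an illustration of that 'push down from the top' exercise (the functional forms and all numbers below are invented placeholders; nothing here reproduces Forrest's own estimates): each box contributes a loss that grows with the abstraction level, the losses compose into an overall maximum-control curve, and one can then look for the abstraction level at which that curve drops below some assumed 'control needed' curve.

```python
def box_capacity(a: float, steepness: float) -> float:
    """Capacity of one control-theory 'box' as abstraction a in [0, 1] rises (1.0 = no loss)."""
    return max(0.0, 1.0 - steepness * a)

def max_control(a: float) -> float:
    """Compose the boxes multiplicatively: overall maximum control achievable at abstraction a."""
    steepness_per_box = [0.2, 0.5, 0.7, 0.4, 0.3]   # hypothetical placeholders, one per box
    total = 1.0
    for s in steepness_per_box:
        total *= box_capacity(a, s)
    return total

def control_needed(a: float) -> float:
    """Assumed minimum control needed -- rises with abstraction (illustrative only)."""
    return 0.2 + 0.5 * a

if __name__ == "__main__":
    # Scan the abstraction scale and report where the inequality would fail.
    for i in range(11):
        a = i / 10
        ok = max_control(a) >= control_needed(a)
        print(f"a={a:.1f}  max={max_control(a):.2f}  needed={control_needed(a):.2f}  "
              f"{'holds' if ok else 'FAILS'}")
```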
Then we can say that, to the degree that we can parameterize any of these things, we will at least know something about the bounds of the whole thing. Anders: Yep. And that sounds like it could be a quite interesting project. Not a small project by any means, but it would actually be a quite valuable project. Forrest: In the same sort of way that we collaborated previously, I would be delighted to have you find this project interesting enough to notice that it is worthwhile. Having myself done some version of this project already, and thus having looked at this reasonably closely for enough of these factors, I have managed to convince myself that when I look at the actual inequality, the inequality fails to be satisfied for every level of abstraction past, say, 0.6. Anders: Now, having a good measure for level of abstraction sounds like something I need to think about a fair bit more -- how to do that well -- because that numeric scale is good as an intuition pump, but abstractions are rather slippery things. Yeah, but I do like your view here. I think it very well could be that you could see that these curves intersect in a particular way. And that would be really valuable as a way of actually making this a fairly firm and rigorous argument. Forrest: Great! But then notice what we have just done. If I establish any constraint on the minimum level of control necessary, and I show that the factors that define that minimum level of control necessary have dynamics similar to that hashing function algorithm -- the one I presented as an example earlier -- then it is basically saying that there are things in the real universe which we cannot model better than by just doing the thing itself. This is the case unless we also decide to just accept that these error bars are going to be much larger than we would like them to be -- or, in the case of AGI safety, dangerously larger than we can tolerate. Having error bars be larger than the minimum level we need them to be is effectively to have explicitly declared something -- some product you are building -- to be 'unsafe'. If the error bars are fundamentally too large, and cannot be constrained appropriately, then no declarations of 'containability' within reasonable bounds of expected behavior can be asserted. We have now taken this first project, combined it with the second project, and created this overall outcome of a proof of impossibility of establishing the safety of general artificial intelligence. Anders: Yeah, again, an obvious thing here to argue is: would the safety problem actually be like hash functions? And I can kind of see that you can now make an argument about safety, if there is anything that Rice's theorem applies to, which seems to be a very reasonable thing. You could imagine a space of possible AI algorithms and ask: is this safe or not? And that is supposedly a non-trivial property. So when it is in general uncomputable -- yeah, I think this sounds like somewhere one could actually make something that is pretty firm. Remmelt: Here is an intuition or impression I have: there are certain limits that you have when you are trying to model these recursive systems in the real world. And if you can show that the kind of recursive...
like the feedback loops that we should be worried about, the ones that are unsafe -- if these would converge over time, if these feedback loops would happen at a rate, and/or a scale, like a span across time, that falls outside of the limits of what could be properly controlled by any of the systems internally, then we are showing that the inequality is relevant to safety -- showing that the control will be insufficient. Forrest: You said it reasonably well. And again, this is just one case. The thing Remmelt is pointing to, legitimately, is only just one way to formalize the overall argument. As we explore it further, we will see that there are several distinct methodologies -- some of which are functionally independent. If I have general artificial intelligence, then it inherently has the capacity to integrate any algorithm that it encounters in the world. So it is actually going to be representing within itself, somehow or other, all possible future algorithms that it deems relevant to its wellbeing -- or relevant to its goal functions, or relevant to -- I do not even know what or how to predict. But either way, there is, for sure, not going to be a state in which I have, today, a clear confidence about what the future algorithms are going to be that I am trying to constrain. Rice's theorem does actually apply, because Rice's theorem is basically asking: if I do not know the algorithm, can I characterize it? We all know already that even for the single and very basic characterization of "does it halt?", I cannot ever fully know the answer. Rice's theorem generalizes this to all manner of more complex situational characteristics. Say that I have received an alien message from some other civilization which is somewhere very far away. Can I even verify whether that message I have received is a virus -- something that is designed to take us over? How do we even look at the message without risking ourselves, subjecting ourselves to the mind virus it might be? Anders: Did I show you, by the way, my paper about linguistic security in SETI? Forrest: I do not know that I have come across it. Anders: I might have told you about it way back. But yeah, this is a good example. Forrest: Well, think about how it is relevant to AI, though. For any message that the AGI sends to us, do we know if it is intending to harm or help? We do not know what the meaning or intent of any foreign message is until we read it. The AGI may as well be the alien -- there is no difference. And this applies to literally everything the AGI does. Every act it takes, everything it does in the world, could be some sort of signal -- maybe some sort of message for us to interpret or read or understand -- which may then actually be an algorithm, one specifically meant by that superintelligence to operate on us, within us, one that we cannot stop once started. Something we would never have consented to, had we known what it would eventually do. Consider it to be like a virus that runs on your mind hardware, to your own self-destruction. Consider whatever applies to any messages received from any possible future unknowable aliens -- creatures of whatever nature that also presumably do not have our best interests at heart (why would we assume they do, given that the very word "alien" is the word used?).
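[Note: to make the Rice's theorem point concrete, here is the standard reduction in sketch form. `is_safe`, `run_program`, and `do_known_unsafe_thing` are hypothetical names used only for this illustration; the whole point is that a total `is_safe` decider cannot exist.]

```python
# Standard Rice's-theorem-style reduction, for intuition only.
# `is_safe` is a HYPOTHETICAL total decider for some non-trivial behavioural
# property ("never performs the unsafe action"); the helpers named below are
# likewise hypothetical.

def build_probe(program_src: str, program_input: str) -> str:
    """Source of a program Q that performs the 'unsafe' action
    if and only if program P halts on input x."""
    return (
        "def Q(_any_input):\n"
        f"    run_program({program_src!r}, {program_input!r})  # simulate P on x\n"
        "    do_known_unsafe_thing()  # reached only if P(x) halts\n"
    )

def decide_halting(program_src: str, program_input: str, is_safe) -> bool:
    """If a total `is_safe` decider existed, this would decide the halting
    problem -- a contradiction, so no such decider can exist, and
    'is this (unknown, future) algorithm safe?' is uncomputable in general."""
    return not is_safe(build_probe(program_src, program_input))
```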
Clearly, for future alien messages received from actual alien beings, we can be sure that we have not, and could not ever, retroactively ensure that those aliens were "somehow built" so as to be "aligned" with "our interests", or our wellbeing, or our safety. The point here is that anything that is inherently true about messages received from some far-away alien is also inherently true about any expressive action taken on the part of any at-all-future superintelligence(s). They are, as far as the analysis of engineering is concerned, exactly functionally equivalent. What is true of one is also true of the other. Anders: Yeah. So it seems to me that we reached a kind of natural point here, where, okay, we have a left hand side of this inequality. We have not specified a nice, neat analytic formula or anything like that for it, but we basically agree on it: there are bounds on control. Some of these bounds can be rather nicely specified constraints based on the physics and mathematics, and some of them are going to be kind of tricky, because there are probably a lot of uncomputable aspects to it. We also have this issue of parameterizing the level of abstraction, but if one could pin that down nicely and then show that, yes, against the bounds we want for safety you get some kind of crossing, then we might say: beyond this level of abstraction we cannot get safety. There is of course -- and maybe this is again for later, but I think it is an interesting issue -- the question of looking at: can we do something approximate, that is not perfect security, but good enough? That complicates things by moving into a realm of average performance given some probability distribution. It adds a lot of degrees of freedom and makes the conversation much more complicated. Forrest: Well, that is just it. We were not ever striving for 'perfect security'. That would be an equivocation. We are here just trying to get any notion of 'secure' at all that might even work, ever. This is not just about 'the average case', but is more to ask: is there any case at all, ever, in which the notion of 'safety' has any reasonably grounded probability that anyone would find acceptable? If there is just a complete, total absence of any grounded notion of safety at all -- if no claim of safety has been supported -- then that is just not an acceptable case; not allowable in reasonable product design. Notice that whatever method anyone is using to make the claim "that it is good enough" is still within the space of the inequality. It is just another way of representing that inequality. Anders: I think you can actually move the curves around a fair bit here. So you could imagine the crappiest definition of 'good enough', which is that you actually do not care about security at all. In some sense, it is a valid choice. It is a stupid choice. But with that one, of course, we always get "security", because we have made security so stupid. Now, the interesting thing that is being considered is: what if we have a weak sense of security? Forrest: Well, ironically, if you think about it, notice that we did that already -- that was exactly the reason that we did not mind having a weak, ambiguous definition earlier. We actually went for the weakest possible definition of safety we could get away with, on purpose, specifically for this reason. Most real definitions of that term would be stronger, or more specific, or more refined. This is why it was a weak definition.
By limiting it only to the case of something which is very nearly as stupid as "it does not kill us all", we made the bar of 'minimum level of control required' on the right side of the inequality as low as we could possibly make it. Now, if we find that that weakest possible notion of safety is not even strong enough for practical concerns, then, if anything, the minimum level of control required will go up -- will actually be higher -- which will have the effect of making the inequality stronger. It does our proof work for us. Let us basically say safety and alignment means 'peace on earth and goodwill towards men'. We can start to ask questions like: is it healthy for the individuals and the ecosystem? Is it healthy for humanity and nature? And we do have to include nature, because we will for sure need it in order to have healthy humans. This is basically just accounting for anything that is not considered yet -- anything that is beyond the scope of what I am willing or able to consider today, which might happen to matter in the future. And we also notice that we cannot make the definition of 'safety' and 'alignment' that we are considering any weaker at all, because if we do not get at least that, then we might as well admit that we have completely given up on security, on safety -- which is basically to already admit that the AGI is inherently unsafe. The case that we are arguing against is any possibility of anyone, at all, ever in the future, being able to make the claim 'that AGI is safe', for any rational meaning of 'claim' at all. Anders: But now you have a kind of deterministic take on it. So imagine that we said: 50% chance that it works, or 75% chance, or 99% chance. That is a parameterization that you can actually do. Forrest: You can maybe do that sort of parameterization, but only if you do not also include or consider certain types of structural recursion -- ones which do actually inherently apply in the AGI case (and in things analogous to that). As soon as you allow that specific type of recursion -- which is, unfortunately, basic to the nature of algorithms, and also, observably, to some aspects of the nature of the physical universe -- then we do actually know that there are some things that are going to have that kind of recursive nature. In that case, so as to actually account for that particular kind of recursive structure, we notice that it becomes necessary -- rather than talking about definite states -- to talk about 'convergence forces'; that is the very best we can do. As such, the inequality that we are interested in is not specified in terms of states; it is actually specified in terms of convergence forces. I.e., if the convergence force that is minimally necessary to persist, over time, the actual practical meaning of safety is strictly greater than the maximum level of convergence force that can be generated via any method of causation at all, then there is going to be a clear and present danger. I set up the specific characterization of control theory so that it is possible to show how to understand this argument in terms of control theory, but control theory is just a special case of causation. And the argument itself is based on an even deeper truth.
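[Note: Forrest's condition, restated as a one-line inequality. The symbols are invented for this note and are not from the conversation.]

```latex
% F_req(t): convergence force minimally necessary to persist, over time,
%           any workable meaning of 'safety'
% F_max(t): maximum convergence force that can be generated via any method
%           of causation (any controller) at all
%
% The claimed danger condition:
F_{\mathrm{req}}(t) \;>\; F_{\mathrm{max}}(t)
\quad\Longrightarrow\quad
\text{safety cannot be persisted over time.}
```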
I can show instead, via a much simpler and far more hyper-abstract model of thinking -- one that is inherent in the nature of causation itself, for reasons that have to do with fundamental epistemics, and that is therefore also categorically ontological, and therefore also categorically axiological -- that we have very strong reasons to know that control will not be possible for AGI. As a metaphysicist, these are the tools I work with. What turns out to be a limit inherent in causation is also an inherent and deep limit in the nature of symmetry itself, and these can also be used to point out -- to establish for sure -- that the convergence forces maximally available are actually insufficient to the smallest need. That is the fast path. And this turns out to be a far more powerful answer, because I do not need to depend on any graphs. I do not need to depend upon any heuristics. I can show that it is ontologically the case, in the epistemic itself. Remmelt: Anders, do you want to talk for a while? Anders: Well, yeah, I am not totally buying this, but it also feels like I am... My problem right now is that it is getting relatively late in my time zone, and I actually need to get up early. On the other hand, we are getting to the interesting point here. So maybe one should think about actually continuing this with a sequel, because it feels like we reached a kind of good point, where I think I can now say that I understand this argument. I think we can then start analyzing: where can we poke at it? I do think there is an interesting issue here, in that actually having a probabilistic outcome does lead to one set of interesting arguments again, but I am also feeling like I would rather do this when I am totally sharp in my head. Forrest: Okay. In the interim, definitely distinguish between 'states' versus thinking about those forces which are converging on those states. Anders: Forces are interesting also, because that means that you can get a certain degree of fuzziness into systems -- both the control system (we want to have a robust control system) and the system that is being controlled. We do not necessarily want everything to be totally specified, because in that case, we are not going to get anywhere. I think in general this model is really nice. And with this one, I am kind of itching -- yeah, I want to try to see if one can write it up in a proper manner. Because I do think that trying to condense it down into a relatively tight, rigorous model is possible and might actually be quite valuable. Forrest: Okay. Given this, I also suggest that you add, into your background question set, a consideration of the degree to which that fuzziness is associated with forces. Since you were mentioning forces, notice that the forces parameterization basically gives us an allowance for a certain level of fuzziness. But as we go up the abstraction scale, the interaction with that fuzziness changes. At first, it is to your advantage. But then it becomes part of your disadvantage. By the time you get to 0.7 or 0.8, all of a sudden that fuzziness is working very much against you, rather than being in favor of good control. The fuzziness is awesome when we are between, say, 0.3 and 0.6 -- in that range, it is a real help. But once you get to 0.7 and 0.8, things are not so good anymore.
And by the way, be sure to keep in mind that we are for sure not satisfied, with regard to the question of 'safety' in the sense of 'works for the future of life on the planet', unless it really, really works well at the 0.9 scale. Have fun! Anders: Oh, yeah. Yep. So should we aim at setting up a continuation? Forrest: Well, at this point, because it was Linda that was ultimately asking for this meeting, I want to return to her and basically ask: where does all this land for you? Anders: Yes. She is the boss. Linda: Okay, I am not sure I should be the boss. But okay, this was extremely interesting. I am very happy about this conversation. You got much further than where you were before. Because I really do not have the prerequisites, I wanted to learn from it. My next step is: what control theory papers should I read? Because I want to get up to speed and be part of the conversation. There is at least a month until I have time, and then I need to actually read it. Whether or not any of you should continue the conversation -- I would be super happy to hear that there was a continuation of this conversation. I think you both know so much more than me at this point. Even the tangents that you skipped over -- like the whole thing -- I am like, that sounds fascinating. It might not be relevant for AI safety, but I want to hear about it. And yeah, I think I would encourage both me and Anders to think about what we talked about, and think about the homework -- Anders, if you have time, I do not know how much time you have -- and then come back with: what are the remaining cruxes? Forrest: One of the things that I can respond with is that, if you look at the literature of control theory, you will find that the way I have described it today is a bit idiosyncratic. However, that is okay. Anders, who knows a lot more about the topic, has, at least at this point so far, not rejected the way I have been describing it -- not at all. Therefore, you can at least have confidence that this description I have been giving does actually make sense -- at least to Anders, and certainly to Remmelt. Because this exposure of my work has not resulted in a response from them of saying something like "oh, that idea is ridiculous!" -- they have not made immediate claims that my way of thinking about the topic of control theory is crazy or easily dismissible -- you can maybe have at least some confidence, based upon Anders' judgment, that these questions I am raising about AGI non-containability are actually worth looking at. Linda: Yeah. It also serves as an orientation. Yes, that is absolutely true. And the overview given provides orienting factors... Anders: I think that, while this is an idiosyncratic take on control theory, many people in control theory would say: yep, it still corresponds to what we are doing. It is just that most would say: yeah, but I am mostly building actual systems rather than theorizing about it. And that is fine. Many of the founders of control theory were doing very much theorizing, but then it became a much more applied field. I actually enjoyed watching some videos from a control theory engineering course at MIT. I was especially looking for the internal model principle, which is kind of Ashby's law of requisite variety -- which is really cool, really ancient, from before information theory. And then it pops up right there on the whiteboard, and the lecturer does a little gesture towards it [inaudible]. Forrest: Yeah, I agree.
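[Note: the quantitative statement of Ashby's law of requisite variety that Anders alludes to is usually given, from memory, in entropy terms roughly as follows; the gloss connecting it to the internal model principle is a paraphrase, not a quotation.]

```latex
% Ashby's law of requisite variety (entropy form, stated from memory):
% D = disturbances, R = the regulator's responses, E = essential outcomes.
H(E) \;\geq\; H(D) - H(R)
% Only variety in the regulator can absorb variety in the disturbances.
% The internal model principle is the related idea that an adequate
% regulator must contain (a model of) the process it regulates.
```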
Part of the reason why I am not just going to give you a bunch of links to control theory papers is that most of the works referenced are going to be focused a lot more on the practical rather than the theoretical, and will focus a lot on inherently low-level topics like signal processing, error correction, and entropy, for example. For a more beautiful understanding, consider information theory and complexity theory. I can point you to Shannon's work. But again, it is the philosophy that is more important. Linda: Years ago, I looked into control theory, because it seemed relevant for AI safety, and the things I happened to find were for very simple systems. I did not then see the connection, and so I moved on. I do appreciate some guidance. So maybe I should post the lectures that Anders mentioned. I appreciate some guidance to find the things that actually... maybe I should look at old control theory, from when it was still theoretical. Forrest: You might look at cybernetics; that tends to be a more philosophical take, and it talks at a more abstract level. Things related to signal processing are also relevant, but I would prefer to suggest you look at the work of Shannon, and try to understand what he was talking about -- but then recognize that everything in both Shannon's description and in cybernetics is going to be dealing with extremely simple systems, as you have noted already. That is almost on the complete opposite end from the kinds of things that are relevant for our work here. People have confidence in control theory because it has done so much for us. It is natural for people to then believe, for reasonable reasons, that, hey, this hammer has worked for me for all the nails we have ever encountered so far. And then I walk along, and I say: "I am sorry, but that is not a nail, that is a screw. And you need a different tool for that. You can try hammering that thing into the wood if you want to. But you know what, it just is not gonna work very well". And we went through a whole bunch of debates. Remmelt is smiling because he remembers. We wrote a whole essay on this: how people approach us expecting our work to be like a nail for their hammer -- and we were like, "no, dude, it is a screw!". Well, then, you are asking me "what is the screw?". There is not much I can easily point to, at least that I know of. It is funny, actually. It is hilarious. Remmelt: The frustrating thing here is that we cannot... I cannot even say that that is the case, because that already sounds like, "hey, you should trust me that the thing I am about to say is not the thing that you think it is; you do not know what I think" -- which all sounds like a really arrogant statement. You want to say something like that with humility. This is just stuff that we have only just worked out. But how do you convey that? That is very hard. Forrest: Or who else has even done so? I am not even sure that there are very many places I can send her. Remmelt: Roman Yampolskiy has dug into limits of controllability. So we can sort of reference a bunch of his stuff too. Forrest: Yes, he has looked at Rice's Theorem as applied to AGI. And his work in that space is great. The main difficulty that Anders might find is that Roman's presentation style may leave someone thinking that there are a lot of things that they could quibble with, even though that is not actually so.
The argument is strong, even if the presentation does not always make that especially clear in the general case. As a result, it is not as tight as it would need to be. Anders' standard of tightness is probably higher than is obviously apparent in Roman's work. Anders: Well, yeah, and of course demanding high tightness also means that it takes absolutely forever to finish things up, etc. So, yeah, I need to head off now, because I really need to fix a bunch of things. And I also managed to accidentally pull the plug out of a lot of electrical equipment -- fortunately not the router's, which was lucky. But yeah. I would love to do the sequel here. Maybe I could try to see if I can write up a slightly tighter take on what I understood here and see what happens -- work with Remmelt. The two of you are kind of on the same page now. Remmelt: I think in that case, it would also be good to send it to you. Forrest: I would love to see it. And you can help each other. Anders: I think it is also very relevant, because this is a bit linked to some questions that do show up in my book project too -- which is not, per se, about this, but control theory does show up, and limits to control really matter to me. But I really need to say good night. Thank you all for this; this has been a marvelous conversation. Happy Holidays, and Happy New Year, unless we see each other before that. Linda: I have a call in 10 minutes, so I also need to go. I will re-watch everything when I am less tired and jet-lagged -- I am looking forward to that -- and then I will also have time to follow some of the references you gave. And then I might show up in the chat on Signal and just ask questions. Forrest: There is a ton of literature on [mflb.com at doc id 2211]. And there are some essays in there that talk about control theory and the limits of control theory. But mostly I have avoided trying to do that in written prose, because, frankly, it is a nitpick hell for me. I am willing to do something like this with Anders, in person, because Anders actually understands the big picture of the world. But for most people, they are specialists. They will just not get the big picture. And as a result, I just feel myself getting frustrated with them. Linda: I do not think I am too many inferential steps away. I have thought about alignment and the philosophy related to alignment, and I have physics training. Forrest: The mere fact that you are coming into this from a kind of generalist perspective is going to make this a lot easier. Because I was basically saying that most of the time, when I deploy arguments that are specifically about control theory, they end up being read only by people who are specialists, and specialists have a lot of embedded opinions and assumptions that are just never questioned -- and that is problematic for this work. It ends up becoming that I have to challenge those assumptions. And they do not like it, because they are saying: "well, I do this every day. Who are you in this? You do not know anything about this!". Linda: So maybe I hear your perspective first, and then I go to control theory second? Maybe I can just talk to you and Anders and Remmelt and then get up to speed. We will see. Forrest: Whatever you read, it will be easier to read because of this conversation. Linda: Yes. Thank you very much. I also really have to go. Forrest: Thank you for setting us up. Appreciate meeting you. [Transcript End]