Cramer’s rule, explained geometrically | Essence of linear algebra, chapter 12

In a previous video, I talked about linear systems of equations, and I sort of brushed aside the discussion of actually computing solutions to these systems. And while it’s true that number-crunching is something we typically leave to computers, digging into some of these computational methods is a good litmus test for whether you actually understand what’s going on, since this is really where the rubber meets the road.

Here I want to describe the geometry behind a certain method for computing solutions to these systems, known as Cramer’s rule. The relevant background needed here is an understanding of determinants, dot products, and linear systems of equations, so be sure to watch the relevant videos on those topics if you’re unfamiliar or rusty.

But first! I should say up front that Cramer’s rule
is not the best way to compute solutions to linear systems of equations. Gaussian elimination, for example, will always be faster. So why learn it? Think of this as a sort of cultural excursion; it’s a helpful exercise in deepening your knowledge of the theory of these systems. Wrapping your mind around this concept will help consolidate ideas from linear algebra, like the determinant and linear systems, by seeing how they relate to each other. Also, from a purely artistic standpoint, the ultimate result is just really pretty to think about, much more so than Gaussian elimination.

Alright, so the setup here will be some linear
system of equations, say with two unknowns, x and y, and two equations. In principle, everything we’re talking about will work for systems with a larger number of unknowns and the same number of equations. But for simplicity, a smaller example is nicer to hold in our heads.

As I talked about in a previous video, you can think of this setup geometrically as a certain known matrix transforming an unknown vector, [x; y], where you know what the output is going to be, in this case [-4; -2]. Remember, the columns of this matrix tell you how the matrix acts as a transformation, each one telling you where the basis vectors of the input space land. So this is a sort of puzzle: what input [x; y] is going to give you this output [-4; -2]? Remember, the
type of answer you get here can depend on whether or not the transformation squishes all of space into a lower dimension, that is, whether it has zero determinant. In that case, either none of the inputs land on our given output or there are a whole bunch of inputs landing on that output. But for this video we’ll limit our view to the case of a non-zero determinant, meaning the output of this transformation still spans the full n-dimensional space it started in; every input lands on one and only one output, and every output has one and only one input.

One way to think about our puzzle is that we know the given output vector is some linear combination of the columns of the matrix, x*(the vector where i-hat lands) + y*(the vector where j-hat lands), but we wish to compute what exactly x and y are.

As a first pass, let me show an idea that
is wrong, but in the right direction. The x-coordinate of this mystery input vector is what you get by taking its dot product with the first basis vector, [1; 0]. Likewise, the y-coordinate is what you get by dotting it with the second basis vector, [0; 1]. So maybe you hope that after the transformation, the dot products of the transformed version of the mystery vector with the transformed versions of the basis vectors will also be these coordinates x and y. That’d be fantastic, because we know the transformed versions of each of these vectors. There’s just one problem with this: it’s not at all true!

For most linear transformations, the dot product
before and after the transformation will be very different. For example, you could have two vectors generally pointing in the same direction, with a positive dot product, which get pulled away from each other during the transformation in such a way that they then have a negative dot product. Likewise, if things start off perpendicular, with dot product zero, like the two basis vectors, there’s no guarantee that they will stay perpendicular after the transformation, preserving that zero dot product. In the example we were looking at, dot products certainly aren’t preserved. They tend to get bigger, since most vectors are getting stretched.

In fact, transformations which do preserve dot products are special enough to have their own name: orthonormal transformations. These are the ones which leave all the basis vectors perpendicular to each other with unit lengths. You often think of these as rotation matrices. They correspond to rigid motion, with no stretching, squishing or morphing.

Solving a linear system with an orthonormal
matrix is very easy: since dot products are preserved, taking the dot product between the output vector and each column of your matrix will be the same as taking the dot product between the input vector and each basis vector, which is the same as finding the coordinates of the input vector. So, in that very special case, x would be the dot product of the first column with the output vector, and y would be the dot product of the second column with the output vector.

Now, even though this idea breaks down for most linear systems, it points us in the direction of something to look for: is there an alternate geometric understanding of the coordinates of our input vector which remains unchanged after the transformation?

If your mind has been mulling over determinants,
you might think of this clever idea: take the parallelogram defined by the first basis vector, i-hat, and the mystery input vector [x; y]. The area of this parallelogram is its base, 1, times the height perpendicular to that base, which is the y-coordinate of our input vector. So the area of this parallelogram is a sort of screwy, roundabout way to describe the vector’s y-coordinate; it’s a wacky way to talk about coordinates, but run with me. Actually, to be more accurate, you should think of the signed area of this parallelogram, in the sense described in the determinant video. That way, a vector with a negative y-coordinate would correspond to a negative area for this parallelogram.

Symmetrically, if you look at the parallelogram spanned by the vector and the second basis vector, j-hat, its area will be the x-coordinate of the vector. Again, it’s a strange way to represent the x-coordinate, but you’ll see what it buys us in a moment.

Here’s what this would look like in three dimensions:
Ordinarily, the way you might think of one of a vector’s coordinates, say its z-coordinate, would be to take its dot product with the third standard basis vector, k-hat. But instead, consider the parallelepiped it creates with the other two basis vectors, i-hat and j-hat. If you think of the square with area 1 spanned by i-hat and j-hat as the base of this guy, its volume is the same as its height, which is the third coordinate of our vector. Likewise, the wacky way to think about any other coordinate of this vector is to form the parallelepiped between this vector and all the basis vectors other than the one you’re looking for, and get its volume. Or rather, we should talk about the signed volume of these parallelepipeds, in the sense described in the determinant video, where the order in which you list the three vectors matters (the vector takes the slot of the basis vector it replaces) and you’re using the right-hand rule. That way negative coordinates still make sense.

Okay, so why think of coordinates as areas
and volumes like this? As you apply some matrix transformation, the areas of these parallelograms don’t stay the same; they may get scaled up or down. But(!), and this is a key idea of determinants, all these areas get scaled by the same amount, namely the determinant of our transformation matrix.

For example, if you look at the parallelogram spanned by the vector where your first basis vector lands, which is the first column of the matrix, and the transformed version of [x; y], what is its area? Well, this is the transformed version of that parallelogram we were looking at earlier, whose area was the y-coordinate of the mystery input vector. So its area will be the determinant of the transformation multiplied by that value. So the y-coordinate of our mystery input vector is the area of this parallelogram, spanned by the first column of the matrix and the output vector, divided by the determinant of the full transformation.

And how do you get this area? Well, we know the coordinates for where the
mystery input vector lands; that’s the whole point of a linear system of equations. So create a matrix whose first column is the same as that of our matrix, and whose second column is the output vector, and take its determinant. So look at that: just using data from the output of the transformation, namely the columns of the matrix and the coordinates of our output vector, we can recover the y-coordinate of our mystery input vector.

Likewise, the same idea can get you the x-coordinate. Look at that parallelogram we defined earlier, which encodes the x-coordinate of the mystery input vector, spanned by the input vector and j-hat. The transformed version of this guy is spanned by the output vector and the second column of the matrix, and its area will have been multiplied by the determinant of the matrix. So the x-coordinate of our mystery input vector is this area divided by the determinant of the transformation. Symmetric to what we did before, you can compute the area of that output parallelogram by creating a new matrix whose first column is the output vector, and whose second column is the same as the second column of the original matrix. So again, just using data from the output space, the numbers we see in our original linear system, we can recover the x-coordinate of our mystery input vector.

This formula for finding the solutions to
a linear system of equations is known as Cramer’s rule.

Here, just to sanity check ourselves, plug in the numbers. The determinant of that top altered matrix is 4 + 2, which is 6, and the bottom determinant, the determinant of the original matrix, is 2, so the x-coordinate should be 6/2 = 3. And indeed, looking back at the input vector we started with, its x-coordinate is 3. Likewise, Cramer’s rule suggests the y-coordinate should be 4/2, or 2, and that is indeed the y-coordinate of the input vector we started with here.

The case with three dimensions is similar,
and I highly recommend you pause to think it through yourself. Here, I’ll give you a little momentum. We have this known transformation, given by a 3×3 matrix, and a known output vector, given by the right side of our linear system, and we want to know what input vector lands on this output vector. If you think of, say, the z-coordinate of the input vector as the volume of this parallelepiped spanned by i-hat, j-hat, and the mystery input vector, what happens to the volume of this parallelepiped after the transformation? How can you compute that new volume?

Really, pause and take a moment to think through the details of generalizing this to higher dimensions, finding an expression for each coordinate of the solution to larger linear systems. Thinking through more general cases and convincing yourself that it works is where all the learning will happen, much more so than listening to some dude on YouTube walk through the reasoning again.
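To make the 2D sanity check and the higher-dimensional generalization concrete, here is a minimal Python sketch of the recipe above (Python is our choice; nothing in the video prescribes a language). Coordinate k of the solution is the determinant of the matrix with column k replaced by the output vector, divided by the determinant of the original matrix. The transcript only gives the output [-4; -2], the determinants 6, 4 and 2, and the solution (3, 2), so the 2×2 matrix `A` below is a hypothetical one chosen to match those numbers, not necessarily the matrix shown on screen; the rotation `Q` and the 3×3 system `A3` are likewise made up for illustration (skip the last example if you want to work the 3D case out yourself first). Determinants use exact `Fraction` arithmetic and cofactor expansion, which is fine for tiny systems; as the video says, Gaussian elimination is the practical choice for anything larger.

```python
from fractions import Fraction


def det(m):
    """Signed area/volume scaling factor of the square matrix m.

    Uses cofactor (Laplace) expansion along the first row. That costs O(n!)
    operations, so it only suits the tiny systems discussed here.
    """
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def replace_column(m, k, v):
    """Copy of m with its k-th column swapped out for the vector v."""
    return [row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]


def cramer_solve(a, v):
    """Solve a x = v by Cramer's rule (requires det(a) != 0).

    Coordinate k of x equals det(a with column k replaced by v) / det(a):
    the signed area (or volume) of the transformed "coordinate parallelogram",
    divided by the factor by which a scales every area (or volume).
    """
    d = det(a)
    if d == 0:
        raise ValueError("zero determinant: no unique solution")
    return [Fraction(det(replace_column(a, k, v)), d) for k in range(len(a))]


def mat_vec(a, x):
    """Apply the matrix a to the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def column_dots(a, v):
    """Dot product of each column of a with v (the orthonormal-only shortcut)."""
    return [sum(a[i][k] * v[i] for i in range(len(a))) for k in range(len(a))]


# Hypothetical 2x2 matrix, chosen only to match the numbers quoted above:
# output [-4, -2], det = 2, top altered determinant 4 + 2 = 6, answer (3, 2).
A = [[-2, 1],
     [0, -1]]
v = [-4, -2]
x = cramer_solve(A, v)
print(x)                   # [Fraction(3, 1), Fraction(2, 1)]
print(mat_vec(A, x) == v)  # True
print(column_dots(A, v))   # [8, -2]: the dot-product idea fails here

# A 90-degree rotation is orthonormal, so the shortcut and Cramer's rule agree.
Q = [[0, -1],
     [1, 0]]
w = mat_vec(Q, [3, 2])     # [-2, 3]
print(cramer_solve(Q, w), column_dots(Q, w))

# Hypothetical 3x3 system: each coordinate is a ratio of signed volumes.
A3 = [[2, 1, -1],
      [0, 3, 1],
      [1, 0, 2]]
v3 = [-3, -3, 7]
x3 = cramer_solve(A3, v3)
print(x3, mat_vec(A3, x3) == v3)
```

The `column_dots(A, v)` line is the "wrong, but in the right direction" idea from earlier: for the non-orthonormal `A` it returns [8, -2] instead of (3, 2), while for the rotation `Q` it gives the same answer as Cramer's rule.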

100 thoughts on “Cramer’s rule, explained geometrically | Essence of linear algebra, chapter 12”

  1. Great videos. Shocking that I didn't know about this channel all these years. Wondering what software you use for these animations.

  2. This channel truly is a blessing. I remember watching this series when it was posted, just before entering engineering school, and it really got me interested in math; in particular, the intuition you give is great. Thank you

  3. I'm very much looking forward to an "Essence of Differential Equations" series of videos if that's what you're planning on.

  4. The best explanation for it! I have never seen such kind of explanation but the old "a matrix is a function of a determinant"…

  5. I'm sure I used Cramer's rule at least once because in that situation it was easier than Gaussian elimination. I think it had to do with calculating determinants being trivial with the right distribution of zeros in the transformation matrix.

  6. The math here is brilliant, as always.
    But the rendering of your animations could be improved if you enable the antialiasing in 3D.
    It's kind of disturbing when objects move.

  7. How do you make these videos?
    Please give some idea
    of which software you use.
    Please tell me, sir. I want to make graphs of
    functions which are very complex,
    like lg(x^2+x+5).
    Please, sir, I request you.

  8. Uhm, why don't you simply find the inverse of the matrix and multiply your output vector to get the input vector?? That's the same thing right?

  9. Never understood why people would use Cramer's Rule. Gaussian Reduction is faster and easier. It's amazing how Cramer's rule works but… why use it in the real world on pen and paper?

    I even had a stupid electrical exam where I was asked specifically to use Cramer's rule to solve a system of equations obtained using Kirchhoff's laws. I had to derive it because I was stupid and forgot how it works… oh well.

  10. Hey 3Blue1Brown, can you make a video on the Ostrowski–Hadamard gap theorem? I am a noob and need help from the great 3Blue1Brown. THANKS BROTHER, KEEP UP THE GOOD WORK!!!!

  11. That is such a nice presentation.
    But can anyone explain the limitation of this rule for transformations involving different dimensions?

  12. Okay, I was trying the thing for finding the inverse of a matrix. But I feel like I just put it together algebraically, and not really intuitively. So maybe I’ll go back and do it just for 2×2.
    To find an inverse, you want to find vectors x_1,…,x_n which map to the standard basis vectors e_1,…,e_n. So let’s do the thing for each vector x_j. The i-th element of vector x_j is x_ij, because that’s how notation works in a matrix. So the parallelepiped spanned by e_1,…,e_{i-1},x_j,e_{i+1},…,e_n (which has volume x_ij) maps to the parallelepiped spanned by all the column vectors of the original matrix A with a_i replaced by e_j. Since the volume of that is x_ij * |A|, we can find x_ij by taking the determinant of A with the i-th column replaced by e_j.
    And what’s neat-o about determinants and cofactor expansion is that you can expand along any row or column you like, so let’s do cofactor expansion down the column that’s almost all zeros. Because of this, the determinant of the thing is the same as (plus or minus) the determinant of the (j,i) minor of A (note that it’s ji, not ij).
    So the ij-th entry in the inverse of A is (+/-) 1/|A| * |M_ji|, which matches what Wikipedia gives as Cramer’s rule for the matrix inverse.
    (I’m not keeping track of pluses and minuses because I’m sure I’d make a mistake there somewhere, and I think you can get the gist without worrying about it.)

  13. I feel so overwhelmed at the beginning that I feel I don't know the basics required, and so I don't watch the whole video. How do I overcome this and truly understand the video?

  14. This is a neat way to solve this! Thinking about it this way got me thinking about how determinants are scalar equivalents of Grassmann's exterior product, and how the determinant of a non-orthogonal transformation-matrix is the transformed area of the transformed original, which makes me wonder how viable it would be to represent such a transformation using the language of geometric algebra (including the exterior product) instead of the language of matrix algebra. (The possible representations of non-orthogonal transformations particularly interest me, here, since orthogonal transformations can always be represented by conjugation operations in geometric algebra.)

  15. Hello! Love your videos❤️
    I would like to see one about Laplace transform. Just an idea. Thanks :3

  16. A very nice geometric understanding of Cramer's rule, that I didn't see at all until now. It was just algebra for me. Thanks.

    1:34 But Gaussian elimination is also pretty geometric! You change the basis of the target space to the standard basis so that finding the solution is easy, but at the same time, since you're doing row operations, you don't change the row and null spaces, so you're left with the same solution to the re-posed problem. I think that's rather neat.

  17. You make my day every time you post a new video!! Never seen such a talent for explaining and creativity in maths.

  18. I'm not sure if you are into mathematical logic, but I'd really love to see a video from you on Gödel's incompleteness theorems. Your channel is amazing, thank you and keep up the good work!

  19. everyone my age: fortnitefortnitefortnite

    Me: Um i don‘t like fortnite. I like Maths and (Quantum)physics

  20. In the hypothetical case that the universe is infinite and there are infinitely many multiverses, could there be a flat, hollow Earth or planet?

  21. 0:46 wow youtube, you know what would be great? If you had an option for people to click on their screen in any of these three places for the appropriate video they want to go back and watch. We could call it annotations…

  22. I watched the whole series again because this video came out, and it just so happens I’m also concurrently taking a rigorous linear algebra course. It’s thrilling to me how in depth this series goes (and how little of that depth I picked up when I watched this 2 years ago) and seeing these topics I understand in a very different perspective. I’m very excited for the differential equation series to come, since I’m taking that in the fall!

  23. I'm preparing for an engineering entrance exam, IIT (Indian Institute of Technology).
    But sorry to say, I have a small doubt about this video…

  24. Before this series, I thought Cramer’s rule was black magic where you just change a column to find the solution; now it all makes perfect sense!

  25. This video is sooo good!! We just briefly rushed over Cramer’s rule in one day in my precalc class, with no actual understanding at all. This makes it so much more clear and satisfying! Keep up the amazing content :)))

  26. the 3-dimensional version is similar to the 2-dimensional version and it should work like this: in order to know the coordinates of the mysterious input vector, we should do division. Generally, the determinant of the matrix which transforms the input vector serves as the denominator. The numerator of the division for the x coordinate is the determinant of the matrix which keeps the last two columns unchanged and takes the output vector as the first column. In the same vein, the numerator of the division for the y coordinate is the determinant of the matrix which keeps the first and the last column unchanged and takes the output vector as the second column. For z coordinate, the matrix takes the output vector as the last column and keeps the other two unchanged. The key idea is that: (1) new volume =det(original matrix)*old volume. (2) new volume = det(new matrix which keeps two columns from the old matrix and takes output vector as one of the columns)

  27. I have an exam in linear algebra in a couple hours and this video just saved me some sleep time for how detailed and beautiful it is. guess i'll sleep for an hour and study till the morning rather than shoot myself in the leg 😀

  28. I'm actually very impressed he put in a solvable 7 variable matrix. I put it through a program I made a while back (using Gaussian elimination, sorry Cramer) and it spat out x0=-1.30893, x1=-1.22761, x2=0.205809, x3=-0.0762283, x4=-0.479706, x5=-0.449354, and x6=1.28772. I wouldn't have had the patience to make one with that many variables, especially with integer coefficients.

  29. Hi, I have a question. I understand why it works in 2 and 3 dimensions, but why should it work in n dimensions?
    Might the determinant not have any meaning in that dimension?
    Thank you for your time.
    It was very interesting to know the reason behind Cramer's rule.

  30. This video series is truly the core of linear algebra, I think. I really thank the 3blue1brown team for taking me to a higher level!

  31. I came back to this video after just looking this up on Wikipedia. It falls under the "geometric interpretation", sure, but I'm sorta more curious about the other proof listed, and how that relates to this.

  32. Yet another new way to think about determinants. They have so many uses, but it's so bothersome to compute them, especially in your head XD

  33. Brilliant exposé as usual. I struggled around 9:27 with the reasoning leading to the numerator area being understood as a newly constructed determinant. It took me too long to grok that any parallelogram-shaped area corresponds to a stretching of the i-hat plus j-hat square by an amount defined by the determinant of a square matrix whose column vectors define the parallelogram. So just as y is unknown, so also is Area unknown. But y is equal to Area/det A. Area is the determinant of a new matrix constructed from the known transformed i-hat column vector (the first column of A) and the known transformed {x,y}, which is the RHS of the equation, i.e. the known coordinates of where the unknown {x,y} ends up. Very obvious: after my struggles. These videos are priceless because they offer beauty also, even to those with my very modest math skills.

  34. Marvellous. Can you do a video on PCA algorithm? Many engineers use this without fully understanding the mathematics.

  35. In front of you is a tree you want to reach, and you move toward it in decreasing steps: in the first step you cover half the distance, in the second half of that half (a quarter of the distance), and in the third 1/8 of the distance.

    When will you arrive?

  36. Still waiting to hear the Russian accent version of parallelepiped 😉 Thank u for updating. Still a bit not clear but will likely have to watch few times.

  37. Hi,
    Fantastic video, it helped a lot to understand the very basics.
    I have one query. As we know, a 2×2 matrix can be represented in 2D and a 3×3 matrix in 3D; likewise, is there any geometric meaning for 4×4 and higher-order matrices? Can you tell me what the physical significance of 4×4 and higher-order square matrices is?
    Thank you so much in advance.

  38. My two cents is that unlike Gauss-Jordan elimination algorithm, Cramer's rule is a rational function, so you can answer questions with variable parameters by solving algebraic equations. (Akin to computing eigenvalues using determinant.)

  39. Maybe I misunderstood, but can't you just compute the product of your output vector and the inverse of the original matrix to get your original vector?

  40. how did u do that to the seemingly most boring rule I've ever learned?
    I hated it soooo much. Hated linear alg because it made no intuitive sense to me whatsoever
    but now, god, it's beautiful.

  41. @ 3:00 -ish when you show graphically the det(A)=0 solutions was profound.
    Seeing the many solutions coalescing onto a single point just nails home the eigen value / eigen vector relationship, IMO.

  42. If the determinant of a 3 by 3 matrix is zero, it is squished into a lower dimension, but how do we confirm whether it is squished into a plane, a line, or a point? Does the rank of a matrix tell what squishing is done?

  43. Enlightening. Just purely enlightening! I think the key to understanding here, as pointed out in the video, is that under a linear transformation all areas (or volumes in the 3D case) change in the same way, so that the RATIO of change is the same. Cramer's rule is really all about this change. Rearranging the equations to reflect this ratio of change really helped me digest this one.

    I've never taken any linear algebra class before, but this brilliant series makes me really want to learn much more about the subject. To enlighten, not to daunt, students is the only gold standard of teaching. I can't imagine how much happier and more satisfied students could have been if they were taught this way in school. Oh man, this even makes me want to become a teacher like him.

    Keep up the enlightening process, please!
