The Podcast Quantitude
Greg Hancock & Patrick Curran
Season 4, Episode 4:
Partial Least Squares: Straight Outta Uppsala
Published Tuesday, October 4, 2022 · 55:59
SUMMARY KEYWORDS
model, constructs, sem, structural equation modeling, latent
variable, people, fika, composite, structural equation, data, pls,
indicator, endogenous, factor, principal components analysis,
sheepskin, world, quantity, variables, system
Greg
00:04
Hi everybody. My name is Greg Hancock, and along with my formative Mode B friend Patrick Curran, we make up Quantitude, a podcast dedicated to all things quantitative, ranging from the relevant to the completely irrelevant. In today's episode we talk about partial least squares, a technique that resembles structural equation modeling, but with a lot of flexibility, including but not limited to its ability to accommodate both reflective and formative constructs. Along the way we also mention dark RedBubble, sheepskin, snorting SEM, the Coors Light beer bong, Hotelling's ghost, Larry the Cable Guy, the Marvel metaverse, the evil eye, Bad Luck Schleprock, fika, ABBA, and pieces left on the table. We hope you enjoy today's episode. Have you happened to check RedBubble and see all the cool stuff that we have? I have not. There's so much cool stuff that we have. I just thought I should take this opportunity to let you know. Obviously, you know, we have stickers and notebooks and all of that. I would like to thank the people out there, not just for posting your really cool pictures of you with the merch, but, as we mentioned in the tag, all of the proceeds go to help donorschoose.org. And what that means is that the stuff that folks out there have bought has helped dozens and dozens of classrooms through things like calculators for math classes, hands-on materials for teaching statistics, a whole variety of things. And that's all due to the folks who are listening out there. So thank you very much to everybody.
Patrick
01:36
I may have misunderstood something. I thought we were doing this whole podcast to become rich. I've
been waiting for three years for the check from you. And I've just been trying to be patient. I don't want
to be a wiener about it. Did I misunderstand how this works?
Greg
01:54
I think we can solve this if you just Google academia. Sorry.
Patrick 02:02
But thank you, everybody. And it is so fun. And my kid drew a couple of things that are up there in the corner and whatnot.
Greg
02:09
Now, there's some things up there that your daughter didn't draw, and that my daughter didn't draw and
that I didn't draw, and that none of us drew.
Patrick
02:18
I'm sorry. Is there like a dark RedBubble that we're part of that I don't know about yet?
Greg
02:23
There's bootleg merch, can you believe it? There's more fake Quantitude stuff than there is real Quantitude stuff on there.
Patrick
02:33
This is a real question: why? I mean, there's no money involved. We have a market share of negative $1,000 a year. I mean, it's a real question. Why on earth? Like bots, are they just trying to copy stuff? I don't
Greg
02:48
know the answer. I was wondering if it was bots also. But there's a bunch of stuff that says Quantitude on it in different fonts, and then there's a ton of stuff that says Jiffy on it, which Jiffy, by the way, is
extremely
Patrick
03:00
happy about. Hi guys. Hi, Jiffy
Greg
03:03
Jiffy, is this your side hustle? No, I just think it's very tastefully done. I bought five. All right, little buddy. Thanks. Bye, guys. Have a great episode. I will make clear to folks out there that you should be able to get whatever you like, but it's only the authentic Quantitude pod merch that actually makes it back to donorschoose.org. And how do you know if it's authentic? It has the name of who posted it beneath it, and it says "by quantitudepod" or something. Anyway, so I thought that was kind of interesting that we
have fake stuff.
Patrick
03:39
I am fascinated by that. I don't mean to like belabor this, but I totally get making a fake account and
selling things with like a Nike swoosh or like a Ferrari logo. We're nothing. We own nothing. I am
perplexed by this.
Greg
03:57
Maybe this is actually one of the best indicators of where the economy is currently. I remember when I had my first car. What was your first car?
Patrick
04:06
My dad gave me his 1969 Volkswagen Beetle.
Greg
04:11
The one that you raced in our control episode? Yeah, right. I had a 1976 Mercury Capri II; it was a hatchback. When I got it used, my mom got me these seat covers, and she was so happy to do that. So she got me these and she was going, they're sheepskin, they're sheepskin! Like, you know, this was some big deal, and I'm like, okay, Mom, thanks. And so I put them on my car and they were very comfortable. And then I looked at the label, and it said Sheeps-Kin. So I never had the heart to tell her that it wasn't actual sheepskin, but I thought that was like the best fake name ever: Sheeps-Kin.
Patrick
04:56
It's so funny you say that, because just a couple of weeks ago I tried to do my own car repairs, which I do when I can; with YouTube now and the things that you can order online, within reason I can do a fair amount of my own car repair. I needed to get this particular part. For the record, I searched online and I found genuine Honda replacement parts, because I didn't want off-brand, right? I wanted it actually made by Honda. And I was about to check out, and it kind of struck me that it was half the price of what I'd found in other places. And so I jumped to another tab and I searched genuine Honda parts. The company name is Genuine. It was Genuine brand Honda replacement parts. Right? And I thought that was brilliant. I almost wanted to buy it just because I so admired that it was a Sheeps-Kin. Yeah,
Greg
05:52
exactly. Well, so our topic for today is maybe in this spirit. I don't know; it remains to be seen whether or not what we're going to talk about is sheepskin or Sheeps-Kin. What we're going to talk about today is something called partial least squares. Originally, when you were up visiting me over the summer and we were talking about topics that we were going to cover, we had said, oh yeah, we can go through these different estimators like ordinary least squares and maximum likelihood and two-stage least squares and partial least squares, and it was just sort of uttered in the same breath.
Patrick 06:23
It was gonna be in one episode. Okay, well, that's a whole other thing.
Greg
06:27
But it turns out partial least squares is not actually an estimator. Did you know this?
Patrick 06:32
I do now.
Greg
06:35
Yeah. So partial least squares, which we're going to unpack a little bit here. The funny thing is that there's a whole segment of the world that is just tied to partial least squares, like it is their jam. In the early 2000s, if you go to the management information systems literature, or other fields like chemometrics, this is it, right? We did a PLS, we did a PLS, we did a PLS. In our world, being primarily a social science world, this doesn't cross our plate a whole lot. Exactly. And
Patrick
07:07
the reflection of that, at least to me, is that for you and me, our day jobs are kind of centered around structural equation modeling. Yes. I've been in the game for 25 years, and I thought it was a method of estimation. I really did. And that is not it at all.
Greg
07:24
That's right. And in fact, you mentioned structural equation modeling. You and I, we didn't just drink the structural equation modeling Kool-Aid. We went to Studio 54 and snorted structural equation modeling off the bathroom counter and came out rubbing our noses. So we're deep in.
Patrick 07:42
Nope, you did. I Coors Light beer-bonged it. That's only half a joke, which is: that's all there was, right? It wasn't like I could pick from this or this or this or this when you and I came up through the system. Bollen's book came out in '89, and I used that book in my first SEM class in 1990. Right, this was the only game in town when I came up through the system. Now ironically, the foundation for partial least squares was developed years earlier. Right, it had not permeated, at least, the instructional system that I was part of. And what I find fascinating is I think that's in part because there are literally continental preferences over this. Much of the work and the applications are concentrated in Europe.
Greg
08:43
That's absolutely true. Structural equation modeling and this thing called partial least squares that we're gonna start getting into: both of those weren't just born in Europe. They weren't even just born in the same country, which is Sweden. They were born at the same university, Uppsala University, coming out of a guy named Herman Wold at Uppsala. Herman Wold was the advisor of... who, do you remember?
Patrick
09:15
Okay, I remember, but you're gonna mock me for saying his name. So I'm just gonna say Karl, and then you can say his last name, because Tove taught you how to say it. Yes.
Tove
09:26
That's right. Hey, this is Tove Larsson, Greg's Swedish coach and occasional quantitative linguistics consultant.
Greg
09:32
So his first name isn't even Karl? Come on, throw me a bone. All right, fine, fine. His American name is Karl. His stripper name is Karl. His actual name is Karl Gustav, and his last name is Jöreskog. Yeah, the pronunciation is slightly off on that one.
Greg 09:49
So Karl Jöreskog, as we call him in the US, was the student of Herman Wold, and Karl Jöreskog is, you know, the father of the structural equation modeling that you and I practice. That really had its seeds planted back in confirmatory factor analysis in the mid-to-late 60s, where structure was imposed upon that latent variable system, and then started carrying off into the 70s. And it didn't just carry off on a theoretical level. And I think this is really, really important: it carried off with software too. It's great when there's the mathematics of these kinds of methods, but if you can't put it in people's hands, then there's a problem. There was LISREL, and there were punch cards that went with LISREL, and mainframes and all of that kind of thing. But when we think about how structural equation modeling got a head start, the pieces were in place, not just in terms of the mathematics and the theoretical foundations, but actually the software to be able to do it. And I think that's a critical thing. But you know, after structural equation modeling started taking off, Herman Wold, sort of watching his academic child go off and do great things, sort of scratched his head and said, oh boy, Karl Gustav, there's a lot of assumptions wrapped up in your structural equation modeling thing, in your LISREL thing, and some of those things make me uneasy. And we should probably just rattle off some of the assumptions that exist. I'm going to pop quiz you on this; I don't mean it to be a pop quiz, but let's go, 30 seconds,
Patrick
11:17
within the SEM. And my understanding of Wold's reaction is that it was kind of twofold. One, it goes back to something that we've talked about on prior episodes, which is: one of the key advantages of the SEM is that it's a priori, and one of the key limitations of the SEM is that it's a priori. So we have exploratory methods, we have EFA, we have eigenvalues and eigenvectors and rotation and all of these things. And then, if you've had that in a class, people say, yeah, yeah, but you're letting the data drive your decisions. In the confirmatory approach you don't saturate your factor loading matrix; it's structured in a way that's consistent with theory. And that's a huge advantage, I mean a massive, huge, huge, massive advantage, of confirmatory factor analysis and the SEM in general. Except when it's not. Some paraphrasing of Wold is: there are often situations where we are data rich but theory poor, and we need a methodology that allows us to do SEM-like things, but without being so shackled to a strict a priori parameterization of our model before we ever begin.
Greg
12:29
So you'd said that there were two primary things. The first one, then, has to do with feeling locked into this very confirmatory, very a priori, structured kind of endeavor. What's the other one that you had rolling around? Some
Patrick
12:41
of the asymptotic regularity conditions for maximum likelihood estimation. So what he seemed to be more concerned about was a rather strong assumption about multivariate normality, and whether you have a sufficiently large sample size for those asymptotic conditions of consistency, unbiasedness, efficiency, and asymptotically normal sampling distributions; maybe those don't come online, given the characteristics of the data that we have in our sample. That's absolutely right.
Greg 13:08
And so his thinking was, could we do something that is, and this is where I really want to be careful of my language, can we do something that is an approximation to that? This is where some people's hackles will get raised. Not PLS people necessarily, but SEM people, right? You and I had a whole episode on principal components analysis, and, like, one of your first disclosures came right after the Spider Pig theme, and
Patrick
13:34
Spider Pig, Spider Pig.
Greg
13:43
After those disclosures... I mean, the whole episode really was about principal components analysis, what it is, and that it is not a factor model. And some people just completely get their underwear in a bunch over that, for sure. And there's a whole segment of literature from the 80s and early 90s where people were just butting heads over: is it a latent variable model? Is it not a latent variable model? Can it be used to estimate latent variable models? But at its core, principal components analysis is just a composite model. And it's not the only way that we have to form composites. And so Wold was sort of wondering, can we bring to bear some of the ideas of compositing, rather than the latent variable methods that we have, but use it as an approximation to a system that otherwise looks a lot like a structural equation model?
Patrick
14:29
And in that spirit, you know whose ghost I saw on my back deck as I was reading some of this material? Harold Hotelling. Ah, nice. Remember, he's local. He's in Chapel Hill; he was at UNC back in the 30s, but
Greg 14:45
not buried in your backyard.
Patrick
14:48
Never mind. Hotelling, who was the original developer of principal components analysis, never addressed factorial rotation, because he did not care about the numerical values of the weights from a substantive standpoint. He had a very practical motivation, which is that he had a large amount of data and he wanted to reduce it to a smaller amount of data. And whatever those optimal weights were, which came out of eigenvalues and eigenvectors, that allowed him to get the composites, get a cup of coffee, and then go do whatever he was going to do with those composites. Obviously, this is not principal components in PLS, but it has that spirit of Hotelling, which is: we have a problem to solve; existing methods work really, really well if we meet those assumptions; very often we don't meet those assumptions; and this is a really nice alternative that allows us to do some things that we wouldn't otherwise be able to
Greg
15:49
do. Imagine that we had what you and I would think of as a structural equation model with some latent variables. We don't need to dig too deep into labeling them yet, but let's imagine that we had two exogenous factors, or constructs, or call them whatever you want. And let's say one endogenous construct that depends on both of those. So in our heads, right, we're asking you to visualize, use your mind's eye: both of those have paths coming into the dependent or endogenous latent variable. And let's just say that those two exogenous factors covary. Now, you and I are practical people, and maybe worried about some of the things that we're starting to allude to, and we might say: why don't you just put a score in there for that first factor? Why don't you just put a score in there for that second factor? Why don't you just put a score in there for that third factor, and go do yourself a regression? I mean, why not? Right,
Patrick
16:44
exactly. It's a git-'er-done kind of thing.
Greg
16:49
Larry the Cable Guy here,
Patrick 16:51
but there's an element to that. And I actually find you and me quintessentially practical. Oh, nice. We're trying to achieve a goal; we've got an endgame. We know what the high road is. What I mean by that is, if we are able to use a multiple-indicator latent factor, if that latent factor is properly defined, if we have adequate sample size, if we have adequate distributions, if the model is properly specified, that's kind of the gold standard. But as the PLS people talk about, with the data-rich, theory-poor settings and violations of these asymptotic conditions, we need a way to achieve what our goal is, and this is a promising alternative for doing that. Yeah, we've got an endgame, right? It's all Marvel Universe. This is like the SEM's Endgame.
Greg
17:38
So is this some other part of the metaverse? Multiverse? Metaverse? Which is
Patrick
17:42
it? You don't know?
Greg
17:43
I just remember that Dr. Strange was like trying to patch some stuff up and it just got kind of weird.
Patrick
17:53
What the hell was wrong with you?
Greg
17:55
It was so long, I had to pee. So I left and came back, and it's like, what? Hey, could
Patrick 17:58
we get back on task here?
Greg
18:00
I'm sorry, what were we talking about? All right. So let me actually label some of these hypothetical factors right now. Let's imagine that we had two exogenous factors, and one of them was exposure to discrimination. And by exposure to discrimination, I might ask, let's say, kids: to what extent have you felt that you have been discriminated against, we'll say on the basis of race or ethnicity, on the playground, in your classroom, walking home from school, when you are outside of school, in the social media that you encounter, in the sports that you play outside of school, etc. So imagine we had this construct that is exposure to discrimination. In the world where you and I live, the structural equation modeling world, that operates with a very specific structure associated with our factors, right: the latent variable has a structure that it inherited from the confirmatory factor world, where we assume that the latent variable actually causes its indicators. But in this case, I think a pretty good argument could be made that your exposure to discrimination is like a bucket, and each one of these kinds of experiences that you have fills your bucket up a little bit more, and fills your bucket up a little bit more, meaning that the arrows might actually go from the variables into this construct rather than the other way. And if that's the case, if we would agree that something like that is a reasonable description of what's going on in the relationship between the variables and the construct, SEM gets really uneasy
Patrick
19:33
about that. That reminds me of a good friend I had in grad school, Lily, who is at the University of Washington in Seattle, your alma mater, Go Huskies. She and I were out for coffee and I was showing her a new latent factor that I was working on. It was uncontrollable stressful life events, and it was in children: a series of things that the child had no control over but that happened to them. So their cat died, their grandmother ran away. Wait, maybe it was the other way around. But there was this series of things. And she was really funny. She said, but does that factor work like an evil eye? Like somehow somebody put this hex on you? And I was like, no, no, it's lambda psi lambda prime. It took me 10 years to figure out what she was talking about. And it's this issue that you're describing. I love
Greg
20:26
the evil eye. I have referred to it with an incredibly old reference that maybe you won't even get. In the cartoon The Flintstones there was a character named Bad Luck Schleprock. "Oh, lousy me." And Bad Luck Schleprock, just anything bad happened to Bad Luck Schleprock. And so if you tried to frame that construct as latent, it would have to imply that everybody has a certain amount of Bad Luck Schleprock in them. Quick, what
Patrick
20:58
is your wedding anniversary? It is June. But you remember Bad
Greg
21:08
Luck Schleprock? My dad let me watch five hours of TV on school days, so of course I remember that. The worry is, if you model it as a latent variable, you have got to own it.
Patrick
21:17
Yeah. And we've talked on episodes before about that. If you draw a single-headed arrow, you go into the saloon, you order two fingers of whiskey, you throw it back, slam the glass on the bar top, and to everyone in the saloon you say: I believe my latent factor causes my indicators.
Movie Clip 21:37
I'm Captain Augustus McCrae. This is Captain Woodrow F. Call. I'd like a shot of whiskey, and so would my companion. Besides whiskey, I think we'll require a little respect.
Patrick
21:48
If you have a series of math items, it is not unrealistic to say you have an underlying math ability that in part determines the probability you're gonna get an item correct. I can sleep at night thinking that the reason you got a 92% on your test is because your underlying math ability caused those responses. But, as you say, you start thinking, wait a minute: is there some underlying latent propensity that causes you to be discriminated against in social media or on the playground? The SEM with a traditional multiple-indicator latent variable is very poorly suited to deal with that, to the point that you have misstated your causal process.
Greg
22:35
Exactly right. So now imagine that we have a model where we really are at odds, from a theoretical standpoint, with a construct being represented in that traditional latent framework. It takes a lot of trickery to be able to get the standard structural equation modeling framework to do that, and even when we can, there are all kinds of limitations on our ability to do that. That's true. Well, let's imagine now that someone has a system where the two exogenous constructs are things that they look at and say, there's no way that's latent in the traditional sense; it really seems to make much more sense that it is, as we call it, formative, that the variables come together to form that particular construct. In the PLS world, the traditional latent construct, where the factor itself is influencing its measured indicators, we typically call a reflective system; in the PLS world, they call that Mode A. When the variables are coming into the construct, influencing the construct, they refer to that as a Mode B system. One of the beauties of the partial least squares modeling framework is that it doesn't care. I don't mean that you don't specify it. But it says: you get those however you want, and we will take it from here. And I have to say, I find that kind of attractive,
Patrick 23:50
and I find it kind of attractive in the ghost-of-Hotelling sense. That is a very similar distinction to the one between principal components analysis and the common factor model. And this is where people threw stuff at each other through the 70s and 80s and even into the early 90s, right: is principal components analysis a factor model, blah, blah, blah. But what you just described in the different modes really does distinguish PCA and the common factor model. And that is: the common factor model is believed to have given rise to the set of items, and the reason we observe the correlations among the items in the way that we did is because they have a shared underlying cause. One might say a common cause; one might say a common factor underlies those. In principal components, we can very pragmatically think about reversing those arrows, and the items that we have induce a composite that we can compute directly; we don't have to estimate it as a factor score. So that notion of, do the arrows radiate out from some latent factor or construct, or are we optimally weighting the items that then induce a composite, that is principal components versus common factor analysis.
Greg 25:08
In the composite variable system that is PLS, it's not going to be principal components analysis, although there are techniques that try to merge principal components analysis into this; this is not that. So what I thought we would do is talk a little bit about how it works, not necessarily getting into all the weeds, but just generally how this works. And to start, I'm going to imagine that system that I described earlier, with two exogenous factors and one endogenous factor. And I'm using the word factor, honestly, a little bit uncomfortably. If I say the words latent variable, I have to say I'm a little bit uncomfortable; the PLS community might go, what's your problem? We still think of it as something that you didn't see. But the SEM that I started at Studio 54 is just so deep inside me that I have a hard time even referring to these as latent variables. But I'm the weird one, as far as the PLS community would be concerned. Yeah,
Patrick
26:01
that's why they would consider you the weird one. What's the deal with that, Hancock?
Greg 26:07
That's it. So let's imagine in that system that we have our two exogenous constructs, and both of those are Mode B. What that means to us, coming in from the outside, is that those are both formative systems, where the variables are coming in to form that particular construct. And then let's imagine that our outcome or dependent factor in this model is a traditional latent variable, a reflective system, Mode A as it would be called in this world, where the factor itself is influencing, is responsible for, the relations among the variables. As an overview of the way PLS operates: if we had to come up with something without the benefit of a whole lot of computational horsepower, we could just take the indicators, and honestly, whether it's reflective or formative, whether it's Mode A or Mode B, we could just take those indicators and get some sort of proxy score for each of those factors. And that is what happens in PLS: you start by getting a score to take the place of each of those factors. And it is a simple sum, typically, of the standardized variables that serve as indicators. So take all of your indicators, whether it's Mode A or Mode B, convert them into z-scores, sum them up, taking into account the direction of things, of course, and you go, boom, I have at least a start, or a proxy, for each of the three constructs that I care about.
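[To make that starting step concrete, here is a minimal sketch in Python with NumPy. The data and variable names are made up for illustration and are not output from any PLS package; it just forms the unit-weighted sums of z-scored indicators that Greg describes as the initial proxies.]

import numpy as np

rng = np.random.default_rng(4)
n = 250

# Hypothetical raw indicators for the three constructs in Greg's example.
X1 = rng.normal(size=(n, 4))   # exogenous construct 1 (Mode B, formative)
X2 = rng.normal(size=(n, 3))   # exogenous construct 2 (Mode B, formative)
Y = rng.normal(size=(n, 5))    # endogenous construct (Mode A, reflective)

def zscore(block):
    # Standardize each indicator column.
    return (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)

# Starting proxies: the simple sum of z-scored indicators, one score per person per construct.
proxy_1 = zscore(X1).sum(axis=1)
proxy_2 = zscore(X2).sum(axis=1)
proxy_y = zscore(Y).sum(axis=1)

[Regressing proxy_y on proxy_1 and proxy_2 at this point would just be the measured variable path analysis Patrick describes next.]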
Patrick 27:28
And if you just stop there, that's a measured variable path analysis. For those of you who have taken an SEM class, very often the arc of the story that is told is: you do some review of multiple regression, and then you expand the multiple regression to a path analysis, where you have multiple dependent variables and you have mediation and you have chi-square tests and all of those things, and then you move to a multiple-indicator latent factor. Well, what I like is that this first step, taking just a measured composite of the items on your construct, is a measured variable path analysis. But PLS adds this really clever iterative procedure. Yeah.
Greg
28:10
So once we get these proxies for each of our three constructs, what do we do with them now? Well, if that were the end of it, we would just go ahead and do the regression using the two exogenous composites, and have them predict, in just a regular old ordinary least squares regression, that endogenous composite, and we would be done. But when we do that, we do get some estimates for the structural relations in that particular model. Once we have those structural relations in there, we now actually have a series of relations among all three of those constructs. Between the two exogenous constructs we have some estimate of correlation; between the dependent and the independent constructs we have some structural relations, some standardized beta weight kinds of things. What we can actually do in this system is get predicted scores for each of these constructs based on the structural relations that exist there. And so, in an iterative process, the information that we got from this first pass at estimating the structural connections allows us to get new estimates for those proxies.
Patrick
29:15
So when we think about how we do business as usual, we're done, right? If we're doing least squares, or if we have a second dependent variable and it's a mediating model and we're using maximum likelihood, that's it, right? What you walked through is: we take our items, we make a composite; we take the composites, we fit a model. But here what you're saying is, wait a minute, we've got these model-implied predicted values of the composite; could we use those in some way to update how we're computing the composite itself?
Greg
29:47
Exactly right. And so once we do that, based on the estimated relations that we have among the constructs (and there are different ways of doing this, but it involves whatever a construct is attached to, whether that's another exogenous construct or an endogenous construct), you try to use the information from that model to get updated scores, just as you said. Once you do that, what happens to those updated scores? Well, you take them from that structural model, which in this world is referred to as the inner model, and then you carry them back out to the measurement model, or what here is called the outer model. And how you do that is going to depend on whether you have a construct that is Mode A, which is reflective, traditional, the way you and I think about things, or Mode B. If you have a Mode A system, then you can get predicted scores for each of those measured indicators by doing just a simple regression: I can use the proxy to predict indicator one, I can use the proxy to predict indicator two, etc.; I can go through one by one and do that in that reflective or Mode A kind of system. When I have something that is Mode B, that is a formative kind of system, that's where all the variables are actually coming into the construct, and that's just a multiple regression: because I now have updated scores for the construct, and I have all of the scores for the indicator variables, I can run a multiple regression. And now I have updated relations between the indicator variables and the particular construct. Once I have those, what I can actually do now is get new predicted values for that inner model. And there is this iterative process, and you alluded to that earlier, where we go inner model, outer model, inner model, outer model, until things stabilize. And when things converge satisfactorily, then we have our final estimates of those constructs, and we just do an OLS regression, boom, done and done, with the idea that we have created a system whose goal in the end is to try to explain variance. And that's different from what you and I are used to.
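[A compressed sketch of that inner/outer back-and-forth, again in Python with NumPy and made-up data. It uses the simple centroid scheme for the inner step, which is one of several weighting schemes in use, so treat it as a toy illustration of the logic Greg just walked through, not a substitute for an actual PLS program.]

import numpy as np

rng = np.random.default_rng(1)
n = 250

# Fake data with some real structure so the toy example has something to find.
eta1 = rng.normal(size=n)
eta2 = 0.3 * eta1 + rng.normal(size=n)
eta3 = 0.5 * eta1 + 0.4 * eta2 + rng.normal(size=n)
blocks = [np.column_stack([eta1 + rng.normal(size=n) for _ in range(4)]),   # construct 1, Mode B
          np.column_stack([eta2 + rng.normal(size=n) for _ in range(3)]),   # construct 2, Mode B
          np.column_stack([eta3 + rng.normal(size=n) for _ in range(5)])]   # construct 3, Mode A
modes = ["B", "B", "A"]

# Inner (structural) adjacency: constructs 1 and 2 each have a path to construct 3.
adjacency = np.array([[0, 0, 1],
                      [0, 0, 1],
                      [1, 1, 0]])

def std(x):
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

X = [std(b) for b in blocks]                 # z-scored indicator blocks
w = [np.ones(b.shape[1]) for b in X]         # unit starting weights: the simple-sum proxies

for _ in range(300):
    # Outer step: each composite is a weighted sum of its own indicators, rescaled to unit variance.
    comps = np.column_stack([std(Xj @ wj) for Xj, wj in zip(X, w)])

    # Inner step (centroid scheme): each construct's inner proxy is the sign-weighted sum
    # of the composites it is connected to in the structural (inner) model.
    signs = np.sign(np.corrcoef(comps, rowvar=False)) * adjacency
    inner = std(comps @ signs)

    # Outer update: Mode A uses indicator-by-indicator simple regressions (correlations);
    # Mode B regresses the inner proxy on the whole block at once (multiple regression).
    w_new = []
    for Xj, mode, zj in zip(X, modes, inner.T):
        if mode == "A":
            w_new.append(Xj.T @ zj / n)
        else:
            w_new.append(np.linalg.lstsq(Xj, zj, rcond=None)[0])

    done = max(np.max(np.abs(a - b)) for a, b in zip(w, w_new)) < 1e-8
    w = w_new
    if done:
        break

# Once the weights stabilize, one last OLS regression among the final composites
# gives the structural ("inner model") path estimates.
comps = np.column_stack([std(Xj @ wj) for Xj, wj in zip(X, w)])
paths = np.linalg.lstsq(comps[:, :2], comps[:, 2], rcond=None)[0]
print("standardized path estimates:", paths.round(3))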
Patrick
31:56
That's right. And you start moving into a topic that's getting increasing appreciation in the field, often linked to machine learning, which is distinguishing prediction and explanation. We can have a prediction model, that is: look, we're going to roll up our sleeves, and we're going to do whatever we need to do to maximize our ability to make y and y-hat as close together as possible. That is not as much what we might think of as explanation, where we keep Karl Popper's corpse happy and we impose restrictions on the system and make those testable hypotheses. And that's all in model fit, chi-squares and RMSEA and things like that, which do not exist in the PLS
Greg
32:43
world. Totally, absolutely right about fit. And I want to make sure that we're very, very clear that the system I described was just an example, where we had two exogenous constructs and one endogenous construct; that was just an example, and it's one that very much mirrors a regression model with two predictors and one outcome. But this extends to all different types of models. Think about all of the crazy connections that we might have: we might have exogenous constructs going to an endogenous construct going to more endogenous constructs, we might have mediated kinds of systems. So draw the hairy structural model that you're accustomed to drawing, and then attach your indicator variables in a way that is consistent with the way you believe they should be attached for each of those constructs. And some of them might be Mode A, the traditional latent variable way we're accustomed to thinking about things, that reflective system; some of them might be Mode B, which is the formative system where the variables actually pour into the construct. And PLS says: whatever, you get it, and then hand it to me and I will turn the crank.
Patrick
33:48
So I think we've got a pretty good 30,000-foot, face-pressed-to-the-window-of-the-airplane-looking-over-the-Grand-Canyon view. We've got our heads around Mode A, Mode B, inner, outer. So where do we go next?
Greg
34:02
Well, nowhere yet, because it would be Swedish tradition that at this point we have a fika. Do you know what a fika is?
Patrick
34:08
I already went before we started recording, so no, I'm good. You took a fika, but I'm fine. Tove, can you help
Tove
34:15
us out on the whole fika thing? So I think it can mean slightly different things but it tends to be a break
when we drink coffee, eat cake or something and just relax and talk to
Greg
34:26
people. Thank you. So Patrick, we definitely need to have a fika. Have you been to Sweden, by the way?
Patrick 34:31
I have. I love Scandinavia. I've been to Stockholm and I've been to Uppsala.
Greg
34:37
I have never been to Sweden. I just want to say that. So if anybody out there wants to invite me to Sweden: it is on my list. There aren't very many more places on my list, but if you want to invite me to Sweden, DM me; I totally want to go. All right, this has been a lovely fika, and we're back. But I have no idea what your question was before our fika. What was your question, actually?
Patrick
35:00
I really am gonna go to the bathroom.
Greg
35:02
Okay, well, instead of our usual elevator music, we should probably play some ABBA.
Patrick
35:06
Play "Big Red Car" again? I got a profanity-laced text from you a couple of days ago about how "Big Red Car" got stuck in your head. It was
Greg
35:16
brutal. Don't even say "Big Red Car." Anyway, in fact, I'm going to insert some ABBA right here, just to clear that from my head.
Patrick
35:33
Hi, everyone. All right, so I came back from my break, and Greg took a break as well, and he's not here. In post-processing, I'm going to put a little bit more of the Big Red Car here, just to mess with him.
Greg
35:46
Toot toot, chugga chugga, big red car. We'll travel near and we'll travel far.
Patrick
35:52
Okay, we're both back. Let's pan back and think about some of the broad characteristics of where this
can and can't be applied. And then how we might use this thoughtfully in just pursuing our science. We
could
Greg
36:08
break this up in terms of different aspects of the model and the modeling process. For example, we could talk about the characteristics of the data and what needs to be in place for PLS to be a viable option. We talked about this, and it was one of Wold's concerns: with SEM, because it's based on maximum likelihood or variations on maximum likelihood, we talk about it being a large-sample technique without ever being able to actually define that. But PLS in the end is based on an iterative least squares kind of process, and so it puts fewer demands on us in terms of sample size. That's kind of a nice thing. That is
Patrick 36:47
a nice thing. We've talked before about how so much of the field focuses on sample size in terms of power, which of course it should; you need to know whether to use a poop emoji or an eggplant emoji in your grant application. But much less attention is paid to model stability, and also to having a sense of whether you have a sufficiently large sample for those asymptotic properties to come online in the way that we think they do.
Greg
37:17
Exactly. We already mentioned that it doesn't have the heavy distributional reliance on normality. But as you pointed out, we've come a long way since the 1960s and 1970s. You and I had a whole episode, I think, about non-normality, right? So in SEM we've had ways to deal with
Patrick
37:32
this. Yeah. And so it's interesting, because where Wold was really worried about things like the normality assumption, the independence assumption, things like that, we actually have really good ways now to deal with those. So we have robust standard errors, we have corrected test statistics, we have very well-developed ways for handling ordinal items. It's not to say that PLS is not still advantageous, but ML isn't quite as handcuffed as it used to be.
Greg
38:03
Totally agree with that. And in PLS, you know, formally it's tied to the same distributional assumptions that regression is, and we've talked about that quite a bit, especially back when we were doing OLS. But it also will use a bootstrapping technique to get the standard errors that it needs, which helps to work around some of the other issues, including dependence, right, to some extent,
Patrick
38:23
as long as dependence is a nuisance variable. And this is what we argued, in a court-ordered settlement with McNeish about the unnecessary ubiquity of multilevel models: Dan is exactly right in everything that he said in there, that there are very good ways of correcting for violations of independent residuals in a whole broad class of models, as long as you don't have to disaggregate effects, as long as there aren't within-group and between-group effects. Because then what a corrected standard error does, if you don't disaggregate, is give you a proper standard error for the incorrect effect. Yeah. And so, stealing Bollen's line, it's just fine print: PLS is really well suited for addressing violations of independence by using the bootstrap, but it's still assuming that there's an overall effect that's of interest, and not an effect that needs to be disaggregated in some structural way.
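[For the standard errors, here is a minimal sketch of the kind of case-resampling bootstrap Greg mentioned a moment ago. To keep it short it wraps a toy stand-in (unit-weighted composites plus OLS) rather than the full inner/outer iteration, which a real PLS program would re-run on every resample; all names and data are illustrative.]

import numpy as np

rng = np.random.default_rng(7)

def toy_composite_paths(X1, X2, Y):
    # Toy stand-in for a full PLS fit: unit-weighted composites plus an OLS regression.
    z = lambda a: (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    c1, c2, cy = z(X1).sum(axis=1), z(X2).sum(axis=1), z(Y).sum(axis=1)
    design = np.column_stack([z(c1), z(c2)])
    return np.linalg.lstsq(design, z(cy), rcond=None)[0]

def bootstrap_ses(X1, X2, Y, reps=1000):
    n = X1.shape[0]
    draws = []
    for _ in range(reps):
        idx = rng.integers(0, n, size=n)           # resample people with replacement
        draws.append(toy_composite_paths(X1[idx], X2[idx], Y[idx]))
    return np.array(draws).std(axis=0, ddof=1)     # bootstrap SE for each path

# Illustrative data: two exogenous driver variables and one outcome construct.
n = 250
eta = rng.normal(size=(n, 2))
X1 = eta[:, [0]] + rng.normal(size=(n, 4))
X2 = eta[:, [1]] + rng.normal(size=(n, 3))
Y = (0.5 * eta[:, 0] + 0.4 * eta[:, 1]).reshape(-1, 1) + rng.normal(size=(n, 5))
print("paths:", toy_composite_paths(X1, X2, Y).round(3))
print("bootstrap SEs:", bootstrap_ses(X1, X2, Y).round(3))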
Greg
39:22
I think that's not only a really important point, but a point that gets glossed over in the pls stuff that I
have seen.
Patrick
39:29
And I think it gets equally glossed over in the cluster-corrected SEM. Oh, you have kids within schools and multiple schools? We'll just say cluster equals school and you're fine. So I don't think that's unique to PLS.
Greg
39:42
Missing data. Now, I feel like in the structural equation modeling world we have kind of nailed missing data. This is a regression-based world, though. What do you tell your students when you teach regression? What do you tell them about missing data?
Patrick 39:54
Well, in standard OLS, it's listwise deletion. Anybody who's missing anything: we'll see you, thanks for coming out tonight; you don't have to go home, but you can't stay here. Yep. What I do say, though, is that you can move, even within the regression framework, to a maximum likelihood estimator and use full information maximum likelihood under certain assumptions, that is, missing at random; we have an episode on this, and we won't get into the details of that. You can also use multiple imputation, and you can do that within a regression framework as well. That is, you build a model for missingness, impute values, estimate your regression model, do that 10 or 20 times, gather all the things together, and combine them using formulas that exist. What I almost always tell my students in my teaching, though, is that you should probably move to a full information maximum likelihood setting within an SEM framework and incorporate partially missing data there,
Greg
40:51
well, then it wouldn't be PLS, would it? It would be what, PML? I'm sorry; by definition, you can't do that in here. That's exactly right. But yeah, I mean, to the extent that I have found any information about it, usually it sort of revolves around: well, just don't get too much missing data, and, you know, like, try to keep it below 5%, and if you really have to, maybe throw some values in there in the holes, and it shouldn't matter. All right, you're good to go, Godspeed. And in other worlds that we inhabit, we would go: I'm not so sure about that.
Patrick
41:22
Yep. Although what you just described is what SEM did for half a century.
Greg
41:27
It's true.
Patrick
41:30
A lot of this stuff is not unique to PLS, right? ML had a don't-ask-don't-tell policy on missing data and non-normal data and dependent data for a lot of decades, not just a lot of years, a lot of decades. But yes, in PLS in its current form, and we want to stress this because this is another area that is ripe for a lot of dissertations or postdoctoral research projects, to my understanding there is not a widely available method to handle missing data in the way that we would with maximum likelihood estimation. That
Greg
42:08
is my understanding as well, for sure. PLS can get at model complexity in a variety of other ways, but not actually that way. You know, there are things that the original version of structural equation modeling has expanded into, things like higher-order constructs, or latent class kinds of things, or mediation; all of those are kinds of things that fit very, very nicely under the structural equation modeling umbrella. A lot of those things represent the methodological developments that have been going on in the last 10 years or so within the PLS community as well, trying to expand the types of models that you can address through PLS, so you can still have that wonderful benefit of having different types of variable systems, Mode A and Mode B, while having the structural models that are of interest, these inner models that really represent the theory that you care about.
Patrick
42:55
And that's a really good point you're making: there are a lot of ongoing developments in PLS as we speak. So historically, and you indicated this earlier in the conversation, you just standardized everything. Yeah. And again, that's what we did in EFA for 100 years, right? You standardize your variables, you have the correlation matrix. That's exactly right. But what is the byproduct of that? You have means of zero and variances of one. Well, who cares? Nobody cares what means and variances are, unless you want to study change over time. Yeah, unless you want to do a multiple group analysis in which you examine weak or strong or strict or partial invariance, unless you want to do an MNLFA-like thing. And so I think a lot of people are saying, okay, we've really figured out the core parts of this; now, how can we build this out in ways that generalize the applicability in practice?
Greg
43:54
Yeah, it's a very exciting thing, right? The whole PLS methodological community is developing these methods, trying to get them incorporated into existing software, whether it's proprietary software or R-based packages. So there's a lot of cool stuff that's going on. And what I hope that we're getting out of this conversation is that PLS isn't meant to be Sheeps-Kin; Wold really wanted it to be this thing that does some of the same stuff as SEM, but maybe under different restrictions. I think it complements SEM really, really nicely. I don't think that people necessarily should say, well, I'm doing SEM unless I can't, and then I'm gonna go over to PLS. I think it's nice to be able to think about: all right, what's the model? Let me get everything laid out on the table. What's the model in terms of the inner model, that structural portion that we care about, and the outer model, the measurement part that we care about? What characteristics do I have in my data? And then, what might be the appropriate technique to try to draw from? And even though we sort of joked about sheepskin, I would like us to think about this as just another option that people have.
Patrick
45:03
One thing I love about this is it's very clever in saying: let's take a deep breath, and let's think about what is another way that we could approach this. So maximum likelihood has these ships that we believe are over the horizon, but we can't see them, and based upon the characteristics of our sample data, we want to get the best estimates possible. This is not, again, to reiterate an earlier point, a method of estimation; this is a fundamentally different way of approaching the entire modeling process. It is not without limitations in the kinds of structures that we can do. There's a distinction in path models, and it can be a full SEM or a path analysis, between models that are called recursive and non-recursive. It is one of the few things in quant that is the opposite of what you would think it is, and that's how you remember it. A recursive model, colloquially speaking: the influences move from left to right and there are no correlated residuals; that's a recursive model. All right, there are no feedback loops, there are no bidirectional effects, and there are no correlated disturbances among your dependent variables. In a non-recursive model, you can have either or both: correlated disturbances and feedback loops, right? PLS is currently limited to recursive models. I don't feel like that's a death knell; I think a lot of our models are recursive. Not all of them. I think in cross-sectional data we don't often have feedback loops, but I gotta tell you, it's very common in my own kind of work that I do have correlated disturbances. So say I have two mediators. I have several exogenous predictors, I have two mediators that are, like, separate mediators, so there are two specific indirect effects. It's very common to correlate the disturbances of those two mediators, and currently we're not able to factor that into the PLS approach. And so
Greg
47:12
that might be one of those branch points for you that says, I'd better keep it in the SEM world. Error covariances in your measurement model might be another, as might extreme cross-loadings, ones that are really non-ignorable; that's harder in this particular system. Or, honestly, just if you want an assessment of fit, right: the world here doesn't tend to emphasize global fit, because the whole goal is to maximize explained variance, not to explain the behavior of the variances and covariances as part of a larger system. It's driving at trying to explain R-squared for things that are endogenous within the part of your model that you actually care about. So, and I think this is part of what Wold was getting at, if you have a model that you have a strong theory about, but you really want to put it to the test, PLS isn't necessarily this global testing framework. It emphasizes more the estimation of things rather than the actual explanation, per se.
Patrick
48:07
I really like the approach. I mean, I like conceptually what it's trying to do, and what it's conceding, right? We've talked before about how you have to concede certain battles in a war so that you can marshal resources in another part of what you want to achieve. I like all of that. One thing to keep in mind is that one of Wold's original motivations was to operate in more of an exploratory perspective, that we don't always have a really strong a priori sense of what we're trying to do with the data, and, paraphrasing his term, we're often in a data-rich, theory-poor setting. I very much see that. But it's important to realize we're still locking things down in this model in a confirmatory kind of way. That is, PLS is not going to tell us the optimal number of constructs we need; we have to define the constructs and define the items that go with the constructs. PLS is not going to be kind of like a regularization method where smaller parameters leave the model and bigger parameters stay in the model; we still have to state what the structure is that we are interested in among our constructs. And so we just have to keep in mind that this is not exploratory in an EFA kind of way, but it is less rigid in a cause-indicator versus effect-indicator perspective on what leads to that composite in your model.
Greg
49:41
Yeah. And you know, the world that you and I tend to live in, that structural equation modeling world, is all about constraints: constraints in the measurement portion of the model, constraints in the structural portion. And we might have constraints in the structural portion of this model, but we're still only estimating relations in a very traditional OLS kind of way that doesn't necessarily feel those constraints, because things are so very partial in the way that we do things. One thing that Wold had said in many of his writings was the equivalent of, I don't know what it would be in Swedish, but: it'll all come out in the wash, whatever the Swedish version of that would be.
Tove
50:19
So there's no literal translation, but in Swedish we say something like "det kommer att lösa sig."
Greg 50:25
As we get larger sample sizes, as we get more indicator variables, in the end, you latent variable structural equation modelers and us PLS people all just sort of come together and reach the same inferential conclusions. Which, he would argue, and I think maybe reasonably so, is the goal in the end: to try to understand what the relations are, or at least to approximate them reasonably, even if not exactly.
Patrick
50:50
It's all about how can we take the data that we have available to us and make a valid and reliable
inference about the nature of the relations among the constructs in a way that helps us understand
something that we didn't know before? This is just another arrow in our quiver of saying we have a
theoretical question, we have data available to us. And we want to make a probabilistically based
inference about the nature of the relations among our measures. And this is just another way of
approaching that problem. And
Greg
51:23
for those of you who are steeped in a PLS tradition, maybe you will think about some of the comparative benefits of structural equation modeling, if you haven't thought about that before. And for those of us who are in this SEM kind of world, there is this other option out there, PLS, that might be tailored very, very well to our model, our purpose, our data, and I think we should consider that, as you said, another arrow in our quiver. This is
Patrick
51:47
target-rich for dissertations and master's theses and grant applications: things like missing data, how we would extend this longitudinally, what diagnostic measures might be available. And then one that I'm really interested in myself, and we've alluded to this before when we were talking about two-stage least squares, is: might we, for a given model, have a principled way of estimating our model using full information maximum likelihood, using two-stage least squares, and using partial least squares, and use those results jointly to try to triangulate on a set of relations that we have the greatest confidence in? And if those three methods converge on a discussion section, outstanding. If they don't, well, then it's an intellectual goose to say: we've got to better understand why these are different from one another, and in which of these do I have the greatest confidence? Because if you only rock back and forth and say, maximum likelihood is consistent, efficient, unbiased, and asymptotically normally distributed, and that's all I'm going to do, you are blinding yourself to other insights into your data that might help you make a better data-informed decision about the nature of your constructs.
Greg
53:08
I like that point very, very much. So, as per usual, you and I put a tremendous amount of planning into this episode, texting each other at
Patrick
53:18
9:30 last
Greg 53:19
night? 11:30 last night. But you know, also, you and I are not experts in PLS. It's something that you and I are gaining familiarity with. And as you said, if someone out there is listening who has expertise in this, they'll say, oh, but they didn't talk about this, they didn't talk about that. Yeah, that's absolutely true. At the end of this episode, there are going to be a lot of pieces left over on the table.
Patrick
53:38
Just like when you buy something from IKEA. Team first dish team Dorkin FIRFER.
Tove
53:47
That's not a real word
Greg
53:48
Good night, everybody.
Patrick
53:50
Thank you, everybody. Take care. Bye-bye. Thank you so much for listening. You can subscribe to Quantitude on Apple Podcasts, Spotify, or wherever you download your cacophonous noise to drown out midterm political ads. And please leave us a review. You can also follow us on Twitter, where we're @quantitudepod, and check out our webpage at quantitudepod.org for past episodes, playlists, show notes, transcripts, and other cool stuff. Finally, you can get Quantitude-themed merch -- get the real stuff, not the fake -- at redbubble.com, where all proceeds go to DonorsChoose to support low-income schools. You have been listening to Quantitude, the only official pumpkin spice podcast for fall. Quantitude has been brought to you by the Double Asteroid Redirection Test, in which NASA launched a rocket 7 million miles to intercept a lump of rock 500 feet across at 14,000 miles per hour with the sole intention of making the rest of us feel bad about our own contributions to science; by the QuantiTunes Tove-Nader 3000, a newly available download that allows everyone to have their very own personal Swedish interpreter; and by the House of Windsor, who proudly anoint Quantitude the royal podcast consort. This is most definitely not NPR.
Greg
55:21
In fact, I'm going to insert some ABBA right here anyway, just to clear that from my head.
Patrick
55:36
Hi, everyone. All right, so I came back from my break, and Greg took a break as well, and he's not here. And so, in post-processing, I'm going to put a little bit more of the Big Red Car here, just to mess with him.
Greg
55:49
no no