FPGAs, Security, and HammerBlade
An Interview with UC Santa Cruz Professor Dustin Richmond
Q: Thanks for agreeing to meet. We're excited
to hear from you.
A: Thanks for inviting me.
Q: Yeah. Let's go ahead and get started, just with a general introduction: who you are, a bit about your lab and your work, and what the main thrust of your research centers on.
A: By way of introduction, my name is Dustin Richmond. I'm an assistant professor at UC Santa Cruz; I joined in August 2022. Before that I was a postdoc at the University of Washington, and I did my PhD at UC San Diego. I've been involved with open source hardware since I was a wee PhD student back in 2012. The first project I was involved with was called RIFFA, a Reusable Integration Framework for FPGA Accelerators. That was one of the first solutions, if not the first, to provide an end-to-end, open source solution for FPGA accelerators. RIFFA supported PCIe communication through a streaming interface from Windows, Linux, MATLAB, Java, C, and Python onto the FPGA. And I've pretty much gone on from there: how do we create usable, reusable hardware solutions for everybody, not just for engineers or researchers, but also for, you know, do-it-yourself enthusiasts at home? I've been involved in a variety of different projects since then, but they've all been about the same question: how do we make it easier for other people to do what I like doing on a daily basis? For current research, I'm involved in a couple of different things.
One is that the open source tools are getting easier and easier to use. So how do we teach with those open source tools? What are the benefits of teaching with them? That's something I was going to talk about at Latch-Up; I'm still submitting a video to John on that.
In the past I've worked on languages and solutions for representing hardware, but my interests are pretty broad. Of the last couple of papers I've published, one was on building an open source, high-performance architecture, basically an open source GPU, called HammerBlade. That code can all be found online on GitHub: BaseJump STL and BladeRunner, all part of the Bespoke Silicon Group up at the University of Washington. The other work I've been doing recently isn't so open source focused; it's more focused on hardware security. But the tools we use and the results we develop are all open source, because the best security work provides open source solutions to demonstrate reproducibility of the experiment and of the data itself. So the stuff I've been working on is all open source, but it varies a lot, from languages and architectures all the way down to hardware security and even, at this point, physics, which is odd. It's very broad.
Q: Yeah. So it sounds like you're referencing some of your recent work on multi-tenant FPGA security, the security challenges of sharing FPGAs in the cloud. Can you talk a little bit about that work? I think it sounds super cool.
A: Sure. So there are actually two different projects in that. One is work we published at FPGA this year that was nominated for a best paper award: "Turn On, Tune Up, Listen In," on side channels in FPGA hardware. There's been a lot of work recently examining multi-tenant scenarios in FPGAs and how information can leak from one tenant to another. And a lot of that work has said, oh, yeah, you just put a power sensor on the FPGA and you're good to go.
There's really no need to optimize beyond that.
I think our paper was one of the first to look at it and say, no, no, no.
There's actually a real mathematical underpinning to a lot of this.
And you need to find the space where the side channel is maximized, the information channel
is maximized. So we studied that on two different boards, one in a local lab and the other on an FPGA in the cloud, and demonstrated that, indeed, these security threats are real,
but also that there's a lot more than just popping a sensor on an FPGA and calling it a day.
There's an optimization problem there.
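[To make that optimization-problem framing concrete, here is a minimal, hypothetical Python sketch: it simulates sweeping a sensor tuning parameter, scores each setting by how much secret-dependent signal the sensor captures (a simple signal-to-noise metric), and keeps the best setting. The simulated sensor model, the "sweet spot," and the metric are stand-ins for illustration only; this is not the paper's actual methodology.]

```python
# Hypothetical sketch: sweep a sensor tuning parameter, estimate how much
# secret-dependent signal each setting captures, and keep the best one.
# The data here is simulated stand-in data, not real FPGA measurements.
import numpy as np

rng = np.random.default_rng(0)

def collect_traces(tuning, n=2000):
    """Simulate sensor readings whose secret-dependent signal strength
    peaks at some unknown 'sweet spot' of the tuning parameter."""
    secret_bits = rng.integers(0, 2, n)               # stand-in secret data
    signal_gain = np.exp(-((tuning - 7) ** 2) / 4.0)  # hypothetical sweet spot
    readings = signal_gain * secret_bits + rng.normal(0, 1.0, n)
    return secret_bits, readings

def snr(secret_bits, readings):
    """Signal-to-noise ratio: variance between classes over variance within."""
    means = [readings[secret_bits == b].mean() for b in (0, 1)]
    noise = np.mean([readings[secret_bits == b].var() for b in (0, 1)])
    return np.var(means) / noise

# Sweep the tuning parameter and pick the setting that maximizes leakage.
settings = range(16)
scores = []
for t in settings:
    bits, traces = collect_traces(t)
    scores.append(snr(bits, traces))

best = max(zip(settings, scores), key=lambda p: p[1])
print(f"best tuning setting: {best[0]} (SNR {best[1]:.3f})")
```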
More recently, we actually used the same sensor to do something completely different.
So that work focused on two tenants who are co-located on an FPGA.
And to our knowledge, there's no commercial system that does that at the moment.
You can't rent half of an FPGA and be co-located with somebody else.
The security implications of that are pretty broad.
It's hard to say that that's a good idea unless you don't care about security.
So one thing that does happen is that it's possible to share an FPGA temporally.
So somebody else uses an FPGA and then you get to use the FPGA.
And one of the questions that's not been answered completely is,
is it possible to leave data behind?
And what we've been looking at is how, by using an FPGA, you actually cause
transistors in the FPGA to degrade.
And in particular, we looked at the routing.
So through this degradation, you actually leave evidence of the data behind.
And that evidence of that data could be AES keys or it could be machine learning weights
or it could be some OTP key stored in a root of trust.
And we use the same sensor, actually, that we developed in the PowerSide channel attacks
to measure the degradation in these routes.
So the degradation, and then, oddly, the recovery, which is actually a more serious concern,
allow you to recover what information was left behind by a previous tenant,
completely unintentionally, just through transistor degradation.
Q: So is that if you understand the type of design that they were using on the FPGA prior,
that if you have that information, you're able to kind of reconstruct
the data that they had left behind?
A: Yes, exactly.
So if you know the routes that were carrying sensitive information,
and there's a variety of ways you can construct that scenario to be realistic,
you can recover the information that they left behind.
Q: Very cool.
You also mentioned the work on HammerBlade.
Can you talk about that briefly?
A: Sure.
So HammerBlade is a DARPA project, the name of our DARPA grant,
funded through the University of Washington in collaboration with Cornell,
that I worked on as a postdoc.
Its goal, and the project still exists,
is to build, if you really want to be tongue-in-cheek about it,
an architecture to end all architectures.
But in a less ambitious sense,
build an architecture that we can use to process changing data.
And that could be changing in time,
that could be changing in structure,
or it could be, you know,
that different parts of the algorithm require different properties.
And so we built HammerBlade itself.
The project has several parts,
but the part that I worked on was the HammerBlade manycore,
which is a manycore architecture designed to stay performant
as the number of cores scales.
And that's really been a challenge in computer architecture.
Yes, you can build a core,
but can you build a manycore architecture?
Yes, you can build a chip with thousands of cores on it,
but how do you actually get performance out of those thousands of cores?
So we built this chip, this architecture.
It's a RISC-V processor
built around the idea of having cells, local communication clusters,
and then scaling out not just the cores
but the communication clusters themselves,
so that you get better performance as the number of cores scales.
Q: You've also done some recent work on networks-on-chip,
both your Ruche network work and NoC Symbiosis.
Can you talk a little bit about that too?
A: Yes, so both those were sort of sub-projects
or findings within the broader HammerBlade project.
So if you have cores that need to communicate,
you need a network.
You can do it with a bus,
but that's not going to scale to thousands of cores.
So you need a way to provide communication between all those cores.
And the simple traditional way of scaling out cores
is using a mesh network on chip.
So you have local nearest neighbor connections
between your adjacent neighbors.
And there are sort of two challenges you can look at in that.
One is that nearest-neighbor connections
don't give you very good communication properties
when you want to go a long distance.
So it's very hard to communicate with anybody
outside your nearest neighbor.
And that has major implications
for how you program an architecture as it scales.
So the Ruche network work was looking at
how do we create a network that provides better
long distance communication and bisection bandwidth
while still providing the benefits of a mesh network,
which are it's really easy to design a mesh network
and then stamp out thousands of cores.
So in the Ruche network,
we developed a way to get basically n times the bisection bandwidth,
where n is your Ruche factor,
with no cost in area overhead,
which is a pretty massive benefit.
No area overhead, no design complexity,
just replication of tiles.
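[As a rough illustration of the long-distance benefit, here is a hypothetical back-of-the-envelope model in Python: with only nearest-neighbor links, crossing D tiles costs D hops, while extra channels that skip R tiles at a time cut that to roughly D/R hops. The hop-count model is a simplification for illustration and ignores routing, congestion, and the physical-design details of the actual Ruche network.]

```python
# Simplified, hypothetical model of why long "ruche" channels help:
# in a plain mesh, an east-west trip of D tiles costs D hops; with extra
# channels that skip R tiles at a time, it costs about D // R + D % R hops.

def mesh_hops(distance):
    """Hops to travel `distance` tiles using only nearest-neighbor links."""
    return distance

def ruche_hops(distance, ruche_factor):
    """Hops using long channels that skip `ruche_factor` tiles, plus
    nearest-neighbor links for the remainder."""
    long_hops, remainder = divmod(distance, ruche_factor)
    return long_hops + remainder

for d in (1, 4, 16, 63):
    print(f"distance {d:2d}: mesh {mesh_hops(d):2d} hops, "
          f"ruche (factor 3) {ruche_hops(d, 3):2d} hops")
```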
NoC Symbiosis is a similar thing.
How do we provide communication at relatively low cost?
So I think as chip designers, we like to make hierarchies.
We like to think of taxonomies.
And this is good when we're showing block diagrams,
but it can actually be detrimental sometimes
when you are designing a chip.
So if you want to box your chip up
into nice little boxes and plan it, that's all well and good.
But that ignores the fact
that there is often symbiosis between different parts.
I say symbiosis because I'm alluding to something
I'm gonna talk about in a second.
So for example, the floating point unit
in a processor doesn't need
to do a lot of long-distance communication.
It's not going to be talking to the next core over,
but the network-on-chip will.
Whereas the network-on-chip router
doesn't do a lot of local communication,
so it doesn't have a lot of transistors,
and those transistors generally aren't as densely connected
as they are in something like a floating point unit.
And if you think of the chip, not just as a floor plan,
but as actual three-dimensional space,
these are actually using different parts
of the three-dimensional space.
The network is using a lot of wiring that's very high up.
And the floating point unit is using a lot of logic
and wiring that's very low down.
And if you box those into separate things,
you're making very poor utilization of the thing
that the other thing uses very well.
So if you combine the two,
you can actually get great utilization of the entire,
not just floor planning area,
but the three-dimensional area.
By combining the two,
you can actually reduce the area of your overall processor
compared to keeping them separate.
Q: So moving on from research,
we have just like a couple general questions
that we've been asking around to people.
First would be, you know,
it's getting harder to count on die shrinks
for computational advantage, as you know.
So we're gonna need more clever architecture tricks,
so to speak, in the future
to continue to advance the field.
What do you think, or what do you see
as one of the next great frontiers
in computer architecture?
A: Chiplet integration.
Q: Like on-chip fabrics and multi-chip modules
and things like that?
A: Yes.
I think that chiplets are a ripe opportunity
for just about everything in the hardware stack.
They haven't really been touched
by the open-source community extensively yet.
One of the interesting things is that chiplets,
and sort of interposers, don't require
a lot of advanced technology nodes.
So for example, you could tape out an interposer in SKY130.
Somebody may hold me to that and say that's wrong,
but taping out an interposer
doesn't require a ton of design.
In principle.
But at the same time, it gives us this opportunity
that we've never really had before.
So you talked about how there's less opportunity
for die shrinks, but there's also opportunity
for integrating modules
that we really couldn't do in the past.
With this focus on making transistors smaller and smaller,
we always made big monolithic chips
and we stuck to, you know, a 14 nanometer process
for the entire chip.
But with chiplets, we can start integrating,
you know, older nodes or nodes that are better
for analog electronics or integrating things
like phase change memory into our chips
that we really couldn't do before
because the process just didn't support it.
So there's a lot of opportunity, not just
to think about how we build architectures,
but how we build systems that integrate different pieces
that have different advantages,
that are more efficient in one area
or save us energy or provide us performance in another.
And that's really because we can finally integrate
chips together in a smaller package.
Q: Great, love it.
Okay, and then the open source hardware community
is still obviously very young.
You could view it as perhaps
like the open source software community
in the late 90s to the early 2000s.
Where do you see the open source hardware community
and RISC-V going in the future?
Do you think it'll largely remain an area for academics,
or for startups that have this advantage
of a low barrier to entry?
Or do you foresee it
expanding out into industry?
A: I think it already is expanding into industry.
I think companies like Google
have really done a great job
in supporting the open source hardware community.
They, and others, have driven a lot of the development
of the open source hardware tools
in the last, say, five years;
tools which really didn't exist five years ago.
They barely existed,
but they've really become fully featured
in the last four or five years.
I just recently taught a class
using the open source tools,
which I couldn't have imagined
even two or three years ago.
I don't think that's going to stay that way.
I think people are going to realize
that the open source tools are good.
And if you contribute back to them,
they can only get better.
So I really hope that
it doesn't stay isolated to academia.
So even the changes that I've seen
in the last two or three years
have been so impactful for how we design chips
or how we teach students
that I really can't imagine it staying in academia.
Q: Gotcha.
A: You can do formal verification
with an open source tool flow.
That's pretty cool.
That would not have been possible five years ago.
And you normally pay hundreds of thousands of dollars
for that from a CAD tool vendor.
I really hope that, even just as a first pass
for regression checking or something,
we start to see a lot more adoption
of the open source design tools.
Q: Very cool.
And then we'll close out.
We have one kind of like a little fun question.
Do you have a favorite open source license?
A: I mean, is there any answer but BSD3?
Q: That does seem to be a common answer.
A: I like BSD3 a lot,
but I also recognize there's quite a bit out there.
BSD3 focuses on software;
it doesn't focus so much on hardware.
And I think that there are
some good open source licenses for that.
There's a modified Apache license for hardware,
the Solderpad license,
which focuses more on the fact that
it's not a binary, it's a bitstream, or it's a...
just a little more task specific.
And I think, you know, BSD3 is great.
It's great for just code,
but it's also not the end-all.
It's never going to be the perfect license.
Q: Thanks so much for hopping on a call.
We really appreciate you taking some time to talk to us.