
New IBM Super Computer


Speed, not speed records. That computer would be useless for financial transactions written in COBOL; a traditional Ethernet-linked, loosely coupled cluster would make more sense for that.

But it ain't gonna win records.

And I don't think you're understanding the scale. A million transactions per day can be done on MUCH slower computers. That thing can update around a trillion fluid cells per day (assuming on the order of 1000 flops per grid cell). With a 1000^3 grid and 1000 timesteps, that gets you in the ballpark of some types of nuclear explosions being calculated in a few weeks.
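
To sanity-check that back-of-envelope estimate, here is the arithmetic spelled out. Every figure is the rough order-of-magnitude assumption from above, not an actual spec of the machine:

```python
# Back-of-envelope for the scale argument above; every number is an
# illustrative assumption, not a measured figure.
cells = 1000 ** 3              # 1000^3 grid -> 1e9 cells
timesteps = 1000               # assumed timesteps per run
flops_per_update = 1000        # "order of 1000 flops per grid cell"

cell_updates = cells * timesteps               # 1e12 cell-updates per run
total_flops = cell_updates * flops_per_update  # ~1e15 flops per run

updates_per_day = 1e12         # "around a trillion fluid cells per day"
days_per_run = cell_updates / updates_per_day

print(f"{cell_updates:.1e} cell-updates, {total_flops:.1e} flops, ~{days_per_run:.0f} day(s) per run")
```

A single run at that rate is on the order of a day; a finer grid, more physics per cell, or a series of runs is what stretches it toward the "few weeks" figure.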


I don't know if you understand the amount of processing power a modern business requires. It can be stunning. Ethernet-linked cluster? Yeah, right. Maybe in the 90s. Fibre Channel is a must nowadays.

But you're right, raw speed is fairly meaningless in this situation. Data transfer and manipulation are key, and code efficiency determines how well you can do this more than processing speed does. Many of our large operations take days to process, and the CPUs aren't even pegged; the buses and the SAN are the bottlenecks. If you're good at optimizing large relational databases, you get hired for 100k+ a year, no questions asked.
 
Why don't you give it up!!! You act like the rest of us don't know what a cache is, and all you're doing is throwing around buzzwords you've heard!!

Umm, Bob, I have a Ph.D. based on massively parallel computation. I have hand-optimized code for the platform. It's quite a lot more than just throwing around buzzwords.

The mistake you and Evan have both made, and it's extremely common, is confusing parallel with distributed computation. They are VERY different. Business applications are distributed. If you use distributed paradigms on massively parallel hardware, you really don't get very far.

If you want me to get into technicalities, the way you get peak speed out of a massively parallel machine is (a) to make ALL the processors compute simultaneously (it sounds obvious, but it's VERY hard to do with the kinds of functional parallelism one does in business frameworks -- you simply CANNOT do the usual RMI/DCOM/CORBA/whatever thing on it, and virtually have to resort to data parallelism), and (b) to hide the network latencies with computation. Point (b) determines what problems are suitable; you can only do it with very compute-intensive calculations. Every time you perform a reduction, barrier, or any other synch, you risk blowing your scaling.
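
To make (b) concrete, here's a rough sketch of latency hiding with a non-blocking halo exchange. mpi4py, numpy, the toy averaging stencil, and the sizes are all stand-ins for illustration, not anything from the thesis code:

```python
# Minimal sketch of (b): post the communication, compute everything that
# needs no remote data while the messages are in flight, then finish the
# boundary. Assumes mpi4py and numpy; run with e.g. "mpiexec -n 4 python halo.py".
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size   # periodic 1-D ring of ranks

n = 1_000_000                        # local cells per rank (illustrative)
u = np.random.rand(n + 2)            # one ghost cell at each end
new = np.empty_like(u)

# 1. Post the ghost-cell exchange (non-blocking)...
ghost_l, ghost_r = np.empty(1), np.empty(1)
reqs = [comm.Irecv(ghost_l, source=left,  tag=0),
        comm.Irecv(ghost_r, source=right, tag=1),
        comm.Isend(u[1:2],   dest=left,   tag=1),
        comm.Isend(u[n:n+1], dest=right,  tag=0)]

# 2. ...compute the interior, which needs no remote data, while messages fly...
new[2:n] = 0.5 * (u[1:n-1] + u[3:n+1])

# 3. ...then wait and finish only the two cells that needed remote data.
MPI.Request.Waitall(reqs)
u[0], u[n + 1] = ghost_l[0], ghost_r[0]
new[1] = 0.5 * (u[0] + u[2])
new[n] = 0.5 * (u[n - 1] + u[n + 1])
```

If the interior computation takes longer than the message round trip, the communication is effectively free; if not, you're back to waiting on the network.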

Oh, and (c), stay off the FN disk. A single disk is still some 6 orders of magnitude slower than an in-cache multiplication, and you're not going to make that up with even perfect parallelism. Amdahl's Law really bites you in the behind here.
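
And to put a number on how hard a serial bottleneck like disk I/O caps you, here's Amdahl's Law with a made-up 1% serial fraction:

```python
# Amdahl's Law: speedup = 1 / ((1 - p) + p / N), where p is the parallel fraction.
# The 1% serial fraction below is hypothetical -- think "one process stuck on the disk".
def amdahl_speedup(parallel_fraction, n_procs):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

p = 0.99
for n in (32, 1024, 65536):
    print(f"{n:6d} processors -> speedup {amdahl_speedup(p, n):5.1f}x (hard cap {1 / (1 - p):.0f}x)")
```

No matter how many processors you throw at it, you never beat 100x. That's why the disk kills you.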

I'll say it again. These are not general purpose machines, and they do not mean broad improvements everywhere. They MAY not mean broad improvements anywhere, though there is probably a specific application in mind (say, radiation transfer) in the design.

Evan, your comments indicate it really hasn't changed since I went into distributed computing (yep, that's what I do now). There has always been a performance hierarchy, and while it has narrowed somewhat, it's still several orders of magnitude apart. And it's still the network and synch delays that mess you up. It's been that way for decades.
 
Business applications are distributed.


Yup, for the most part they are. Businesses are implementing more and more parallel processing, however. Some of our boxes are running as many as 32 CPUs in parallel, with each box part of a distributed system.

And like you said, load balancing is the real challenge here. If your load balancing logic is insufficient, you've got 31 processors out of 32 doing nothing.
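
Here's a toy picture of that failure mode. The skewed key distribution is invented, but it's the classic way to end up with one box pegged and the rest idle:

```python
# Toy illustration of "31 of 32 doing nothing": static routing by a skewed key
# dumps nearly all the work on one worker. The skew itself is made up.
import random

N_WORKERS = 32
random.seed(0)
jobs = [random.randrange(1000) for _ in range(100_000)]  # jobs keyed by customer id, say
jobs += [42] * 2_000_000          # one huge customer dominates the workload

work = [0] * N_WORKERS
for customer_id in jobs:
    work[customer_id % N_WORKERS] += 1     # naive static routing by key

print(f"busiest worker gets {100 * max(work) / sum(work):.0f}% of the jobs")
```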
 
Evan, for scale, my thesis code ran on 512 and 1024 processors. The modern machines are substantially larger than that.

And most of the runs were 10,000 CPU-hours or so. That's about 10 wall-clock hours if you have 1000 processors all doing their thing properly, but ridiculous otherwise (over a year on a single processor).

With 32 processors, you can still do master/slave paradigms. You can do it with more, but the master has to be VERY thin by the time you get to 1000+ slaves. I did it with true data parallelism; there was no master, and all the algorithms were rigorously balanced (it only worked on a homogeneous system). I had some real fun implementing a huge Fourier transform in a rigorously load-balanced form.
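
For the curious, the data-parallel, no-master decomposition boils down to something like this. The helper below is a hypothetical illustration, not the actual thesis code, and balancing something like a distributed Fourier transform is considerably hairier:

```python
# Every rank computes its own slice of the global index space from
# (rank, size) alone: no master handing out work, and no two ranks ever
# differ by more than one element. Hypothetical helper for illustration.
def local_range(global_n, rank, size):
    base, extra = divmod(global_n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# e.g. 10 cells over 4 ranks -> (0, 3), (3, 6), (6, 8), (8, 10)
for r in range(4):
    print(r, local_range(10, r, 4))
```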

It's not JUST load balancing (though that is certainly very important). It's also controlled synchronization points. It isn't a load problem, but if the communication synchs aren't rigorously balanced, it can hurt you considerably more than load imbalance does. The canonical example is data-dependency serialization: all the processors have the same workload (so it's balanced), but only one can run at a time.
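
The serialization trap looks innocent in code, which is why it bites people. A toy version (the recurrence is invented for illustration):

```python
# Each element costs exactly the same, so a load-balance report says all is
# well -- but x[i] depends on x[i-1], so only one processor can make progress
# at any instant. Splitting the loop across P processors buys you nothing.
def step(x):
    return 0.5 * x + 1.0          # identical cost for every element

def recurrence(x0, n):
    x = x0
    for _ in range(n):            # a chain of dependencies, not a tree
        x = step(x)
    return x

print(recurrence(0.0, 1_000_000))
```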
 
Yes, multi-CPU business systems work on a master/slave basis. My use of the word "parallel" was incorrect. It's just about impossible to homogeneously process business logic.

The master/slave approach, while much better for a business than a parallel environment, is flawed. It works well at times, but other times you'll see one CPU pegged, chugging along, and the next one idle. Unless you're a Linux god (and the system is Linux), you can't go in and optimize the environment, as it's part of the kernel. The code I write deals with load balancing within a distributed system. Much easier, as you're dealing with "chunks" of data that don't need to be uniform.
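
A rough sketch of what that chunk-based balancing looks like, with a local process pool standing in for the distributed system; the chunk sizes are invented and deliberately non-uniform:

```python
# Whichever worker frees up first pulls the next chunk, so a few giant chunks
# don't leave the rest of the pool idle. process_chunk() is a stand-in for
# the real business logic.
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def process_chunk(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    random.seed(0)
    chunks = [list(range(random.randint(10, 300_000))) for _ in range(64)]

    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(process_chunk, c) for c in chunks]
        total = sum(f.result() for f in as_completed(futures))
    print(total)
```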
 
