GPGPU / GPU Compute

Why is the GPU compute field so fucked up right now? I was reading into this, and it seems like I would basically have to have a card from each company if I wanted to tool around with the various kinds of GPU computing. This is a massive pain in the ass if I'm just working with a single machine. I need AMD for the stable open source driver, but all the machine learning libraries and shit require CUDA right now. So I'd need to pick up an Nvidia card if I wanted to do any of that kind of stuff.

GPGPU is really cool in theory, but in reality right now it's a swamp of proprietary bullshit. This is really bad news, since all the current AIs run on it, and they're taking over day-to-day life.

Anyway, GPGPU general. Have you been hacking on anything with it lately?

Other urls found in this thread:

khronos.org/vulkan/
en.wikipedia.org/wiki/Heterogeneous_System_Architecture#Software_support

It's not fucked up (if you're willing to lock your codebase into our proprietary language), goyim

No demand for open APIs.

The irony is that fixing this might mean writing software limited to Intel, because they're the only ones shipping open-source Vulkan and OpenCL drivers.

How do GPUs work?
Why exactly are they better for some tasks than CPUs?
What would it take to make a DIY GPU?

Luckily amdgpu will be getting an open-source Vulkan driver soon, but I don't know anything about its OpenCL support.


GPUs are just hundreds to thousands of weak, simple cores, with fast, high-bandwidth memory. They're made to crunch graphics, which is a highly parallel task, and as a result, they're good for other highly parallel tasks. Machine learning, crypto, and a few other tasks fit the bill.
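
A rough Python/numpy sketch of what "highly parallel" means in practice (purely illustrative, nothing GPU-specific here):

import numpy as np

x = np.random.rand(1_000_000).astype(np.float32)

# Embarrassingly parallel: each output element depends only on its own input,
# so a GPU could hand every element to a different core and do them all at once.
y_parallel = x * 2.0 + 1.0

# Serial dependency: each step needs the previous result, so throwing
# thousands of weak cores at it doesn't help; a fat CPU core wins here.
y_serial = np.empty_like(x)
acc = 0.0
for i, v in enumerate(x):
    acc = 0.5 * acc + v
    y_serial[i] = acc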

Because we have no manuals without an NDA.

Because we have no free/libre drivers, except for nouveau, Intel, and the pseudo-blob AMD "free/libre" one
(you only get 2D acceleration, that's all; no screen resolution switching, no brightness control, you have nothing without the AMD blob that ships in the Linux kernel).

Everything is patented.

We have no hardware specs.

We have nothing, user. That's why.

This is the proprietary world.

Welcome to hell, user.

forgot pic

this is why we need a communist revolution.

Building a wall was a mistake

Vulkan is designed as both a graphics and compute API, and OpenCL 2.1+ supports SPIR-V. Expect things to improve in the field very soon.

How does the cooling work if the card in the next slot is blocking the fans?

it's a retarded design, as always.

Manufacturers are incapable of thinking these things through.

I pity the hardware.

A very narrow gap.

really? sauce?

what's the significance of this to market share? are big-name companies really interested in it?


tbh more cooling is done by the rack fans and AC. still weird they wouldn't choose some kind of blower instead.

What the fuck do you mean source? It's in the god damn first sentence of the web page, it's not some obscure footnote in the spec.

khronos.org/vulkan/

sorry, I guess I just mean I don't know how useful it really is for compute. is computing a side thing for Vulkan, or could it actually compete with CUDA?

To answer your question, OpenCL is characteristic of Khronos Group garbage: cumbersome, badly implemented, depressingly low level, and missing the mod cons you'd hope to get (to head off criticism, I'm not asking for anything outrageous; we're talking basic device information, features to help manage memory usage, etc.). That said, it works fine (that is to say, GPUs work fine), but it's crude and raw, and you WILL be writing a big ugly interop layer to talk to it and manage it if you're doing anything serious. As is standard for Khronos, the docs are basic and inconsistent, and it's a pain to get into because there's no real interest in bringing people into the fold, so to speak.

Nvidia may be bad on a variety of fronts, but at least they've tried to be professional about the whole business of API creation. I can't say whether they've succeeded, not having used CUDA or their more advanced GPU compute APIs a great deal, but I think I understand why they made CUDA: an attempt to formalize and standardize the expectations programmers can have when entering the new field that GPU compute represents, and to set a bar as the industry standard.

That said, as I understand it, the basic GPU hardware interfaces haven't changed; they're still as crude as they ever were. The software has gotten a bit smarter, but you only need to look at how awkward resolution switching and monitor management still are to tell, and this is true of both vendors. This is still 'dumb' hardware (monitors, too).

Personally, I think we'll see the same mismanagement of this field as happened with HTML: attempts to rectify the situation without doing the hard work it would take to build a smarter device. More band-aid fixes, more 'smart' layers to wrap and manage the raw levels no one serious in the field wants to deal with, and more industry stalling as they argue over standards.

In fairness to Nvidia, the 1080 runs really fucking cool. I've run Furmark flat out on the one we got at work, and it didn't break a sweat.

Anybody played with PyOpenCL? Looking at the documentation, it appears that it takes a lot of the unnecessary complexity out, but I haven't had much time to go through the setup.
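
From skimming the docs, a minimal setup looks roughly like this (assuming the pyopencl and numpy packages are installed; ElementwiseKernel generates and builds the OpenCL C for you):

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.elementwise import ElementwiseKernel

ctx = cl.create_some_context()      # picks a platform/device, may prompt you
queue = cl.CommandQueue(ctx)

a = cl_array.to_device(queue, np.random.rand(1_000_000).astype(np.float32))
b = cl_array.to_device(queue, np.random.rand(1_000_000).astype(np.float32))

# out[i] = alpha * x[i] + y[i], generated and compiled behind the scenes
axpy = ElementwiseKernel(ctx,
    "float alpha, float *x, float *y, float *out",
    "out[i] = alpha * x[i] + y[i]",
    "axpy")

out = cl_array.empty_like(a)
axpy(np.float32(2.0), a, b, out)
print(out.get()[:5])                # copy the result back to the host

Compared to raw OpenCL host code, that's maybe a tenth of the boilerplate.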

Vulkan is graphics and compute from the ground up, compute is a first class citizen with just as much power as OpenCL or CUDA.

Bump

First-mover advantage on Nvidia's part. They made that shit usable first and managed to lock all the early adopters into their proprietary ecosystem.

Are you retarded? Manuals are freely available for all recent Intel and AMD GPUs.

Can Vulkan do compute and graphics simultaneously in a way that CUDA or OpenCL can't? If so, wouldn't that make it great for computer vision?

OpenCL can operate together with OpenGL and even Direct3D through standard interop extensions.
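
Whether a given driver actually advertises that interop is easy to check; a small pyopencl sketch (assuming the package is installed), since GL sharing shows up as an extension string on the device:

import pyopencl as cl

for platform in cl.get_platforms():
    print(platform.name, platform.version)
    for dev in platform.get_devices():
        print("  %s | %d CUs | %d MiB global mem"
              % (dev.name, dev.max_compute_units, dev.global_mem_size // 2**20))
        # OpenGL interop is advertised via the cl_khr_gl_sharing extension
        print("   GL sharing:", "cl_khr_gl_sharing" in dev.extensions)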

See those four big fans at the top of the first picture (and the three less-visible ones in the second)? They force air through all the cards' radiators at the same time. Once the empty space in the middle is filled, possibly with more cards, and the case is closed, going through the radiators is the only way the high-pressure air can exit the case. Any fans on the cards themselves are completely useless.

This is standard rackmount server cooling setup, by the way.


A blower would suffer from the same lack of space to suck air from, and on top of that it would block airflow from the case fans. The side fans at least don't hurt cooling.

If the makers of that ghetto rig weren't such cheap-ass faggots, they wouldn't be using el-cheapo GeForces; they'd use the professional Tesla compute cards, which are designed for precisely this kind of rackmount cooling setup. Those don't have any fans at all, or any video outputs to restrict airflow. Just one huge radiator all the way through and one huge vent across the entire rear bracket.

Teslas cost a fuckton and have fewer CUDA cores than gaming cards for some reason. Do their compute benchmarks justify the outrageous prices?

Depends on what kind of compute you're doing. They have all their double-precision units enabled, so they will crush the GeForces in anything DP-heavy or heavy on full-size integer multiplication, since that goes through the DP units too. They also have compute-optimized firmware that reportedly can communicate with the host CPU (and other GPUs) at much lower latencies; this is probably achieved by intentionally sabotaging the GeForces for market-segmentation reasons.

That, and they are designed for 24/7 operation in a rackmount case. Without any ghetto jury-rigging.

Yes, that's part of the reason why it's so low level.

It takes 600 lines to write a hello-world application in Vulkan.

That wasn't an accident, you get a lot of flexibility with Vulkan.

The problem with CUDA and OpenCL is that they are not portable: CUDA only runs on Nvidia cards and each version breaks compatibility with the last, and OpenCL is supposed to be a standard but only AMD's Linux implementation works properly.

Vulkan is portable, and that is probably good for computer vision; you could, in principle, run kernels compiled to SPIR-V on any phone that supports Vulkan.

That's not OpenCL's fault, though. There's no technical reason a proper, well-performing OpenCL driver couldn't be written for other vendors' cards. Only a lack of will and/or manpower to make that happen.

Like a very shitty, very multicore CPU.
Because it has thousands of cores.
Silicon manufacturing facilities and proprietary knowledge; only a handful of companies have those.

EasyCL exists, you know.

Why use a GPU when you can just get a massively parallel architecture cpu running a legit OS?

Vulkan on phones is going to be a toy environment for the next 5 years or so. None of the drivers comply with the spec, just like they can't fucking do OpenGL correctly today.

You can cool whole server racks.

liquid*

Because a GPU-style architecture can cram more performance into the same number of transistors.

What about aftermarket software? Could it be pulled off that way? Is there a large enough market for such software?

But only on embarrassingly parallel code/problems. Outside of that you're better off with an old-school CPU design.

Precisely. Which is why we have both a CPU and a GPU in most modern computers.

Surprised no one has mentioned ArrayFire. I mentioned it in another thread. It makes GPGPU programming pretty simple, and it's easy to switch between CUDA, OpenCL, CPU, all sorts of stuff; ArrayFire does all the heavy lifting. I've never used it, but they claim to have highly optimized code. It has a 3-clause BSD license.
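
For reference, the backend-switching bit from their Python bindings looks something like this; I haven't run it, so treat the function names as coming from their docs, not from experience:

import arrayfire as af

# Same code, different backend: ArrayFire dispatches to whichever you pick.
af.set_backend("opencl")        # or "cuda" or "cpu"
af.info()                       # prints which device it ended up on

a = af.randu(1024, 1024)        # random matrix allocated on the device
b = af.randu(1024, 1024)
c = af.matmul(a, b)             # runs on the selected backend
print(af.sum(c))                # reduce to a host-side scalar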

These claims are spurious as fuck.

Fucking 8ch ate my image.

There is a GPU (of a sort) in basically every PC now. Check it out: Intel devotes as much or more silicon area to the GPU as to the CPU cores on modern desktop processors.

It's just like the 387. The gpu is a co-processor and is going to be/has been integrated on die as the semiconductor processes get better. I think AMD is ahead in that integration (fwiw).

The REAL question is whether these co-processors are going to keep the really weird warp style of execution with a completely different u-architecture, or whether something like the Xeon Phi co-processors, which have a similar uarch but are effectively standard asymmetric SMP, will turn out better.

It is going to be interesting to see how well gpu style workloads scale on the Phi, and if they are power efficient.

The other question is if new workloads/algorithms will be created that pull the gpu more in the direction of general purpose processors. That has already happened, but how far will it go?

Maybe SPIR/Vulkan makes having a similar uarch irrelevant. But I would think that radically different gpu styles are going to have to be optimized for differently, which will be a pita.

Is the challenge for the Phi to figure out how to map the warp execution idea onto a very wide SIMD? Or is it hopeless, in that warp divergence would just make the Phi's SIMD units useless?

Let's say you have a procedural texture that causes lots of divergence. How many independent execution streams can an AMD or Nvidia GPU sustain?

True user, so true. Why play vidya on machines with GPUs? Waste of good money that could then be invested more in purple-haired art grants tbh.

CUDA is shit.

If you're going to lock yourself in anyway, try learning AMD GCN asm.

In asm you'll find all the performance gains you want.

Why do you want to program CUDA?
Are you a shill promoting Nvidious behavior?

They're noobs, locked into their fucking EULA.
I read an M$ EULA once and got cancer.

>CUDA is da shit.
FTFY

Anyone here try the amdkfd GNU/Linux kernel driver in conjunction with the HSA options in GCC? This is the sort of thing I'd expect Phoronix to do benchmarks on but it's still in very early beta.

en.wikipedia.org/wiki/Heterogeneous_System_Architecture#Software_support

I'm more interested in seeing how much of an improvement it'll provide to programs not designed for GPU computing than specific rendering or hashing programs. Having a co-processor with gigs of fast memory might finally make sense for applications other than gaming or video.

I don't know what you mean by programs not designed for GPU computing.
Almost any algorithm can be parallelized.
There are memory-transfer-bound programs and compute-bound programs.
Almost any application can be shifted between being memory-intensive and compute-intensive.

Because AMD GPUs and Nvidia GPUs are optimized for different things: AMD has better integer throughput while Nvidia has better float throughput. When you're dealing with GPUs you're already writing quite specialized code, so you may as well specify the most appropriate hardware.

That's because of the above: machine learning is quite float-heavy, and AMD cards can bog down when they're loaded up with those kinds of computations. Power efficiency plays a big role in this area because of the hardware scale involved, so you don't want your hardware wasting energy on computations it's not optimized for.

Also AMD drivers for a long time haven't had the stability demanded by enterprise applications.

Machine learning is not AI.

Company I am involved in specializes in GPUs, we were at GTC Melbourne a few weeks ago.


They have lots of cores which perform the same instruction on lots of data elements at the same time; this is known as SIMD, or Single Instruction, Multiple Data.

They are extremely powerful when your program iterates the same instructions over a large data set; it's why a shitty mobile Tegra GPU has more compute than an i7 despite consuming only a tenth of the power.

Any time you have a piece of code which sits in a loop iterating over different pieces of memory (for instance, a loop which adds 1 to every value in an array), they are very good; essentially, with a GPU you can perform the operation on all the values at the same time.
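
That add-1 loop is about the simplest possible kernel; through PyOpenCL (assuming pyopencl and numpy are installed) it looks roughly like this, with one work-item per array element:

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=a)

# Each work-item grabs its own index and increments just that element,
# so all 16 (or 16 million) elements get done "at the same time".
src = """
__kernel void add_one(__global float *x) {
    int i = get_global_id(0);
    x[i] = x[i] + 1.0f;
}
"""
prg = cl.Program(ctx, src).build()
prg.add_one(queue, a.shape, None, buf)

cl.enqueue_copy(queue, a, buf)
print(a)    # every element incremented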


You will kill yourself before you get anything significant working.