Intel’s parallel processing vision
We were privileged to be granted an interview with Intel vice president and director of research Andrew Chien. We asked him about the main difficulties facing the full adoption of parallel computing as an efficient method of processing, and how Intel was tackling them. We also wanted to know how the new approach of offloading functions traditionally processed by the CPU to separate dedicated chips fits in with Intel’s plans.

TechRadar: Will true parallel computing require an overhaul or complete replacement of the x86 architecture? Or is x86 up to the job of delivering true parallelism?

Andrew Chien: It’s safe to say that x86 is deeply rooted in a whole range of software compatibility and tool chain aspects, and that is the glue that binds the industry. When you look at parallelism, it’s almost orthogonal to single-processor instructions – there are some issues in how you put things together, but there’s a whole new dimension of challenges. Intel fully expects that a full range of parallel systems will be built on x86, or extensions of x86, and that they’ll work just fine. Having said that, does x86 present any special advantages in that space? Other than its massive software base, we don’t see that as the primary issue.

TR: Intel’s x86 architecture has been around for over 30 years now. Have some inherent difficulties materialised in that time, and have you had to work around them or add extensions to prolong its life?

AC: Oh, totally – but every instruction set architecture evolves from the time it’s introduced, and Intel has continuously been adding improvements to the architecture. Most of the academic world has focused on the RISC versus CISC competition – in some parts of the academic community there’s a deeply held belief that RISC won, whatever that means. But the interesting thing is that you move into a mode where your frequency is limited by power – and we’re seeing that now.
Intel has changed its strategy – there’s a lot of innovation around making the instruction set more efficient from a power point of view. So a lot of the extensions and enhancements you’re seeing come out of Intel now are about the ability to describe a bunch of computation with more complex instructions. This allows you to achieve higher power efficiency.

TR: Are power efficiency and the limitation on clock speeds pushing the multi-core approach…?

AC: Absolutely.

TR: But beyond multi-core, are they pushing operations off the CPU and onto the GPU, or onto other discrete media-processing hardware?

AC: I think it certainly pushes for architecture extensions that could increase power efficiency, or single-thread efficiency in a single stream. Power is also a fundamental driver behind parallelism using multiple cores, because it’s been known for 30 years that you can get more OPS per watt if you go parallel and don’t scale the clock as fast. This desire to offload work to special-purpose engines was first seen with engines for cryptography and similar tasks. There are engines in small mobile devices such as your voice recorder – they have custom ASICs or custom designs built to do the media encoding and decoding. Some of that is for cost, but some is because a hardwired design is more power efficient than a general-purpose core. I wouldn’t really hold up GPUs as being any more power efficient, but I think there’s a spectrum, from general-purpose designs all the way down to hardwired implementations for power efficiency, that Intel is very conscious of. We look at integrating things into our SOCs [Systems On a Chip] and other products like that to address those markets.

TR: You highlight the System On a Chip as one idea, but the path that ATi, Nvidia and Toshiba are taking suggests that more media functions can be processed by the GPU and by GPU-supporting media processors.
Don’t you see Intel’s Larrabee discrete graphics technology processing more than just graphics?

AC: It would be disingenuous of me to say that it’s not a target for other models of computation. It’s probably easier than some of the existing solutions out there, because it’s likely to be a much more flexible implementation, knowing what kind of systems Intel builds. But I really view that as different from what you’d find in an SOC or in hardwired, specialised logic. The excitement around all of those GPUs is about 100W discrete-package parts. They’re not penetrating the iPhone market; they’re not even close to doing that, being several orders of magnitude off in power use. They’re very interesting as accelerators, and in systems that compute at the high end, in a high-power envelope, to deliver a large number of FLOPS from a single chip. I think that’s compatible with the positioning of discrete graphics, perhaps with a broader software platform. And I expect the things Intel is doing in that space to be quite competitive.

TR: Do you see Larrabee moving into the integrated solutions sector and being married to low-power CPUs to produce low-power, small-profile computers that could compete with the performance of AMD’s Puma platform? And do you see Larrabee appearing in ever smaller platforms down the line, as the production process allows for lower power and smaller parts?

AC: Larrabee is currently defined as a high-core-count, high-end discrete graphics solution, but there’s certainly a possibility of integrating it at some point. I think the choices are more complicated there than when you’re talking about a high-end discrete graphics platform, where you’re aiming for maximum performance in a high-power envelope.
When you move down to the integrated versions, there’s a more subtle trade-off between how much power and how much silicon area you’re willing to pay for – and the performance levels you’re trying to achieve are always a compromise. For example, if power is the primary constraint, just by being on a package you’ve probably cut your power budget by about half – or even more, depending on your budget allocation. A low-core-count Larrabee-like product is one obvious approach, and it’s not a hard thing to do. We own all the designs, all the validation software and all the tool chains; alternatively, we have a whole GenX series that is extremely power efficient and driven by the kind of constraints we’ve just been talking about. The possibility that these kinds of ramped-up technologies are in some ways more competitive suggests that it may not be a black-and-white discussion.

TR: The hardware developed for multi-core operation has far outstripped software development. What is Intel doing to get software companies to write applications and operating systems that live up to the potential of the hardware?

AC: In the long run, what we’re talking about is putting the software industry on a different basis for scaling performance. People need to write code that’s fully scalable and, frankly, we need research breakthroughs to put the whole industry on that basis. We call upon governments as well as other players in the industry to invest in the next five to 10 years of scaling through parallelism. For parallelism to be successful, we need to move to a world in which people write parallel code that won’t necessarily gain a full increment of performance for every core that is added, but will get faster every time. That’s the long-range view. Intel is doing everything it can to create the urgency. I think one of the interesting challenges for parallelism, and for all application software, is the question of where it comes from.
Every time we’ve had one of these major changes in [processing] capabilities, the largest consumers of the [processing] cycles have often turned out to be new applications. We’ve been making people aware of a new class of workloads called RMS – Recognition, Mining and Synthesis. It’s all about data streams: analysing large quantities of noisy data, finding insights in them and synthesising the whole graphics and 3D visual experience. Those applications have staggering amounts of parallelism, so if those kinds of capabilities increasingly become part of other applications, that alone could saturate many of these parallel processors.