# Control for High Heat Chips' Cooling Based on Power Consumption and Temperature Signals

Jian Wang, School of Electronics and Information Engineering, University of Science and Technology of Suzhou (USTS), Suzhou, China, wangjiansuzhou@sina.com

Chuan-yang Liu, School of Electronics and Information Engineering, University of Science and Technology of Suzhou (USTS), Suzhou, China

*Abstract*: For cooling CPU chips, the existing single-variable control system suffers from delay, fluctuation and high energy consumption. As CPU chips move toward more cores and higher clock frequencies, the traditional system will not suit future cooling tasks. Several approaches, namely nonlinear feed-forward of the chip's dissipated power, core temperature tracking and surface temperature feed-back, are applied to improve the control system. Simulations show that, compared with the traditional single-input single-output proportional tracking strategy based on out-core temperature, the improved system gives better control quality and saves energy. The new system also adapts to a wide range of parameters.

Keywords-CPU chip; temperature; cooling; control; power dissipation; usage rate; feed-forward

# I. INTRODUCTION

In future space missions, more and more high-heat integrated circuit (IC) chips will need to be cooled [1]. Improving the quality of cooling control for high-heat IC chips is therefore a new research focus. CPU chips are a typical kind of high-heat chip, and their cooling control is key to guaranteeing their stability and dynamic properties [2]. Traditional cooling control systems are single-variable tracking systems that use a proportional algorithm as the control strategy [3-5]. That is, the discrete values of the chip's out-core temperature are multiplied by a coefficient and then used to adjust the driving power of the cooling actuator.

A CPU chip is a body with internal heat sources. Its heat generating mechanism [6-9] makes the temperature in the chip a nonstationary random process with delay and many influencing factors. This is the bottleneck that keeps the traditional control system from suiting future CPU cooling tasks. Specifically, the main reasons are the uncertain deviation between the measured out-core temperature values and the real temperature [7], the single system input variable, and the simple tracking control strategy. It is therefore urgent to find a new control system to cope with future challenges. Moreover, the trends toward more cores and higher main clock frequencies, future operating system scheduling strategies, the development of CPU internal structures and techniques, and even increases in cooling actuator efficiency will make the CPU chip's power, heat and temperature more complex than expected.

To improve the transient and steady-state quality of the chip's surface temperature, this paper proposes a new system that combines new signals and their processing with more control algorithms and more elaborate strategies. The new system uses two inputs: the CPU usage rate, which indicates the chip's dissipated power in advance [9-11], and the CPU core temperature, which is more accurate and more sensitive than the out-core temperature and has no delay. Power feed-forward, core temperature tracking and surface temperature feed-back strategies are added. Numerical simulation shows that the new method is effective and outperforms the traditional one.

# II. CHIPS' COOLING AND TRADITIONAL CONTROL SYSTEMS

The traditional cooling control system (see Fig. 1) uses analogue variables as inputs. It applies the proportional strategy of Fig. 2 to determine the excitation voltage $V_{out}$ of the cooling actuator, usually one or two air fans, as follows:

$$V_{out} = k_n T(t) \tag{1}$$

Figure 1. CPU chips' cooling model of tracking control system.

The controlled object itself is not complex: the chip's heat transfer delay and the air fan's cooling time are common properties in control systems. The peculiar complexity comes instead from the nonstationary randomness and the anisotropic inconsistency of the temperature field. As seen in Fig. 3, the out-core temperature and the core temperature are both nonstationary random, and there is also a time delay of at least 1 second and a temperature difference of at least 1°C between them. In addition, the uncertainty of the heat produced by CPU workloads and the varying cooling time, which depends on the actuator's heat exchange efficiency, make the chip's cooling difficult.



Figure 2. Traditional proportional tracking strategy.

Generally speaking, the traditional proportional tracking strategy cannot further improve the cooling control quality, and cannot face future challenges.
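As an illustration of Eq. (1), the following minimal Python sketch maps a temperature reading to a fan voltage. The function name, the coefficient value `k = 0.17` and the clamping to the 8 V to 12 V range are assumptions made for the example (the range is borrowed from the parameter list in Section IV); they are not taken from the traditional controllers cited above.

```python
def proportional_fan_voltage(temp_c, k=0.17, v_min=8.0, v_max=12.0):
    """Eq. (1), V_out = k * T, limited to an assumed supply range."""
    raw = k * temp_c                      # pure proportional law
    return min(v_max, max(v_min, raw))    # assumed actuator limits


# Successive readings one degree apart give commands k volts apart,
# as long as the command stays inside the assumed limits.
readings_c = [58.0, 59.0, 60.0, 61.0]
voltages = [proportional_fan_voltage(t) for t in readings_c]
steps = [b - a for a, b in zip(voltages, voltages[1:])]
```

Because the command is a fixed multiple of the reading, any measurement error in the out-core temperature passes straight through to the actuator, scaled by `k`.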



Figure 3. CPU core temp. & out-core temp.

# III. MODEL OF CONTROL SYSTEM WITH DISSIPATED POWER FEED-FORWARD AND MULTI-TEMPERATURE PID

# A. Improved Model of Controller

We noticed that the chip's power consumption leads its temperature. On the basis of the existing tracking system, we introduce usage feed-forward to reflect the effect of the CPU's power consumption on temperature. In addition, we replace the out-core temperature with the core temperature as the input variable, and add tracking and feed-back algorithms. We also take short window mean values of the core temperature and usage sequences. Fig. 4 shows the improved control system model. Eq. (2) lists the algorithms in the new model: nonlinear feed-forward of the usage sequence, core temperature tracking, surface temperature feed-back, and so on. In that equation, $k_u$ is the usage feed-forward coefficient, and the linear term with coefficient $k_r$ is added to compensate for the attenuation as usage increases. $k_p$, $k_i$ and $k_d$ are the proportional, integral and derivative coefficients of the PID algorithm, respectively, and $k_f$ is the coefficient of the difference feed-back.

$$\begin{array}{l} k_{u} \left\{ 1 - \exp\left(-E\left[U_{n}^{L,l}\right]\right) + k_{r}E\left[U_{n}^{L,l}\right]\right\} \\ k_{p} E\left[T_{n}^{L,l}\right] + k_{d} \dfrac{E\left[T_{n}^{L,l}\right] - E\left[T_{n-1}^{L,l}\right]}{l} \\ k_{i} \sum_{i=1}^{n} \left\{T_{o}(n) - \dfrac{T_{\max} + T_{\min}}{2}\right\} \\ k_{f} \left\{T_{o}(n-1) - E\left[T_{n}^{L,l}\right]\right\} \end{array} \tag{2}$$

In Eq. (2), $E[U_n^{L,l}]$ and $E[T_n^{L,l}]$ are the short window mean values of the usage sequence and the core temperature sequence, respectively.
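The terms of Eq. (2) can be sketched in Python as follows. This is only an illustrative reading under stated assumptions: the function and argument names are hypothetical, the integral term is taken as a running sum over past $T_o$ samples, $T_o$ is assumed to be the surface temperature, and adding the four terms into one command is an assumption, since Eq. (2) only lists them. The default gains and temperature limits are the values given in Section IV.

```python
import math


def controller_terms(u_mean, t_mean, t_mean_prev, t_o_history, t_o_prev,
                     l=1, k_u=0.1, k_r=1.0, k_p=0.1, k_i=0.01,
                     k_d=0.01, k_f=0.01, t_min=50.0, t_max=70.0):
    """Evaluate the four lines of Eq. (2) for one control step."""
    midpoint = (t_max + t_min) / 2.0
    # Line 1: nonlinear usage feed-forward with the linear k_r correction.
    feed_forward = k_u * (1.0 - math.exp(-u_mean) + k_r * u_mean)
    # Line 2: proportional plus derivative tracking of the core temperature mean.
    tracking = k_p * t_mean + k_d * (t_mean - t_mean_prev) / l
    # Line 3: integral of the T_o deviation from the band midpoint
    # (assumed here to run over all past T_o samples).
    integral = k_i * sum(t_o - midpoint for t_o in t_o_history)
    # Line 4: difference feed-back between previous T_o and the core mean.
    feedback = k_f * (t_o_prev - t_mean)
    return {
        "feed_forward": feed_forward,
        "tracking": tracking,
        "integral": integral,
        "feedback": feedback,
        # Assumption: the controller output is the plain sum of the terms.
        "total": feed_forward + tracking + integral + feedback,
    }


terms = controller_terms(u_mean=0.0, t_mean=60.0, t_mean_prev=59.0,
                         t_o_history=[61.0, 59.0], t_o_prev=58.0)
```

Returning the terms separately makes it easy to switch individual strategies on or off, which mirrors the way the strategies are compared one by one in the simulations.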

# B. Nonlinear Relation between Usage Rate and Core Temperature

CPU usage measures the microprocessor's workload. It directly reflects the chip's power consumption and determines the amount of heat generated in the chip [7, 10-11]. For a CPU chip, let $Q_{gained}$ be the heat it gains over a period of time $t$, $Q_{transfer}$ the heat transferred to it from its environment during that period, and $Q_{component}$ the heat produced by running programs during that period. This can be expressed as [10]:

$$Q_{gained} = Q_{transfer} + Q_{component} = k(T_{out} - T_{source})t + [P_{base} + (P_{max} - P_{base})u]t \tag{3}$$

where $T_{source}$ is the core temperature, $T_{out}$ is the air temperature, $P_{base}$ is the power consumption when the CPU is idle, $P_{max}$ is the power consumption when the chip is fully utilized, and $u$ is the usage rate.

Figure 4. Improved model of CPU chip cooling controller.

Since $\Delta T = \frac{1}{mc} \Delta Q$, then

$$\begin{aligned} \frac{dQ_{gained}}{dt} + k(T_{source} - T_{out}) &= P_{base} + (P_{max} - P_{base})u \\ mc \frac{d(T_{source} - T_{out})}{dt} + k(T_{source} - T_{out}) &= P_{base} + (P_{max} - P_{base})u \end{aligned} \tag{4}$$

where $m$ is the mass of the chip, $c$ is its specific heat capacity, and $k$ is related to $m$ and $c$.
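A minimal numerical sketch of Eq. (4) follows. It assumes forward-Euler integration and hypothetical values for $mc$, $k$, $P_{base}$ and $P_{max}$ (none of them from the paper), and shows how the temperature rise $T_{source} - T_{out}$ settles at $[P_{base} + (P_{max} - P_{base})u]/k$ for constant usage.

```python
def steady_state_rise(u, k=2.0, p_base=10.0, p_max=60.0):
    """Steady state of Eq. (4): derivative zero, so rise = P(u) / k."""
    return (p_base + (p_max - p_base) * u) / k


def simulate_core_rise(u, duration_s=20.0, dt=0.01,
                       mc=5.0, k=2.0, p_base=10.0, p_max=60.0):
    """Integrate mc * d(theta)/dt + k * theta = P(u) with forward Euler.

    theta stands for T_source - T_out and starts at zero.
    All numeric defaults are hypothetical illustration values.
    """
    power = p_base + (p_max - p_base) * u
    theta = 0.0
    for _ in range(int(round(duration_s / dt))):
        theta += dt * (power - k * theta) / mc
    return theta


simulated = simulate_core_rise(0.5)   # after 20 s, i.e. 8 time constants mc/k
expected = steady_state_rise(0.5)     # 35 W / 2 W/K
```

With these assumed values the time constant $mc/k$ is 2.5 s, so 20 s of simulated time is enough for the rise to come within a small fraction of a degree of its steady value.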

Fig. 5 shows the statistical result of real out-core temperature measurements. When usage is larger than 10%, the effect of dissipated power on out-core temperature is easy to see. By fitting, we obtain the relation between steady temperature values and constant usage values as Eq. (5), where $T_{base}$ is the base temperature corresponding to zero usage, and $k_u$ and $k_r$ are constants. Note that Eq. (5) coincides with the solution form of Eq. (4).

$$T(\infty) = T_{base} + k_{u} \left( 1 - e^{-k_{r}u} \right) \tag{5}$$
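To illustrate the saturating shape of Eq. (5), the sketch below evaluates it for a few usage values. It assumes the form $T(\infty) = T_{base} + k_u(1 - e^{-k_r u})$, with $k_u$ as the amplitude and $k_r$ as the rate constant, since those are the two constants the text names. The numeric constants are hypothetical placeholders, not the fitted values.

```python
import math


def steady_temperature(u, t_base=45.0, k_u=25.0, k_r=3.0):
    """Assumed reading of Eq. (5); all default constants are hypothetical."""
    return t_base + k_u * (1.0 - math.exp(-k_r * u))


usages = [0.0, 0.25, 0.5, 0.75, 1.0]
temps = [steady_temperature(u) for u in usages]
# Temperature gained per extra 25% of usage; it shrinks as usage grows.
increments = [b - a for a, b in zip(temps, temps[1:])]
```

Under this reading, each additional step of usage raises the steady temperature by less than the previous step did.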



# C. Short Window Mean Value of Core Temperature and Usage Sequences

The discrete core temperature sequence $\{T_n\}$ is a nonstationary random sequence with a long sampling cycle. It can be proved that the short window mean sequence $\{E[T_n^{L,l}]\}$ contains the same heat as $\{T_n\}$, and that its waveform is smoother. This means that short window mean values are more suitable for cooling control than the raw discrete values.

**Brief proof:**

Let the frame length be $L = 2$ and the frame width be $l = 1$. Then $\{E[T_n^{L,l}]\}$ contains heat

$$Q_{aver} = \sum_{i=1}^{n} \int_{i-1}^{i} T_i \, dt = \frac{T_1}{2} + \sum_{i=1}^{n} \int_{i}^{i+1} \frac{T_i + T_{i+1}}{2} \, dt + \frac{T_n}{2},$$

and $\{T_n\}$ contains heat

$$Q_{disc} = \frac{T_1}{2} + \sum_{i=1}^{n} \int_{i}^{i+1} \frac{T_i + T_{i+1}}{2} \, dt.$$

If $n$ is large enough, $Q_{aver} \approx Q_{disc}$. Furthermore, if $T_{i+1} > T_i$, then $T_{i+1} > \frac{T_i + T_{i+1}}{2}$; if $T_{i+1} < T_i$, then $T_{i+1} < \frac{T_i + T_{i+1}}{2}$. Each window mean therefore lies between its two neighboring samples, so the mean sequence fluctuates less than $\{T_n\}$. **Proof end**

For the usage sequence $U(n)$, we likewise use its short window mean value $\left\{ E\left[U_{n}^{L,l}\right]\right\}$.
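The short window mean can be sketched as a sliding average. In this minimal Python example, $L$ is taken as the number of samples per frame and $l$ as the step between successive frames; that reading of "frame width" is an assumption, and the function names and sample values are hypothetical.

```python
def short_window_mean(samples, L=2, l=1):
    """Mean of each length-L frame, with frames advanced by l samples."""
    return [sum(samples[i:i + L]) / L
            for i in range(0, len(samples) - L + 1, l)]


def total_variation(seq):
    """Sum of absolute sample-to-sample changes, a simple roughness measure."""
    return sum(abs(b - a) for a, b in zip(seq, seq[1:]))


core_temps = [60.0, 62.0, 59.0, 63.0, 61.0]      # made-up readings in degC
means = short_window_mean(core_temps)            # [61.0, 60.5, 61.0, 62.0]
roughness_raw = total_variation(core_temps)      # 11.0
roughness_mean = total_variation(means)          # 2.0
```

For this made-up sequence the windowed means swing far less than the raw samples, which is the smoothing property the proof above relies on.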

# IV. SIMULATION AND ANALYSIS

# A. Selection of Parameters

For simplicity, we suppose that all the power dissipated by the cooling actuator is used for cooling. Based on experimental experience, we set the cooling time $t_d$ to 1 second and the heat transfer time constant $\tau$ to 0.25 second. That is, all heat generated by the source is carried away to the environment by the cooling actuator within 1 second.

All parameters' values are taken as follows: L=2, l=1,  $\tau=0.25$ s,  $t_d=1$ s,  $(T_{\min}, T_{\max})=(50^{\circ}\text{C}, 70^{\circ}\text{C})$ ,  $(U_{\min}, U_{\max})=(8\text{V}, 12\text{V})$ ,  $k_p=0.1$ ,  $k_i=0.01$ ,  $k_d=0.01$ ,  $k_f=0.01$ ,  $k_c=1.2$ ,  $k_h=8$ ,  $k_u=0.1$ ,  $k_r=1.0$ .

# B. Simulation Results and Analysis

1) Comparison between core temperature and out-core temperature as the input variable. Fig. 6 shows the simulation results of pure proportional tracking control (see Fig. 1) using either the out-core temperature or the core temperature as input. Owing to the uncertainty and delay of the out-core temperature, its result is clearly much worse than that of the core temperature. The simulations also indicate that making $k_p$ either larger or smaller increases the fluctuation amplitude of the surface temperature. This is because the pure proportional tracking strategy cannot eliminate the disturbance caused by the uncertain deviation in the out-core temperature values. In a real application, $k_p$ has the same effect as the heat exchange coefficient of the cooling actuator, so the traditional strategy suits neither actuators with lower heat exchange efficiency nor actuators with higher heat exchange efficiency. Note that the out-core temperature used in our numerical calculation has a smaller random deviation than in practice; the actual random deviation is bigger and its influence is more serious.



Figure 6. Control results by traditional system.

2) Comparison of the control quality of different improved strategies. For the improved control system in Fig. 4, we use the core temperature in Fig. 2 as the input variable in the simulations. First, a controller with only the PD core temperature tracking algorithm leaves a steady-state deviation of 1°C in the surface temperature. After surface temperature feed-back is added, the controller eliminates this deviation. Alternatively, after nonlinear usage feed-forward is added, the overshoot of the surface temperature is lower. Finally, when surface temperature feed-back and usage feed-forward are added simultaneously, the result is best in both the transient and the steady state (see Fig. 7).



3) Comparison of the energy consumed by the cooling actuator. For the combined strategy of core temperature tracking and surface temperature feed-back, the numerical results show that the two kinds of input values lead to different energy consumption. The control system using discrete core temperature values makes the actuator consume 2.46053 J, whereas the system using short window mean values of core temperature makes it consume 1.76581 J. In our example, the former consumes 0.69472 J more than the latter over 20 seconds.
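The energy figures above can be checked with a few lines of arithmetic. The relative saving and the mean power difference are derived here from the reported numbers and are not stated in the original text; the variable names are illustrative.

```python
energy_discrete_j = 2.46053   # actuator energy with discrete core temperature values
energy_windowed_j = 1.76581   # actuator energy with short window mean values
duration_s = 20.0             # length of the simulated run

saving_j = energy_discrete_j - energy_windowed_j          # about 0.69472 J
saving_percent = 100.0 * saving_j / energy_discrete_j     # about 28.2 %
mean_power_saving_w = saving_j / duration_s               # about 0.0347 W
```

In relative terms, the windowed input cuts the actuator energy in this example by roughly 28%.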



Figure 8. A feasible implementation scheme of the improved chip cooling control system.

4) Discussion of feasibility and practicability. The simulations indicate that the parameters of the new system model, such as $k_p$, $k_i$, $k_d$, $k_f$, $k_u$, $k_r$, $L$ and $l$, can be adjusted over a wide range to cope with different values of $\tau$, $t_d$, $k_c$ and $k_h$. For instance, whether the cooling actuator is an air fan, a liquid pump or a semiconductor thermoelectric patch, that is, whether the actuator's heat exchange efficiency is high or low, the simulations show that our model gives good control quality. Likewise, when the CPU core temperature shows different random properties owing to different operating systems, main clock frequencies or user software, the new system can adapt by using corresponding values of $L$ and $l$.

# V. CONCLUSION

This paper studied how to improve the cooling control system for CPU chips and proposed several new methods in two aspects: input signals and their processing, and control strategies. Numerical calculation and simulation verified that the usage sequence, which indicates CPU power, and the core temperature sequence, which is more accurate and more immediate, avoid the disadvantages of out-core temperature values. They also verified that power feed-forward and surface temperature feed-back enhance the system's functions and improve its performance. The improved control system has advantages in control quality, energy saving and adaptability. It can be readily implemented with existing techniques, and doing so is our next work. Fig. 8 shows one feasible scheme of the improved control system.

## REFERENCES

- [1] Jong M. Park, Allan T. Evans, K. Rasmussen, et al. A Microvalve with Integrated Sensors and Customizable Normal State for Low-temperature Operation. Journal of Microelectromechanical Systems, Vol. 18, No. 4, August 2009, 868-877
- [2] R. Jayaseelan, T. Mitra. Temperature Aware Scheduling for Embedded Processors. Journal of Low Power Electronics, American Scientific Publisher, 5(3), Oct. 2009
- [3] Matt Smith. Measuring temperatures on computer chips with speed and accuracy, Analog
- [4] M. Moonat. Using the On-Chip Thermal Diode on Analog Devices Processors. ANALOG DEVICES: Technical notes on using Analog Devices DSPs, processors and development tools Rev 2, Aug. 2, 2010
- [5] David Hanrahan. Fan-Speed Control Techniques in PCs. Analog Dialogue 34-4 (2000)
- [6] Ed. Grochowski, Murali Annavaram. Energy per Instruction Trends in Intel Microprocessors. Microarchitecture Research Lab, Intel Corporation
- [7] E. Rotem, J. Hermerding, C. Aviad, et al. Temperature Measurement in the Intel Core Duo Processor. Intel Document Published August 2008, 45nm Desktop Dual Core Processors Intel Core 2 Duo processor E8000 and E7000 series
- [8] G. Paci, F. Poletti, L. Benini, et al. Exploring Temperature-aware Design in Low-power MPSoCs.
- [9] W. Wu, L. Jin, J. Yang, et al. Tan. Efficient Power Modeling and Software Thermal Sensing for Run time Temperature Monitoring. University of California at Riverside
- [10] Taliver Heath, Ana Paula Centeno, Pradeep George, et al. Mercury and Freon: Temperature Emulation and Management for Server Systems [J]. ASPLOS'06 October 21-25 2006, San Jose, California, USA.
- [11] Kevin Skadron, Mircea R. Stan, Wei Huang, et al. Temperature-Aware Microarchitecture: Extended Discussion and Results. REPORT CS-2003-08 APRIL 2003.