![]() |
LAPACK
3.4.1
LAPACK: Linear Algebra PACKage
|
00001 *> \brief \b SLAQR5 00002 * 00003 * =========== DOCUMENTATION =========== 00004 * 00005 * Online html documentation available at 00006 * http://www.netlib.org/lapack/explore-html/ 00007 * 00008 *> \htmlonly 00009 *> Download SLAQR5 + dependencies 00010 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/slaqr5.f"> 00011 *> [TGZ]</a> 00012 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/slaqr5.f"> 00013 *> [ZIP]</a> 00014 *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/slaqr5.f"> 00015 *> [TXT]</a> 00016 *> \endhtmlonly 00017 * 00018 * Definition: 00019 * =========== 00020 * 00021 * SUBROUTINE SLAQR5( WANTT, WANTZ, KACC22, N, KTOP, KBOT, NSHFTS, 00022 * SR, SI, H, LDH, ILOZ, IHIZ, Z, LDZ, V, LDV, U, 00023 * LDU, NV, WV, LDWV, NH, WH, LDWH ) 00024 * 00025 * .. Scalar Arguments .. 00026 * INTEGER IHIZ, ILOZ, KACC22, KBOT, KTOP, LDH, LDU, LDV, 00027 * $ LDWH, LDWV, LDZ, N, NH, NSHFTS, NV 00028 * LOGICAL WANTT, WANTZ 00029 * .. 00030 * .. Array Arguments .. 00031 * REAL H( LDH, * ), SI( * ), SR( * ), U( LDU, * ), 00032 * $ V( LDV, * ), WH( LDWH, * ), WV( LDWV, * ), 00033 * $ Z( LDZ, * ) 00034 * .. 00035 * 00036 * 00037 *> \par Purpose: 00038 * ============= 00039 *> 00040 *> \verbatim 00041 *> 00042 *> SLAQR5, called by SLAQR0, performs a 00043 *> single small-bulge multi-shift QR sweep. 00044 *> \endverbatim 00045 * 00046 * Arguments: 00047 * ========== 00048 * 00049 *> \param[in] WANTT 00050 *> \verbatim 00051 *> WANTT is logical scalar 00052 *> WANTT = .true. if the quasi-triangular Schur factor 00053 *> is being computed. WANTT is set to .false. otherwise. 00054 *> \endverbatim 00055 *> 00056 *> \param[in] WANTZ 00057 *> \verbatim 00058 *> WANTZ is logical scalar 00059 *> WANTZ = .true. if the orthogonal Schur factor is being 00060 *> computed. WANTZ is set to .false. otherwise. 00061 *> \endverbatim 00062 *> 00063 *> \param[in] KACC22 00064 *> \verbatim 00065 *> KACC22 is integer with value 0, 1, or 2. 00066 *> Specifies the computation mode of far-from-diagonal 00067 *> orthogonal updates. 00068 *> = 0: SLAQR5 does not accumulate reflections and does not 00069 *> use matrix-matrix multiply to update far-from-diagonal 00070 *> matrix entries. 00071 *> = 1: SLAQR5 accumulates reflections and uses matrix-matrix 00072 *> multiply to update the far-from-diagonal matrix entries. 00073 *> = 2: SLAQR5 accumulates reflections, uses matrix-matrix 00074 *> multiply to update the far-from-diagonal matrix entries, 00075 *> and takes advantage of 2-by-2 block structure during 00076 *> matrix multiplies. 00077 *> \endverbatim 00078 *> 00079 *> \param[in] N 00080 *> \verbatim 00081 *> N is integer scalar 00082 *> N is the order of the Hessenberg matrix H upon which this 00083 *> subroutine operates. 00084 *> \endverbatim 00085 *> 00086 *> \param[in] KTOP 00087 *> \verbatim 00088 *> KTOP is integer scalar 00089 *> \endverbatim 00090 *> 00091 *> \param[in] KBOT 00092 *> \verbatim 00093 *> KBOT is integer scalar 00094 *> These are the first and last rows and columns of an 00095 *> isolated diagonal block upon which the QR sweep is to be 00096 *> applied. It is assumed without a check that 00097 *> either KTOP = 1 or H(KTOP,KTOP-1) = 0 00098 *> and 00099 *> either KBOT = N or H(KBOT+1,KBOT) = 0. 00100 *> \endverbatim 00101 *> 00102 *> \param[in] NSHFTS 00103 *> \verbatim 00104 *> NSHFTS is integer scalar 00105 *> NSHFTS gives the number of simultaneous shifts. NSHFTS 00106 *> must be positive and even. 00107 *> \endverbatim 00108 *> 00109 *> \param[in,out] SR 00110 *> \verbatim 00111 *> SR is REAL array of size (NSHFTS) 00112 *> \endverbatim 00113 *> 00114 *> \param[in,out] SI 00115 *> \verbatim 00116 *> SI is REAL array of size (NSHFTS) 00117 *> SR contains the real parts and SI contains the imaginary 00118 *> parts of the NSHFTS shifts of origin that define the 00119 *> multi-shift QR sweep. On output SR and SI may be 00120 *> reordered. 00121 *> \endverbatim 00122 *> 00123 *> \param[in,out] H 00124 *> \verbatim 00125 *> H is REAL array of size (LDH,N) 00126 *> On input H contains a Hessenberg matrix. On output a 00127 *> multi-shift QR sweep with shifts SR(J)+i*SI(J) is applied 00128 *> to the isolated diagonal block in rows and columns KTOP 00129 *> through KBOT. 00130 *> \endverbatim 00131 *> 00132 *> \param[in] LDH 00133 *> \verbatim 00134 *> LDH is integer scalar 00135 *> LDH is the leading dimension of H just as declared in the 00136 *> calling procedure. LDH.GE.MAX(1,N). 00137 *> \endverbatim 00138 *> 00139 *> \param[in] ILOZ 00140 *> \verbatim 00141 *> ILOZ is INTEGER 00142 *> \endverbatim 00143 *> 00144 *> \param[in] IHIZ 00145 *> \verbatim 00146 *> IHIZ is INTEGER 00147 *> Specify the rows of Z to which transformations must be 00148 *> applied if WANTZ is .TRUE.. 1 .LE. ILOZ .LE. IHIZ .LE. N 00149 *> \endverbatim 00150 *> 00151 *> \param[in,out] Z 00152 *> \verbatim 00153 *> Z is REAL array of size (LDZ,IHI) 00154 *> If WANTZ = .TRUE., then the QR Sweep orthogonal 00155 *> similarity transformation is accumulated into 00156 *> Z(ILOZ:IHIZ,ILO:IHI) from the right. 00157 *> If WANTZ = .FALSE., then Z is unreferenced. 00158 *> \endverbatim 00159 *> 00160 *> \param[in] LDZ 00161 *> \verbatim 00162 *> LDZ is integer scalar 00163 *> LDA is the leading dimension of Z just as declared in 00164 *> the calling procedure. LDZ.GE.N. 00165 *> \endverbatim 00166 *> 00167 *> \param[out] V 00168 *> \verbatim 00169 *> V is REAL array of size (LDV,NSHFTS/2) 00170 *> \endverbatim 00171 *> 00172 *> \param[in] LDV 00173 *> \verbatim 00174 *> LDV is integer scalar 00175 *> LDV is the leading dimension of V as declared in the 00176 *> calling procedure. LDV.GE.3. 00177 *> \endverbatim 00178 *> 00179 *> \param[out] U 00180 *> \verbatim 00181 *> U is REAL array of size 00182 *> (LDU,3*NSHFTS-3) 00183 *> \endverbatim 00184 *> 00185 *> \param[in] LDU 00186 *> \verbatim 00187 *> LDU is integer scalar 00188 *> LDU is the leading dimension of U just as declared in the 00189 *> in the calling subroutine. LDU.GE.3*NSHFTS-3. 00190 *> \endverbatim 00191 *> 00192 *> \param[in] NH 00193 *> \verbatim 00194 *> NH is integer scalar 00195 *> NH is the number of columns in array WH available for 00196 *> workspace. NH.GE.1. 00197 *> \endverbatim 00198 *> 00199 *> \param[out] WH 00200 *> \verbatim 00201 *> WH is REAL array of size (LDWH,NH) 00202 *> \endverbatim 00203 *> 00204 *> \param[in] LDWH 00205 *> \verbatim 00206 *> LDWH is integer scalar 00207 *> Leading dimension of WH just as declared in the 00208 *> calling procedure. LDWH.GE.3*NSHFTS-3. 00209 *> \endverbatim 00210 *> 00211 *> \param[in] NV 00212 *> \verbatim 00213 *> NV is integer scalar 00214 *> NV is the number of rows in WV agailable for workspace. 00215 *> NV.GE.1. 00216 *> \endverbatim 00217 *> 00218 *> \param[out] WV 00219 *> \verbatim 00220 *> WV is REAL array of size 00221 *> (LDWV,3*NSHFTS-3) 00222 *> \endverbatim 00223 *> 00224 *> \param[in] LDWV 00225 *> \verbatim 00226 *> LDWV is integer scalar 00227 *> LDWV is the leading dimension of WV as declared in the 00228 *> in the calling subroutine. LDWV.GE.NV. 00229 *> \endverbatim 00230 * 00231 * Authors: 00232 * ======== 00233 * 00234 *> \author Univ. of Tennessee 00235 *> \author Univ. of California Berkeley 00236 *> \author Univ. of Colorado Denver 00237 *> \author NAG Ltd. 00238 * 00239 *> \date November 2011 00240 * 00241 *> \ingroup realOTHERauxiliary 00242 * 00243 *> \par Contributors: 00244 * ================== 00245 *> 00246 *> Karen Braman and Ralph Byers, Department of Mathematics, 00247 *> University of Kansas, USA 00248 * 00249 *> \par References: 00250 * ================ 00251 *> 00252 *> K. Braman, R. Byers and R. Mathias, The Multi-Shift QR 00253 *> Algorithm Part I: Maintaining Well Focused Shifts, and Level 3 00254 *> Performance, SIAM Journal of Matrix Analysis, volume 23, pages 00255 *> 929--947, 2002. 00256 *> 00257 * ===================================================================== 00258 SUBROUTINE SLAQR5( WANTT, WANTZ, KACC22, N, KTOP, KBOT, NSHFTS, 00259 $ SR, SI, H, LDH, ILOZ, IHIZ, Z, LDZ, V, LDV, U, 00260 $ LDU, NV, WV, LDWV, NH, WH, LDWH ) 00261 * 00262 * -- LAPACK auxiliary routine (version 3.4.0) -- 00263 * -- LAPACK is a software package provided by Univ. of Tennessee, -- 00264 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 00265 * November 2011 00266 * 00267 * .. Scalar Arguments .. 00268 INTEGER IHIZ, ILOZ, KACC22, KBOT, KTOP, LDH, LDU, LDV, 00269 $ LDWH, LDWV, LDZ, N, NH, NSHFTS, NV 00270 LOGICAL WANTT, WANTZ 00271 * .. 00272 * .. Array Arguments .. 00273 REAL H( LDH, * ), SI( * ), SR( * ), U( LDU, * ), 00274 $ V( LDV, * ), WH( LDWH, * ), WV( LDWV, * ), 00275 $ Z( LDZ, * ) 00276 * .. 00277 * 00278 * ================================================================ 00279 * .. Parameters .. 00280 REAL ZERO, ONE 00281 PARAMETER ( ZERO = 0.0e0, ONE = 1.0e0 ) 00282 * .. 00283 * .. Local Scalars .. 00284 REAL ALPHA, BETA, H11, H12, H21, H22, REFSUM, 00285 $ SAFMAX, SAFMIN, SCL, SMLNUM, SWAP, TST1, TST2, 00286 $ ULP 00287 INTEGER I, I2, I4, INCOL, J, J2, J4, JBOT, JCOL, JLEN, 00288 $ JROW, JTOP, K, K1, KDU, KMS, KNZ, KRCOL, KZS, 00289 $ M, M22, MBOT, MEND, MSTART, MTOP, NBMPS, NDCOL, 00290 $ NS, NU 00291 LOGICAL ACCUM, BLK22, BMP22 00292 * .. 00293 * .. External Functions .. 00294 REAL SLAMCH 00295 EXTERNAL SLAMCH 00296 * .. 00297 * .. Intrinsic Functions .. 00298 * 00299 INTRINSIC ABS, MAX, MIN, MOD, REAL 00300 * .. 00301 * .. Local Arrays .. 00302 REAL VT( 3 ) 00303 * .. 00304 * .. External Subroutines .. 00305 EXTERNAL SGEMM, SLABAD, SLACPY, SLAQR1, SLARFG, SLASET, 00306 $ STRMM 00307 * .. 00308 * .. Executable Statements .. 00309 * 00310 * ==== If there are no shifts, then there is nothing to do. ==== 00311 * 00312 IF( NSHFTS.LT.2 ) 00313 $ RETURN 00314 * 00315 * ==== If the active block is empty or 1-by-1, then there 00316 * . is nothing to do. ==== 00317 * 00318 IF( KTOP.GE.KBOT ) 00319 $ RETURN 00320 * 00321 * ==== Shuffle shifts into pairs of real shifts and pairs 00322 * . of complex conjugate shifts assuming complex 00323 * . conjugate shifts are already adjacent to one 00324 * . another. ==== 00325 * 00326 DO 10 I = 1, NSHFTS - 2, 2 00327 IF( SI( I ).NE.-SI( I+1 ) ) THEN 00328 * 00329 SWAP = SR( I ) 00330 SR( I ) = SR( I+1 ) 00331 SR( I+1 ) = SR( I+2 ) 00332 SR( I+2 ) = SWAP 00333 * 00334 SWAP = SI( I ) 00335 SI( I ) = SI( I+1 ) 00336 SI( I+1 ) = SI( I+2 ) 00337 SI( I+2 ) = SWAP 00338 END IF 00339 10 CONTINUE 00340 * 00341 * ==== NSHFTS is supposed to be even, but if it is odd, 00342 * . then simply reduce it by one. The shuffle above 00343 * . ensures that the dropped shift is real and that 00344 * . the remaining shifts are paired. ==== 00345 * 00346 NS = NSHFTS - MOD( NSHFTS, 2 ) 00347 * 00348 * ==== Machine constants for deflation ==== 00349 * 00350 SAFMIN = SLAMCH( 'SAFE MINIMUM' ) 00351 SAFMAX = ONE / SAFMIN 00352 CALL SLABAD( SAFMIN, SAFMAX ) 00353 ULP = SLAMCH( 'PRECISION' ) 00354 SMLNUM = SAFMIN*( REAL( N ) / ULP ) 00355 * 00356 * ==== Use accumulated reflections to update far-from-diagonal 00357 * . entries ? ==== 00358 * 00359 ACCUM = ( KACC22.EQ.1 ) .OR. ( KACC22.EQ.2 ) 00360 * 00361 * ==== If so, exploit the 2-by-2 block structure? ==== 00362 * 00363 BLK22 = ( NS.GT.2 ) .AND. ( KACC22.EQ.2 ) 00364 * 00365 * ==== clear trash ==== 00366 * 00367 IF( KTOP+2.LE.KBOT ) 00368 $ H( KTOP+2, KTOP ) = ZERO 00369 * 00370 * ==== NBMPS = number of 2-shift bulges in the chain ==== 00371 * 00372 NBMPS = NS / 2 00373 * 00374 * ==== KDU = width of slab ==== 00375 * 00376 KDU = 6*NBMPS - 3 00377 * 00378 * ==== Create and chase chains of NBMPS bulges ==== 00379 * 00380 DO 220 INCOL = 3*( 1-NBMPS ) + KTOP - 1, KBOT - 2, 3*NBMPS - 2 00381 NDCOL = INCOL + KDU 00382 IF( ACCUM ) 00383 $ CALL SLASET( 'ALL', KDU, KDU, ZERO, ONE, U, LDU ) 00384 * 00385 * ==== Near-the-diagonal bulge chase. The following loop 00386 * . performs the near-the-diagonal part of a small bulge 00387 * . multi-shift QR sweep. Each 6*NBMPS-2 column diagonal 00388 * . chunk extends from column INCOL to column NDCOL 00389 * . (including both column INCOL and column NDCOL). The 00390 * . following loop chases a 3*NBMPS column long chain of 00391 * . NBMPS bulges 3*NBMPS-2 columns to the right. (INCOL 00392 * . may be less than KTOP and and NDCOL may be greater than 00393 * . KBOT indicating phantom columns from which to chase 00394 * . bulges before they are actually introduced or to which 00395 * . to chase bulges beyond column KBOT.) ==== 00396 * 00397 DO 150 KRCOL = INCOL, MIN( INCOL+3*NBMPS-3, KBOT-2 ) 00398 * 00399 * ==== Bulges number MTOP to MBOT are active double implicit 00400 * . shift bulges. There may or may not also be small 00401 * . 2-by-2 bulge, if there is room. The inactive bulges 00402 * . (if any) must wait until the active bulges have moved 00403 * . down the diagonal to make room. The phantom matrix 00404 * . paradigm described above helps keep track. ==== 00405 * 00406 MTOP = MAX( 1, ( ( KTOP-1 )-KRCOL+2 ) / 3+1 ) 00407 MBOT = MIN( NBMPS, ( KBOT-KRCOL ) / 3 ) 00408 M22 = MBOT + 1 00409 BMP22 = ( MBOT.LT.NBMPS ) .AND. ( KRCOL+3*( M22-1 ) ).EQ. 00410 $ ( KBOT-2 ) 00411 * 00412 * ==== Generate reflections to chase the chain right 00413 * . one column. (The minimum value of K is KTOP-1.) ==== 00414 * 00415 DO 20 M = MTOP, MBOT 00416 K = KRCOL + 3*( M-1 ) 00417 IF( K.EQ.KTOP-1 ) THEN 00418 CALL SLAQR1( 3, H( KTOP, KTOP ), LDH, SR( 2*M-1 ), 00419 $ SI( 2*M-1 ), SR( 2*M ), SI( 2*M ), 00420 $ V( 1, M ) ) 00421 ALPHA = V( 1, M ) 00422 CALL SLARFG( 3, ALPHA, V( 2, M ), 1, V( 1, M ) ) 00423 ELSE 00424 BETA = H( K+1, K ) 00425 V( 2, M ) = H( K+2, K ) 00426 V( 3, M ) = H( K+3, K ) 00427 CALL SLARFG( 3, BETA, V( 2, M ), 1, V( 1, M ) ) 00428 * 00429 * ==== A Bulge may collapse because of vigilant 00430 * . deflation or destructive underflow. In the 00431 * . underflow case, try the two-small-subdiagonals 00432 * . trick to try to reinflate the bulge. ==== 00433 * 00434 IF( H( K+3, K ).NE.ZERO .OR. H( K+3, K+1 ).NE. 00435 $ ZERO .OR. H( K+3, K+2 ).EQ.ZERO ) THEN 00436 * 00437 * ==== Typical case: not collapsed (yet). ==== 00438 * 00439 H( K+1, K ) = BETA 00440 H( K+2, K ) = ZERO 00441 H( K+3, K ) = ZERO 00442 ELSE 00443 * 00444 * ==== Atypical case: collapsed. Attempt to 00445 * . reintroduce ignoring H(K+1,K) and H(K+2,K). 00446 * . If the fill resulting from the new 00447 * . reflector is too large, then abandon it. 00448 * . Otherwise, use the new one. ==== 00449 * 00450 CALL SLAQR1( 3, H( K+1, K+1 ), LDH, SR( 2*M-1 ), 00451 $ SI( 2*M-1 ), SR( 2*M ), SI( 2*M ), 00452 $ VT ) 00453 ALPHA = VT( 1 ) 00454 CALL SLARFG( 3, ALPHA, VT( 2 ), 1, VT( 1 ) ) 00455 REFSUM = VT( 1 )*( H( K+1, K )+VT( 2 )* 00456 $ H( K+2, K ) ) 00457 * 00458 IF( ABS( H( K+2, K )-REFSUM*VT( 2 ) )+ 00459 $ ABS( REFSUM*VT( 3 ) ).GT.ULP* 00460 $ ( ABS( H( K, K ) )+ABS( H( K+1, 00461 $ K+1 ) )+ABS( H( K+2, K+2 ) ) ) ) THEN 00462 * 00463 * ==== Starting a new bulge here would 00464 * . create non-negligible fill. Use 00465 * . the old one with trepidation. ==== 00466 * 00467 H( K+1, K ) = BETA 00468 H( K+2, K ) = ZERO 00469 H( K+3, K ) = ZERO 00470 ELSE 00471 * 00472 * ==== Stating a new bulge here would 00473 * . create only negligible fill. 00474 * . Replace the old reflector with 00475 * . the new one. ==== 00476 * 00477 H( K+1, K ) = H( K+1, K ) - REFSUM 00478 H( K+2, K ) = ZERO 00479 H( K+3, K ) = ZERO 00480 V( 1, M ) = VT( 1 ) 00481 V( 2, M ) = VT( 2 ) 00482 V( 3, M ) = VT( 3 ) 00483 END IF 00484 END IF 00485 END IF 00486 20 CONTINUE 00487 * 00488 * ==== Generate a 2-by-2 reflection, if needed. ==== 00489 * 00490 K = KRCOL + 3*( M22-1 ) 00491 IF( BMP22 ) THEN 00492 IF( K.EQ.KTOP-1 ) THEN 00493 CALL SLAQR1( 2, H( K+1, K+1 ), LDH, SR( 2*M22-1 ), 00494 $ SI( 2*M22-1 ), SR( 2*M22 ), SI( 2*M22 ), 00495 $ V( 1, M22 ) ) 00496 BETA = V( 1, M22 ) 00497 CALL SLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) ) 00498 ELSE 00499 BETA = H( K+1, K ) 00500 V( 2, M22 ) = H( K+2, K ) 00501 CALL SLARFG( 2, BETA, V( 2, M22 ), 1, V( 1, M22 ) ) 00502 H( K+1, K ) = BETA 00503 H( K+2, K ) = ZERO 00504 END IF 00505 END IF 00506 * 00507 * ==== Multiply H by reflections from the left ==== 00508 * 00509 IF( ACCUM ) THEN 00510 JBOT = MIN( NDCOL, KBOT ) 00511 ELSE IF( WANTT ) THEN 00512 JBOT = N 00513 ELSE 00514 JBOT = KBOT 00515 END IF 00516 DO 40 J = MAX( KTOP, KRCOL ), JBOT 00517 MEND = MIN( MBOT, ( J-KRCOL+2 ) / 3 ) 00518 DO 30 M = MTOP, MEND 00519 K = KRCOL + 3*( M-1 ) 00520 REFSUM = V( 1, M )*( H( K+1, J )+V( 2, M )* 00521 $ H( K+2, J )+V( 3, M )*H( K+3, J ) ) 00522 H( K+1, J ) = H( K+1, J ) - REFSUM 00523 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M ) 00524 H( K+3, J ) = H( K+3, J ) - REFSUM*V( 3, M ) 00525 30 CONTINUE 00526 40 CONTINUE 00527 IF( BMP22 ) THEN 00528 K = KRCOL + 3*( M22-1 ) 00529 DO 50 J = MAX( K+1, KTOP ), JBOT 00530 REFSUM = V( 1, M22 )*( H( K+1, J )+V( 2, M22 )* 00531 $ H( K+2, J ) ) 00532 H( K+1, J ) = H( K+1, J ) - REFSUM 00533 H( K+2, J ) = H( K+2, J ) - REFSUM*V( 2, M22 ) 00534 50 CONTINUE 00535 END IF 00536 * 00537 * ==== Multiply H by reflections from the right. 00538 * . Delay filling in the last row until the 00539 * . vigilant deflation check is complete. ==== 00540 * 00541 IF( ACCUM ) THEN 00542 JTOP = MAX( KTOP, INCOL ) 00543 ELSE IF( WANTT ) THEN 00544 JTOP = 1 00545 ELSE 00546 JTOP = KTOP 00547 END IF 00548 DO 90 M = MTOP, MBOT 00549 IF( V( 1, M ).NE.ZERO ) THEN 00550 K = KRCOL + 3*( M-1 ) 00551 DO 60 J = JTOP, MIN( KBOT, K+3 ) 00552 REFSUM = V( 1, M )*( H( J, K+1 )+V( 2, M )* 00553 $ H( J, K+2 )+V( 3, M )*H( J, K+3 ) ) 00554 H( J, K+1 ) = H( J, K+1 ) - REFSUM 00555 H( J, K+2 ) = H( J, K+2 ) - REFSUM*V( 2, M ) 00556 H( J, K+3 ) = H( J, K+3 ) - REFSUM*V( 3, M ) 00557 60 CONTINUE 00558 * 00559 IF( ACCUM ) THEN 00560 * 00561 * ==== Accumulate U. (If necessary, update Z later 00562 * . with with an efficient matrix-matrix 00563 * . multiply.) ==== 00564 * 00565 KMS = K - INCOL 00566 DO 70 J = MAX( 1, KTOP-INCOL ), KDU 00567 REFSUM = V( 1, M )*( U( J, KMS+1 )+V( 2, M )* 00568 $ U( J, KMS+2 )+V( 3, M )*U( J, KMS+3 ) ) 00569 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM 00570 U( J, KMS+2 ) = U( J, KMS+2 ) - REFSUM*V( 2, M ) 00571 U( J, KMS+3 ) = U( J, KMS+3 ) - REFSUM*V( 3, M ) 00572 70 CONTINUE 00573 ELSE IF( WANTZ ) THEN 00574 * 00575 * ==== U is not accumulated, so update Z 00576 * . now by multiplying by reflections 00577 * . from the right. ==== 00578 * 00579 DO 80 J = ILOZ, IHIZ 00580 REFSUM = V( 1, M )*( Z( J, K+1 )+V( 2, M )* 00581 $ Z( J, K+2 )+V( 3, M )*Z( J, K+3 ) ) 00582 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM 00583 Z( J, K+2 ) = Z( J, K+2 ) - REFSUM*V( 2, M ) 00584 Z( J, K+3 ) = Z( J, K+3 ) - REFSUM*V( 3, M ) 00585 80 CONTINUE 00586 END IF 00587 END IF 00588 90 CONTINUE 00589 * 00590 * ==== Special case: 2-by-2 reflection (if needed) ==== 00591 * 00592 K = KRCOL + 3*( M22-1 ) 00593 IF( BMP22 ) THEN 00594 IF ( V( 1, M22 ).NE.ZERO ) THEN 00595 DO 100 J = JTOP, MIN( KBOT, K+3 ) 00596 REFSUM = V( 1, M22 )*( H( J, K+1 )+V( 2, M22 )* 00597 $ H( J, K+2 ) ) 00598 H( J, K+1 ) = H( J, K+1 ) - REFSUM 00599 H( J, K+2 ) = H( J, K+2 ) - REFSUM*V( 2, M22 ) 00600 100 CONTINUE 00601 * 00602 IF( ACCUM ) THEN 00603 KMS = K - INCOL 00604 DO 110 J = MAX( 1, KTOP-INCOL ), KDU 00605 REFSUM = V( 1, M22 )*( U( J, KMS+1 )+ 00606 $ V( 2, M22 )*U( J, KMS+2 ) ) 00607 U( J, KMS+1 ) = U( J, KMS+1 ) - REFSUM 00608 U( J, KMS+2 ) = U( J, KMS+2 ) - REFSUM* 00609 $ V( 2, M22 ) 00610 110 CONTINUE 00611 ELSE IF( WANTZ ) THEN 00612 DO 120 J = ILOZ, IHIZ 00613 REFSUM = V( 1, M22 )*( Z( J, K+1 )+V( 2, M22 )* 00614 $ Z( J, K+2 ) ) 00615 Z( J, K+1 ) = Z( J, K+1 ) - REFSUM 00616 Z( J, K+2 ) = Z( J, K+2 ) - REFSUM*V( 2, M22 ) 00617 120 CONTINUE 00618 END IF 00619 END IF 00620 END IF 00621 * 00622 * ==== Vigilant deflation check ==== 00623 * 00624 MSTART = MTOP 00625 IF( KRCOL+3*( MSTART-1 ).LT.KTOP ) 00626 $ MSTART = MSTART + 1 00627 MEND = MBOT 00628 IF( BMP22 ) 00629 $ MEND = MEND + 1 00630 IF( KRCOL.EQ.KBOT-2 ) 00631 $ MEND = MEND + 1 00632 DO 130 M = MSTART, MEND 00633 K = MIN( KBOT-1, KRCOL+3*( M-1 ) ) 00634 * 00635 * ==== The following convergence test requires that 00636 * . the tradition small-compared-to-nearby-diagonals 00637 * . criterion and the Ahues & Tisseur (LAWN 122, 1997) 00638 * . criteria both be satisfied. The latter improves 00639 * . accuracy in some examples. Falling back on an 00640 * . alternate convergence criterion when TST1 or TST2 00641 * . is zero (as done here) is traditional but probably 00642 * . unnecessary. ==== 00643 * 00644 IF( H( K+1, K ).NE.ZERO ) THEN 00645 TST1 = ABS( H( K, K ) ) + ABS( H( K+1, K+1 ) ) 00646 IF( TST1.EQ.ZERO ) THEN 00647 IF( K.GE.KTOP+1 ) 00648 $ TST1 = TST1 + ABS( H( K, K-1 ) ) 00649 IF( K.GE.KTOP+2 ) 00650 $ TST1 = TST1 + ABS( H( K, K-2 ) ) 00651 IF( K.GE.KTOP+3 ) 00652 $ TST1 = TST1 + ABS( H( K, K-3 ) ) 00653 IF( K.LE.KBOT-2 ) 00654 $ TST1 = TST1 + ABS( H( K+2, K+1 ) ) 00655 IF( K.LE.KBOT-3 ) 00656 $ TST1 = TST1 + ABS( H( K+3, K+1 ) ) 00657 IF( K.LE.KBOT-4 ) 00658 $ TST1 = TST1 + ABS( H( K+4, K+1 ) ) 00659 END IF 00660 IF( ABS( H( K+1, K ) ).LE.MAX( SMLNUM, ULP*TST1 ) ) 00661 $ THEN 00662 H12 = MAX( ABS( H( K+1, K ) ), ABS( H( K, K+1 ) ) ) 00663 H21 = MIN( ABS( H( K+1, K ) ), ABS( H( K, K+1 ) ) ) 00664 H11 = MAX( ABS( H( K+1, K+1 ) ), 00665 $ ABS( H( K, K )-H( K+1, K+1 ) ) ) 00666 H22 = MIN( ABS( H( K+1, K+1 ) ), 00667 $ ABS( H( K, K )-H( K+1, K+1 ) ) ) 00668 SCL = H11 + H12 00669 TST2 = H22*( H11 / SCL ) 00670 * 00671 IF( TST2.EQ.ZERO .OR. H21*( H12 / SCL ).LE. 00672 $ MAX( SMLNUM, ULP*TST2 ) )H( K+1, K ) = ZERO 00673 END IF 00674 END IF 00675 130 CONTINUE 00676 * 00677 * ==== Fill in the last row of each bulge. ==== 00678 * 00679 MEND = MIN( NBMPS, ( KBOT-KRCOL-1 ) / 3 ) 00680 DO 140 M = MTOP, MEND 00681 K = KRCOL + 3*( M-1 ) 00682 REFSUM = V( 1, M )*V( 3, M )*H( K+4, K+3 ) 00683 H( K+4, K+1 ) = -REFSUM 00684 H( K+4, K+2 ) = -REFSUM*V( 2, M ) 00685 H( K+4, K+3 ) = H( K+4, K+3 ) - REFSUM*V( 3, M ) 00686 140 CONTINUE 00687 * 00688 * ==== End of near-the-diagonal bulge chase. ==== 00689 * 00690 150 CONTINUE 00691 * 00692 * ==== Use U (if accumulated) to update far-from-diagonal 00693 * . entries in H. If required, use U to update Z as 00694 * . well. ==== 00695 * 00696 IF( ACCUM ) THEN 00697 IF( WANTT ) THEN 00698 JTOP = 1 00699 JBOT = N 00700 ELSE 00701 JTOP = KTOP 00702 JBOT = KBOT 00703 END IF 00704 IF( ( .NOT.BLK22 ) .OR. ( INCOL.LT.KTOP ) .OR. 00705 $ ( NDCOL.GT.KBOT ) .OR. ( NS.LE.2 ) ) THEN 00706 * 00707 * ==== Updates not exploiting the 2-by-2 block 00708 * . structure of U. K1 and NU keep track of 00709 * . the location and size of U in the special 00710 * . cases of introducing bulges and chasing 00711 * . bulges off the bottom. In these special 00712 * . cases and in case the number of shifts 00713 * . is NS = 2, there is no 2-by-2 block 00714 * . structure to exploit. ==== 00715 * 00716 K1 = MAX( 1, KTOP-INCOL ) 00717 NU = ( KDU-MAX( 0, NDCOL-KBOT ) ) - K1 + 1 00718 * 00719 * ==== Horizontal Multiply ==== 00720 * 00721 DO 160 JCOL = MIN( NDCOL, KBOT ) + 1, JBOT, NH 00722 JLEN = MIN( NH, JBOT-JCOL+1 ) 00723 CALL SGEMM( 'C', 'N', NU, JLEN, NU, ONE, U( K1, K1 ), 00724 $ LDU, H( INCOL+K1, JCOL ), LDH, ZERO, WH, 00725 $ LDWH ) 00726 CALL SLACPY( 'ALL', NU, JLEN, WH, LDWH, 00727 $ H( INCOL+K1, JCOL ), LDH ) 00728 160 CONTINUE 00729 * 00730 * ==== Vertical multiply ==== 00731 * 00732 DO 170 JROW = JTOP, MAX( KTOP, INCOL ) - 1, NV 00733 JLEN = MIN( NV, MAX( KTOP, INCOL )-JROW ) 00734 CALL SGEMM( 'N', 'N', JLEN, NU, NU, ONE, 00735 $ H( JROW, INCOL+K1 ), LDH, U( K1, K1 ), 00736 $ LDU, ZERO, WV, LDWV ) 00737 CALL SLACPY( 'ALL', JLEN, NU, WV, LDWV, 00738 $ H( JROW, INCOL+K1 ), LDH ) 00739 170 CONTINUE 00740 * 00741 * ==== Z multiply (also vertical) ==== 00742 * 00743 IF( WANTZ ) THEN 00744 DO 180 JROW = ILOZ, IHIZ, NV 00745 JLEN = MIN( NV, IHIZ-JROW+1 ) 00746 CALL SGEMM( 'N', 'N', JLEN, NU, NU, ONE, 00747 $ Z( JROW, INCOL+K1 ), LDZ, U( K1, K1 ), 00748 $ LDU, ZERO, WV, LDWV ) 00749 CALL SLACPY( 'ALL', JLEN, NU, WV, LDWV, 00750 $ Z( JROW, INCOL+K1 ), LDZ ) 00751 180 CONTINUE 00752 END IF 00753 ELSE 00754 * 00755 * ==== Updates exploiting U's 2-by-2 block structure. 00756 * . (I2, I4, J2, J4 are the last rows and columns 00757 * . of the blocks.) ==== 00758 * 00759 I2 = ( KDU+1 ) / 2 00760 I4 = KDU 00761 J2 = I4 - I2 00762 J4 = KDU 00763 * 00764 * ==== KZS and KNZ deal with the band of zeros 00765 * . along the diagonal of one of the triangular 00766 * . blocks. ==== 00767 * 00768 KZS = ( J4-J2 ) - ( NS+1 ) 00769 KNZ = NS + 1 00770 * 00771 * ==== Horizontal multiply ==== 00772 * 00773 DO 190 JCOL = MIN( NDCOL, KBOT ) + 1, JBOT, NH 00774 JLEN = MIN( NH, JBOT-JCOL+1 ) 00775 * 00776 * ==== Copy bottom of H to top+KZS of scratch ==== 00777 * (The first KZS rows get multiplied by zero.) ==== 00778 * 00779 CALL SLACPY( 'ALL', KNZ, JLEN, H( INCOL+1+J2, JCOL ), 00780 $ LDH, WH( KZS+1, 1 ), LDWH ) 00781 * 00782 * ==== Multiply by U21**T ==== 00783 * 00784 CALL SLASET( 'ALL', KZS, JLEN, ZERO, ZERO, WH, LDWH ) 00785 CALL STRMM( 'L', 'U', 'C', 'N', KNZ, JLEN, ONE, 00786 $ U( J2+1, 1+KZS ), LDU, WH( KZS+1, 1 ), 00787 $ LDWH ) 00788 * 00789 * ==== Multiply top of H by U11**T ==== 00790 * 00791 CALL SGEMM( 'C', 'N', I2, JLEN, J2, ONE, U, LDU, 00792 $ H( INCOL+1, JCOL ), LDH, ONE, WH, LDWH ) 00793 * 00794 * ==== Copy top of H to bottom of WH ==== 00795 * 00796 CALL SLACPY( 'ALL', J2, JLEN, H( INCOL+1, JCOL ), LDH, 00797 $ WH( I2+1, 1 ), LDWH ) 00798 * 00799 * ==== Multiply by U21**T ==== 00800 * 00801 CALL STRMM( 'L', 'L', 'C', 'N', J2, JLEN, ONE, 00802 $ U( 1, I2+1 ), LDU, WH( I2+1, 1 ), LDWH ) 00803 * 00804 * ==== Multiply by U22 ==== 00805 * 00806 CALL SGEMM( 'C', 'N', I4-I2, JLEN, J4-J2, ONE, 00807 $ U( J2+1, I2+1 ), LDU, 00808 $ H( INCOL+1+J2, JCOL ), LDH, ONE, 00809 $ WH( I2+1, 1 ), LDWH ) 00810 * 00811 * ==== Copy it back ==== 00812 * 00813 CALL SLACPY( 'ALL', KDU, JLEN, WH, LDWH, 00814 $ H( INCOL+1, JCOL ), LDH ) 00815 190 CONTINUE 00816 * 00817 * ==== Vertical multiply ==== 00818 * 00819 DO 200 JROW = JTOP, MAX( INCOL, KTOP ) - 1, NV 00820 JLEN = MIN( NV, MAX( INCOL, KTOP )-JROW ) 00821 * 00822 * ==== Copy right of H to scratch (the first KZS 00823 * . columns get multiplied by zero) ==== 00824 * 00825 CALL SLACPY( 'ALL', JLEN, KNZ, H( JROW, INCOL+1+J2 ), 00826 $ LDH, WV( 1, 1+KZS ), LDWV ) 00827 * 00828 * ==== Multiply by U21 ==== 00829 * 00830 CALL SLASET( 'ALL', JLEN, KZS, ZERO, ZERO, WV, LDWV ) 00831 CALL STRMM( 'R', 'U', 'N', 'N', JLEN, KNZ, ONE, 00832 $ U( J2+1, 1+KZS ), LDU, WV( 1, 1+KZS ), 00833 $ LDWV ) 00834 * 00835 * ==== Multiply by U11 ==== 00836 * 00837 CALL SGEMM( 'N', 'N', JLEN, I2, J2, ONE, 00838 $ H( JROW, INCOL+1 ), LDH, U, LDU, ONE, WV, 00839 $ LDWV ) 00840 * 00841 * ==== Copy left of H to right of scratch ==== 00842 * 00843 CALL SLACPY( 'ALL', JLEN, J2, H( JROW, INCOL+1 ), LDH, 00844 $ WV( 1, 1+I2 ), LDWV ) 00845 * 00846 * ==== Multiply by U21 ==== 00847 * 00848 CALL STRMM( 'R', 'L', 'N', 'N', JLEN, I4-I2, ONE, 00849 $ U( 1, I2+1 ), LDU, WV( 1, 1+I2 ), LDWV ) 00850 * 00851 * ==== Multiply by U22 ==== 00852 * 00853 CALL SGEMM( 'N', 'N', JLEN, I4-I2, J4-J2, ONE, 00854 $ H( JROW, INCOL+1+J2 ), LDH, 00855 $ U( J2+1, I2+1 ), LDU, ONE, WV( 1, 1+I2 ), 00856 $ LDWV ) 00857 * 00858 * ==== Copy it back ==== 00859 * 00860 CALL SLACPY( 'ALL', JLEN, KDU, WV, LDWV, 00861 $ H( JROW, INCOL+1 ), LDH ) 00862 200 CONTINUE 00863 * 00864 * ==== Multiply Z (also vertical) ==== 00865 * 00866 IF( WANTZ ) THEN 00867 DO 210 JROW = ILOZ, IHIZ, NV 00868 JLEN = MIN( NV, IHIZ-JROW+1 ) 00869 * 00870 * ==== Copy right of Z to left of scratch (first 00871 * . KZS columns get multiplied by zero) ==== 00872 * 00873 CALL SLACPY( 'ALL', JLEN, KNZ, 00874 $ Z( JROW, INCOL+1+J2 ), LDZ, 00875 $ WV( 1, 1+KZS ), LDWV ) 00876 * 00877 * ==== Multiply by U12 ==== 00878 * 00879 CALL SLASET( 'ALL', JLEN, KZS, ZERO, ZERO, WV, 00880 $ LDWV ) 00881 CALL STRMM( 'R', 'U', 'N', 'N', JLEN, KNZ, ONE, 00882 $ U( J2+1, 1+KZS ), LDU, WV( 1, 1+KZS ), 00883 $ LDWV ) 00884 * 00885 * ==== Multiply by U11 ==== 00886 * 00887 CALL SGEMM( 'N', 'N', JLEN, I2, J2, ONE, 00888 $ Z( JROW, INCOL+1 ), LDZ, U, LDU, ONE, 00889 $ WV, LDWV ) 00890 * 00891 * ==== Copy left of Z to right of scratch ==== 00892 * 00893 CALL SLACPY( 'ALL', JLEN, J2, Z( JROW, INCOL+1 ), 00894 $ LDZ, WV( 1, 1+I2 ), LDWV ) 00895 * 00896 * ==== Multiply by U21 ==== 00897 * 00898 CALL STRMM( 'R', 'L', 'N', 'N', JLEN, I4-I2, ONE, 00899 $ U( 1, I2+1 ), LDU, WV( 1, 1+I2 ), 00900 $ LDWV ) 00901 * 00902 * ==== Multiply by U22 ==== 00903 * 00904 CALL SGEMM( 'N', 'N', JLEN, I4-I2, J4-J2, ONE, 00905 $ Z( JROW, INCOL+1+J2 ), LDZ, 00906 $ U( J2+1, I2+1 ), LDU, ONE, 00907 $ WV( 1, 1+I2 ), LDWV ) 00908 * 00909 * ==== Copy the result back to Z ==== 00910 * 00911 CALL SLACPY( 'ALL', JLEN, KDU, WV, LDWV, 00912 $ Z( JROW, INCOL+1 ), LDZ ) 00913 210 CONTINUE 00914 END IF 00915 END IF 00916 END IF 00917 220 CONTINUE 00918 * 00919 * ==== End of SLAQR5 ==== 00920 * 00921 END