Chapter 7. Communicator and Topologies

Chapter 7. Communicator and Topologies

º´·Ä ÇÁ·Î±×·¥¿¡¼­ ÇÁ·Î¼¼¼­°£ÀÇ Åë½ÅÀ» À§ÇØ MPI´Â communicator¿Í topology¸¦ Áö¿øÇÑ´Ù. communicator¶ó´Â °ÍÀº ÇϳªÀÇ ÇÁ·Î¼¼¼­°¡ message¸¦ sendÇÏ¸é ±×°ÍÀ» receiveÇÒ ¼ö ÀÖ´Â ÇÁ·Î¼¼¼­ÀÇ ÁýÇÕÀÌ´Ù. Topology´Â communicator¿¡ ¼ÓÇÏ´Â ÇÁ·Î¼¼¼­µéÀÇ °ø°£»ó ¹è¿­ ÇüŸ¦ Á¤ÀÇÇÏ´Â °ÍÀÌ´Ù. ÀÌ ÀåÀº matrix multiplication ±¸ÇöÀ» ÅëÇØ MPI programming¿¡¼­ communicator¿Í topology¸¦ ¾î¶»°Ô ÀÌ¿ëÇÒ ¼ö ÀÖ´ÂÁö º¸¿© ÁØ´Ù.

7.1 Matrix Multiplication

ÀϹÝÀûÀÎ matrix multiplication¿¡¼­ cij´Â
= ai0b0j + ai1b1j + ¡¦ + ai,n-1bn-1,j -----(1)
·Î Ç¥ÇöµÇ¸ç serial programÀº

int i, j, k;
for(i = 0; i < n; i++) 
  for(j = 0; j < n; j++) 
    for(k = 0; k < n; k++) 
      c[i][j] += a[i][k] * b[k][j];
ÀÌ´Ù. À̰ÍÀ» º´·ÄÈ­ ½ÃŰ´Â ¾Ë°í¸®ÁòÀº ¾ÆÁÖ ´Ù¾çÇÏ¸ç ±× ÁßÀÇ ÇϳªÀÎ Fox's AlgorithmÀ» ÀÌ¿ëÇÏ¿© º´·Ä ÇÁ·Î±×·¥À» ±¸ÇöÇÑ´Ù.

7.2 Fox's Algorithm

n by n matrix A, B¸¦ °öÇØ n by n matrix C¸¦ °è»êÇÏ´Â µ¥ n2°³ÀÇ ÇÁ·Î¼¼¼­¸¦ »ç¿ëÇÒ ¼ö ÀÖ°í ÇÁ·Î¼¼¼­ p(i,j)°¡ cijÀÇ °è»êÀ» ´ã´çÇϰí aik, bkjÀÇ °ªÀ» °¡Áö°í ÀÖ´Ù°í Çϸé (1)ÀÇ ½Ä¿¡ ÀÇÇØ n ´Ü°è µÚ¿¡ C°¡ °è»êµÈ´Ù.
Fox's algorithmÀº ¿©±â¼­ °¢ ÇÁ·Î¼¼¼­ p(i, j)°¡ k-1, k, k+1 ´Ü°è¿¡¼­ aik, bkj¸¦ ¾î¶² patternÀ¸·Î ¿ä±¸ÇÏ´Â Áö¿¡ ÁÖ¸ñÇÑ´Ù.¿¹¸¦ µé¾î 3 by 3 matrix A, B¸¦ °öÇØ 3 by 3 matrix C¸¦ °è»êÇÏ´Â °æ¿ì¸¦ »ý°¢ÇØ º¸ÀÚ. 0 ´Ü°è¿¡¼­ ÁÖ¸ñÇÒ Á¡Àº p(0,*)´Â a00¸¦ µ¿½Ã¿¡ »ç¿ëÇϰí, p(1,*)´Â a11, p(2,*)´Â a22¸¦ µ¿½Ã¿¡ »ç¿ëÇÑ´Ù.(*´Â 0, 1, 2) Áï, a00´Â p(0,0)ÀÌ ¼ÒÀ¯Çϰí ÀÖÀ¸¹Ç·Î ´Ù¸¥ p(0, 1), p(0,2)¿¡°Ô broadcast ÇØÁÖ¾î¾ß Çϰí a11Àº p(1,1)ÀÌ p(1,0)¿Í p(1,2)¿¡°Ô broadcastÇØÁÖ¾î¾ß ÇÑ´Ù. ¸¶Âù°¡Áö·Î p(2,2)µµ p(2,0), p(2,1)¿¡°Ô broadcastÇØÁØ´Ù. 1´Ü°è¿¡¼­´Â p(0,1)ÀÌ ÀÚ½ÅÀÌ ¼ÓÇÑ 0Çà¿¡ broadcastÇØ ÁÖ¸ç, p(1,2)´Â 1Çà¿¡ p(2,0)Àº 2Çà¿¡ broadcast ÇØÁØ´Ù. 2´Ü°è¿¡¼­µµ °¢ ÇÁ·Î¼¼¼­´Â ÀÚ½ÅÀÌ ¼ÓÇÑ Çà¿¡ ÀÚ½ÅÀÌ °¡Áø data¸¦ broadcast ÇØÁØ´Ù.

°¢ ÇÁ·Î¼¼¼­°¡ B Çà·ÄÀÇ dataµµ ÀÏÁ¤ÇÑ patternÀ¸·Î »ç¿ëÇÑ´Ù. ±×¸² iiiÀÇ ºÎºÐ¿¡¼­ °¢ ÇÁ·Î¼¼¼­´Â ¿­·Î ±¸ºÐµÈ µÚ, °¢ ´Ü°è¿¡¼­ cycleÀÇ ÇüÅ·ΠÀ§·Î data¸¦ Àü´ÞÇÏ¿© »ç¿ëÇÏ´Â °ÍÀ» ¾Ë ¼ö ÀÖ´Ù.

ÀÌ·± ÀÏÁ¤ÇÑ data»ç¿ë ÆÐÅÏÀ» ÅëÇØ °¢ ÇÁ·Î¼¼¼­°¡ ¾î¶² groupµé¿¡ ¼ÓÇØ¾ß ÇÏ°í ¾î¶² ÇüÅ·ΠÅë½ÅÀ» ÇÏ´Â Áö ¾Ë ¼ö ÀÖ´Ù. Fox¡¯s algorithmÀº Àüü ÇÁ·Î¼¼¼­¸¦ 2D-gridÀÇ ÇüÅ·Π¹èÄ¡ÇÏ°í °¢ row ¿Í columnº°·Î groupingÇÏ¿© °¢ rowº° group´Â broadcastingÀ» ÅëÇØ data¸¦ Àü´ÞÇϸç, columnº° groupÀº cycleÀÇ ÇüÅ·Πdata¸¦ Àü´ÞÇÑ´Ù.

ÇÁ·Î¼¼¼­ÀÇ °³¼ö p °¡ n2º¸´Ù ÀÛÀ» °æ¿ì °¢ ÇÁ·Î¼¼¼­´Â aij, bij´ë½Å n/ X n/ sub matrix¸¦ °¡Áö°Ô µÈ´Ù.

/* my process row = i, my process column = j */ 
q = sqrt(p);
dest = ((i-1) mod q, j);
source = ((i+1) mod q, j);
for (stage = 0; stage < q; stage++) {
  k_bar = (i + stage) mod q;
  Broadcast A[i, k_bar] across row i;
  C[i, j] = C[i, j] + A[i, k_bar]*B[k_bar, j];
  Send B[k_bar, j] to dest; 
  Receive B[(k_bar+1) mod q, j] from source;
}

7.3 Communicator

Communicator¶õ message¸¦ ÁÖ°í ¹ÞÀ» ¼ö ÀÖ´Â ÇÁ·Î¼¼¼­ÀÇ groupÀÌ´Ù. ¾ÕÀÇ matrix multiplicationÀ» ¿¹·Î µé¸é, °¢ rowµéÀÌ ÇϳªÀÇ communicator¸¦ Çü¼ºÇÏ°í °¢ columnÀÌ ¶Ç ÇϳªÀÇ communicator¸¦ Çü¼ºÇÑ´Ù. ÇϳªÀÇ ÇÁ·Î¼¼¼­´Â ¿©·¯ communicator¿¡ ¼ÓÇÒ ¼ö ÀÖ´Ù. ÀÌ communicator´Â intra-communicator¿Í inter-communicatorÀÇ µÎ °¡Áö Á¾·ù·Î ³ª´µ´Â µ¥, Intra´Â communicator¾È¿¡¼­ ±× ¼Ó¿¡ ¼ÓÇÏ´Â ÇÁ·Î¼¼¼­µéÀÌ message¸¦ »óÈ£ ±³È¯ÇÏ´Â °ÍÀ» ¸»Çϸç, Inter´Â ±×·¯ÇÑ Communicatorµé°£ÀÇ message »óÈ£ ±³È¯À» ¸»ÇÑ´Ù.

Communicator´Â group°ú context·Î °áÁ¤µÇ´Â µ¥, groupÀº ÇÁ·Î¼¼¼­µéÀÇ ÁýÇÕÀ¸·Î ¿©±â¿¡ ¼ÓÇÑ ÇÁ·Î¼¼¼­µéÀÇ ¸î °³°¡ ¼±ÅÃµÇ¾î µ¿ÀÏÇÑ context¸¦ °¡Áö°Ô µÇ¸é ÇϳªÀÇ communicator¸¦ Çü¼ºÇÏ°Ô µÈ´Ù. Áï, µ¿ÀÏÇÑ context¸¦ °¡Áø ÇÁ·Î¼¼¼­¸¸ÀÌ »óÈ£ message±³È¯ÀÌ °¡´ÉÇÏ´Ù.

7.4 Working with Groups, Contexts, and Communicators

Communicator¸¦ »ý¼ºÇϱâ À§Çؼ­´Â ¸ÕÀú Group°ú Context¸¦ »ý¼ºÇØ¾ß ÇÑ´Ù.

MPI_Group goup_world;
MPI_Group first_row_group;
MPI_Comm first_row_comm;
int * process_ranks;

/* »õ·Î¿î communicator¿¡ Âü¿©ÇÒ ÇÁ·Î¼¼¼­ÀÇ list±¸¼º */
process_ranks = (int *) malloc (q * sizeof(int));
for (proc = 0; proc < q; proc++)
process_ranks[proc] = proc; ------------------(1)

/* MPI_COMM_WORLD ¸¦ ±¸¼ºÇÏ´Â groupÀ» ±¸ÇÔ */
MPI_Comm_group(MPI_COMM_WORLD, &group_world); ------------------(2)

/* »õ·Î¿î groupÀ» »ý¼º */
MPI_Group_incl(group_world, q, process_ranks, &first_row_group); -----(3)

/* »õ·Î¿î communicator¸¦ »ý¼º */
MPI_Comm_create(MPI_COMM_WORLD, first_row_group, &first_row_comm); ---(4)
À§ÀÇ code´Â ¾î¶»°Ô »õ·Î¿î communicator°¡ »ý¼ºµÇ´Â Áö º¸¿©ÁØ´Ù. MPIº´·Ä ÇÁ·Î±×·¥Àº ±âº»ÀûÀ¸·Î MPI_COMM_WORLD¶ó´Â ±âº»ÀûÀÎ communicator¸¦ °¡Áö°í ÀÌ communicator¸¦ ´Ù¸¥ ÀÛÀº communicatorµé·Î ºÐ¸®ÇÑ´Ù. ¸ÕÀú MPI_COMM_WORLD¿¡ ¼ÓÇÏ´Â ÇÁ·Î¼¼¼­ Áß¿¡¼­ ¾î¶² ÇÁ·Î¼¼¼­¸¦ »õ·Î¿î sub groupÀ¸·Î ºÐ¸® ½Ãų Áö (1)À» ÅëÇØ °áÁ¤ÇÑ´Ù. (2)¸¦ ÅëÇØ ±× ÇÁ·Î¼¼¼­µéÀÌ ¼ÓÇØ ÀÖ´Â ±âÁ¸ groupÀ» ¾Ë¾Æ³½´Ù. (3)À» ÅëÇØ (1)¿¡¼­ ¼±ÅÃÇÑ ÇÁ·Î¼¼¼­·Î ±¸¼ºµÇ´Â »õ·Î¿îsub groupÀ» Çü¼ºÇÑ´Ù. (4)¸¦ ÅëÇØ ÀÌ ÇÁ·Î¼¼¼­µéÀÌ »õ·Î¿î communicator¶ó´Â °ÍÀ» ¼±¾ðÇϴµ¥ À̶§ ÀÌ ÇÁ·Î¼¼¼­µé¿¡°Ô µ¿ÀÏÇÑ context°¡ ºÎ¿©µÈ´Ù. ÀÌ·± ¼ø¼­·Î first_row_commÀ̶ó´Â »õ·Î¿î communicator°¡ »ý¼ºµÇ¾ú´Ù. ÀÌ communicator³»¿¡¼­´Â ¼­·Î°£¿¡ data¸¦ broadcastÇÒ ¼ö ÀÖ´Ù. ´ÙÀ½Àº ÇÁ·Î¼¼¼­ 0ÀÌ ÀÚ½ÅÀÌ ¼ÓÇÑ communicator¿¡ data¸¦ broadcastÇÏ´Â ¿¹ÀÌ´Ù.
int my_rank_in_first_row;
float *A_00;

/* my_rank´Â group_world¿¡¼­ ÇÁ·Î¼¼¼­ rank¸¦ ÀÇ¹Ì */
if( my_rank < q) {
  MPI_Comm_rank( first_row_comm, &my_rank_in_first_row);

/* A_00¸¦ À§ÇÑ memory ÇÒ´ç */
  A_00 = (float *)malloc(n_bar * n_bar * sizeof(float));

  if(my_rank_in_first_row == 0) {
  /* Initialize A_00 */
¡¦¡¦
    MPI_Bcast(A_00, n_bar*n_bar, MPI_FLOAT, 0, first_row_comm);
}
°¢°¢ÀÇ syntax´Â ´ÙÀ½°ú °°´Ù.
int MPI_Comm_group(
        MPI_Comm    comm     /* ÀÔ·Â */,
        MPI_Group *   group  /* Ãâ·Â */  )
int MPI_Group_incl(
        MPI_Group    old_group              /* ÀÔ·Â */,
        int           new_group_size        /* ÀÔ·Â */,
        int           ranks_in_old_group[]  /* ÀÔ·Â */, 
        MPI_Group*   new_group              /* Ãâ·Â */)
int MPI_Comm_Create(
        MPI_Comm   old_comm   /* ÀÔ·Â */,
        MPI_Group   new_group /* ÀÔ·Â */,
        MPI_Comm*  new_comm   /* Ãâ·Â */)
7.5 MPI_Comm_split

¾Õ ÀýÀÇ ¹æ¹ýº¸´Ù Á»´õ °£´ÜÇÏ°Ô ÇϳªÀÇ communicator¸¦ ¸î °³ÀÇ communicator·Î ºÐÇÒ ÇÏ´Â ¹æ¹ýÀº MPI_Comm_split ÇÔ¼ö¸¦ ÀÌ¿ëÇÏ´Â °ÍÀÌ´Ù. À̰ÍÀº key¸¦ ±âÁØÀ¸·Î °°Àº key¸¦ °¡Áø ÇÁ·Î¼¼¼­µéÀ» ÇϳªÀÇ communicator·Î ºÐ¸®½ÃÄÑ ÁØ´Ù. ´ÙÀ½ÀÇ ¿¹´Â °°Àº ÇàÀ» µ¿ÀÏÇÑ key·Î ÇÏ¿© µ¿ÀÏ Çà¿¡ ÀÖ´Â ÇÁ·Î¼¼¼­µéÀ» °°Àº communicator·Î ºÐ¸®ÇÏ´Â °ÍÀÌ´Ù.

MPI_Comm my_row_comm;
int        my_row;
/* my_rank´Â MPI_COMM_WORLDÀÇ rank¸¦ ÀÇ¹Ì */
/* q´Â sub matrixÀÇ ÇàÀ» ÀÇ¹Ì */
my_row = my_rank / q;
MPI_Comm_split(MPI_COMM_WORLD, my_row, my_rank, &my_row_comm);
À§ÀÇ ÇÔ¼ö¸¦ ÅëÇØ MPI_COMM_WORLD´Â q°³ÀÇ »õ·Î¿î communicatorµé·Î ºÐ¸®µÈ´Ù. ¿©±â¼­ key´Â my_row°¡ µÇ°í ÀÚ½ÅÀÌ ¾î¶² communicator¿¡ ¼ÓÇÏ´ÂÁö´Â my_row_commÀ» ÅëÇØ ¾Ë ¼ö ÀÖ´Ù.

MPI_Comm_splitÀÇ syntax´Â ´ÙÀ½°ú °°´Ù.

int MPI_Comm_split(
        MPI_Comm    old_comm     /* ÀÔ·Â */, 
        int            split_key /* ÀÔ·Â */, 
        int            rank_key  /* ÀÔ·Â */, 
        MPI_Comm*   new_comm     /* Ãâ·Â */)
7.6 Toplogies Topology´Â ÇÁ·Î¼¼¼­µéÀÇ °ø°£»óÀÇ ¹è¿­À» ¸»ÇÏ´Â °ÍÀ¸·Î MPI¿¡¼­´Â Cartesian(grid)°ú graph°¡ °¡´ÉÇÏ´Ù. MPI´Â topology´Â °¡»óÀÇ ¹è¿­À» ¸»ÇÏ´Â °ÍÀ¸·Î ½ÇÁ¦ ÇÁ·Î¼¼¼­ÀÇ ¹è¿­°ú ´Ù¸¦ ¼ö ÀÖ´Ù. Fox¡¯s algorithm ±¸ÇöÀ» À§ÇØ topology¸¦ grid·Î ÁöÁ¤ÇÒ °æ¿ì °í·ÁÇØ¾ß ÇÒ ¿ä¼ÒµéÀº

1. gridÀÇ Â÷¿ø --> 2Â÷·Î ÁöÁ¤Çϰí 2. °¢ Â÷¿øÀÇ Å©±â¸¦ ÁöÁ¤ --> qÇà°ú q¿­À̶ó°í ÁöÁ¤ 3. Â÷¿øÀÇ ³¡ÀÌ ¼­·Î ¿¬°á(wrap around) µÇ¾ú´ÂÁö ¾Æ´ÑÁö¸¦ ÁöÁ¤ 4. ½ÇÁ¦ ¹è¿­°ú °¡»ó ¹è¿­À» µ¿ÀÏÇÏ°Ô ÇÒ °ÍÀÎÁö ¾Æ´ÑÁö¸¦ ÁöÁ¤ÇÑ´Ù.

À§ÀÇ 4°¡Áö Á¶°ÇµéÀ» ´ÙÀ½ÀÇ ÄÚµå·Î ½áÁÖ¸é grid topology°¡ Çü¼ºµÈ´Ù.

MPI_Comm grid_comm;
int        dim_sizes[2];
int        wrap_around[2];
int        reorder = 1;
dim_sizes[0] = dim_sizes[1] = q;
wrap_around[0] = wrap_around[1] = 1;
MPI_Cart_create(MPI_COMM_WORLD, 2, dim_sizes, wrap_around, reorder, &grid_comm);

int MPI_Cart_create(
        MPI_Comm    old_comm          /* in */,
        Int            number_of_dims /* in */,
        Int            dim_sizes[]    /* in */, 
        Int            wrap_around[]  /* in */,
        Int            reorder        /* in */,
        MPI_Comm*   cart_comm         /* out */)
¹è¿­»óÀÇ ÇÁ·Î¼¼¼­ À§Ä¡¿Í rank´Â ´ÙÀ½ÀÇ ÇÔ¼öµéÀ» ÅëÇØ¼­ ¾Ë ¼ö ÀÖ´Ù.
int MPI_Cart_rank (
        MPI_Comm    comm             /* in */,
        int            coordinates[] /* in */,
        int *          rank          /* out */)

int MPI_Cart_coords(
        MPI_Comm    comm               /* in */,
        int            rank            /* in */,
        int            number_of_dims  /* in */,
        int            coordinates[]   /* out */)
7.7 MPI_Cart_sub

MPI_Cart_subÇÔ¼ö¸¦ ÀÌ¿ëÇÏ¿© grid¸¦, ¿¹¸¦ µé¾î 3Â÷¿øÀÇ grid¸¦ 2Â÷¿ø grid·Î 2Â÷¿ø grid¸¦ 1Â÷¿ø grid µî, ´õ ³·Àº Â÷¿øÀ¸·Î ºÐ¸®ÇÒ ¼ö ÀÖ´Ù. ¿¹¸¦ µé¾î 2D-grid¸¦ rowº° 1-D grid·Î ºÐ¸®ÇÏ´Â °æ¿ì,

int free_coords[2];
MPI_Comm row_comm;

free_coords[0] = 0;  /* boolean °ª */
free_coords[1] = 1;
MPI_Cart_sub(grid_comm, free_coords, &row_comm);
À» ¼öÇàÇϸé free_coords[0]¸¦ ÅëÇØ row°¡ ¼­·Î ´Ù¸¥ communicator¿¡ ¼ÓÇϰí, free_coords[1]¸¦ ÅëÇØ columnÀº °°Àºcommunicator¿¡ ¼ÓÇÑ´Ù´Â °ÍÀ» ¾Ë·Á ÁØ´Ù. À§ÀÇ ¿¹´Â 2D grid¸¦ 1D-rowº°·Î ºÐ¸®ÇÏ´Â °ÍÀÌ´Ù.
int MPI_Cart_sub(
        MPI_Comm    cart_comm      /* in */,
        int         free_coords[]  /* in */, 
        MPI_Comm*   new_comm       /* out */)
7.8 Fox¡¯s algorithm ±¸Çö

¾Õ Àý¿¡¼­ ¼Ò°³ÇÑ ¿©·¯ ÇÔ¼öµéÀ» Åä´ë·Î 7.2¿¡¼­ ¼Ò°³ÇÑ fox¡¯s ¾Ë°í¸®ÁòÀ» ±¸ÇöÇß´Ù. ¸ÕÀú topology¸¦ grid·Î ¼³Á¤ÇÏ°í °¢ rowº°, °¢ columnº°·Î communicator¸¦ ÁöÁ¤ÇÑ´Ù. ±×¸®°í ¾Ë°í¸®Áò¿¡¼­ ±â¼úÇѵ¥·Î °¢ ÇÁ·Î¼¼¼­µé°£ µ¿ÀÛÀ» ¼öÇàÇÑ´Ù.

typedef struct {
  int p;             /* Total number of processes */
  MPI_Comm comm;     /*communicator for entire grid */
  MPI_Comm row_comm; /* communicator for my row */
  MPI_Comm col_comm; /* communicator for my col */
  int q;             /* order of grid */
  int my_row;        /* my column number */
  int my_col;        /* my column number */
  int my_rank;       /* my rank in the grid comm */
} GRID_INFO_T;

void Setup_grid(GRID_INFO_T* grid /* out */) {
  int old_rank;
  int dimensions[2];
  int wrap_arround[2];
  int coordinates[2];
  int free_coords[2];

  /* set up global grid information */
  MPI_Comm_size(MPI_COMM_WORLD, &(grid->p));
  MPI_Comm_rank(MPI_COMM_WORLD, &old_rank);

  /* We assume p is a perfect square */
  grid->q = (int) sqrt((double) grid->p);
  dimensions[0] = dimensions[1] = grid->q;

  /* We want a circular shift in second dimension */
  wrap_around[0] = wrap_around[1] = 1;
  MPI_Cart_create(MPI_COMM_WORLD, 2, dimensions, wrap_around, 1, &(grid->comm));
  MPI_Comm_rank(grid->comm, &(grid->my_rank));
  MPI_Cart_coords(grid->comm, grid->my_rank, 2, coordinates);
  grid->my_row = coordinates[0];
  grid->my_col = coordinates[1];

  /* Set up row communicators */
  free_coords[0] = 0;
  free_coords[1] = 1;
  MPI_Cart_sub(grid->comm, free_coords, &(grid->row_comm));

  /* Setup column communicators */
  free_coords[0] = 1;
  free_coords[1] = 0;
  MPI_Cart_sub(grid->comm, free_coords, &(grid->col_comm));
} /* Setup_grid */
void Fox(int             n        /* in */,
         GRID_INFO_T*    grid     /* in */,
         LOCAL_MATRIX_T* local_A  /* in */,
         LOCAL_MATRIX_T* local_B  /* in */,
         LOCAL_MATRIX_T* local_C  /* out */ ) {
  LOCAL_MATRIX_T * tmp_A;    /* Storage for the sub matrix of A used during the current stage */
  int    stage;
  int    bcast_root;
  int    n_bar;
  int    source;
  int    dest;
  MPI_Status  status;

  n_bar = n/grid->q;
  Set_to_zero(local_C);

  /* Calculate addresses for circular shift of B */
  source = (grid->my_row + 1) % grid->q;
  dest = (grid->my_row + grid->q - 1) % grid->q;

  /* Set aside storage for the broadcast block of A */
  tmp_A = Local_matrix_allocate(n_bar);

  for(stage = 0; stage < grid->q; stage++) {
    bcast_root = (grid->my_row + stage) % grid->q;
    if(bcast_root == grid->my_col) {
      MPI_Bcast(local_A, 1, local_matrix_mpi_t, bcas_root, grid->row_comm);
      Local_matrix_multiply(local_A, local_B, local_C);
    } else {
      MPI_Bcast(temp_A, 1, local_matrix_mpi_t, bcast_root, grid->row_comm);
      Local_matrix_multiply(temp_A, local_B, local_C);
    }
    MPI_Sendrecv_replace(local_B, 1, local_matrix_mpi_t, dest, 0, source, 0, grid->col_comm, &status);
  } /* for */
} /* Fox */
MPI_Sendrecv_replace¶õ »õ·Î¿î ÇÔ¼ö°¡ »ç¿ëµÇ¾ú´Âµ¥, ÀÌ ÇÔ¼ö¸¦ »ç¿ëÇÏ´Â ÇÁ·Î¼¼¼­ÀÇ rank°¡ source¿Í °°À¸¸é, dest·Î data¸¦ sendÇϰí dest¿Í °°À¸¸é source·ÎºÎÅÍ data¸¦ receiveÇÑ´Ù.
Int MPI_Sendrecv_replace(
        Void*         buffer    /* in./out */,
        Int           count     /* in */,
        MPI_Datatype  datatype  /* in */,
        Int           dest      /* in */,
        Int           send_tag  /* in */,
        Int           source    /* in */,
        Int           recv_tag  /* in */,
        MPI_Comm   comm         /* in */,
        MPI_Status*  status     /* out */)
7.9 ¿ä¾à

ÀÌ Àå¿¡¼­´Â MPIº´·Ä ÇÁ·Î±×·¥¿¡¼­ ÇÁ·Î¼¼¼­µéÀÇ groupÈ­¸¦ À§ÇÑ communicator¿Í topology¸¦ »ç¿ëÇÏ´Â ¹æ¹ý°ú ±×°ÍµéÀ» Áö¿øÇÏ´Â ÇÔ¼öµéÀ» »ìÆì º¸¾Ò°í, ±× ÇÔ¼öµéÀ» »ç¿ëÇÏ¿© fox¡¯s algorithmÀ» ±¸ÇöÇØ º¸¾Ò´Ù.