|
Lines 4038-4044
|
| 4038 |
<sect2> |
4038 |
<sect2> |
| 4039 |
<title>Synopsis</title> |
4039 |
<title>Synopsis</title> |
| 4040 |
|
4040 |
|
| 4041 |
<para>High-availability is one of the main requirements in serious |
4041 |
<para>High availability is one of the main requirements in serious |
| 4042 |
business applications and highly-available storage is a key |
4042 |
business applications, and highly available storage is a key |
| 4043 |
component in such environments. Highly Available STorage, or |
4043 |
component in such environments. Highly Available STorage, or |
| 4044 |
<acronym>HAST<remark role="acronym">Highly Available |
4044 |
<acronym>HAST<remark role="acronym">Highly Available |
|
Lines 4109-4115
|
| 4109 |
drives.</para> |
4109 |
drives.</para> |
| 4110 |
</listitem> |
4110 |
</listitem> |
| 4111 |
<listitem> |
4111 |
<listitem> |
| 4112 |
<para>File system agnostic, thus allowing to use any file |
4112 |
<para>File system agnostic; works with any file |
| 4113 |
system supported by &os;.</para> |
4113 |
system supported by &os;.</para> |
| 4114 |
</listitem> |
4114 |
</listitem> |
| 4115 |
<listitem> |
4115 |
<listitem> |
|
Lines 4152-4158
|
| 4152 |
total.</para> |
4152 |
total.</para> |
| 4153 |
</note> |
4153 |
</note> |
| 4154 |
|
4154 |
|
| 4155 |
<para>Since the <acronym>HAST</acronym> works in |
4155 |
<para>Since <acronym>HAST</acronym> works in a |
| 4156 |
primary-secondary configuration, it allows only one of the |
4156 |
primary-secondary configuration, it allows only one of the |
| 4157 |
cluster nodes to be active at any given time. The |
4157 |
cluster nodes to be active at any given time. The |
| 4158 |
<literal>primary</literal> node, also called |
4158 |
<literal>primary</literal> node, also called |
|
Lines 4175-4181
|
| 4175 |
</itemizedlist> |
4175 |
</itemizedlist> |
| 4176 |
|
4176 |
|
| 4177 |
<para><acronym>HAST</acronym> operates synchronously on a block |
4177 |
<para><acronym>HAST</acronym> operates synchronously on a block |
| 4178 |
level, which makes it transparent for file systems and |
4178 |
level, making it transparent to file systems and |
| 4179 |
applications. <acronym>HAST</acronym> provides regular GEOM |
4179 |
applications. <acronym>HAST</acronym> provides regular GEOM |
| 4180 |
providers in <filename class="directory">/dev/hast/</filename> |
4180 |
providers in <filename class="directory">/dev/hast/</filename> |
| 4181 |
directory for use by other tools or applications, thus there is |
4181 |
directory for use by other tools or applications, thus there is |
|
Lines 4252-4258
|
| 4252 |
For stripped-down systems, make sure this module is available. |
4252 |
For stripped-down systems, make sure this module is available. |
| 4253 |
Alternatively, it is possible to build |
4253 |
Alternatively, it is possible to build |
| 4254 |
<literal>GEOM_GATE</literal> support into the kernel |
4254 |
<literal>GEOM_GATE</literal> support into the kernel |
| 4255 |
statically, by adding the following line to the custom kernel |
4255 |
statically, by adding this line to the custom kernel |
| 4256 |
configuration file:</para> |
4256 |
configuration file:</para> |
| 4257 |
|
4257 |
|
| 4258 |
<programlisting>options GEOM_GATE</programlisting> |
4258 |
<programlisting>options GEOM_GATE</programlisting> |
|
Lines 4290-4299
|
| 4290 |
class="directory">/dev/hast/</filename>) will be called |
4290 |
class="directory">/dev/hast/</filename>) will be called |
| 4291 |
<filename><replaceable>test</replaceable></filename>.</para> |
4291 |
<filename><replaceable>test</replaceable></filename>.</para> |
| 4292 |
|
4292 |
|
| 4293 |
<para>The configuration of <acronym>HAST</acronym> is being done |
4293 |
<para>Configuration of <acronym>HAST</acronym> is done |
| 4294 |
in the <filename>/etc/hast.conf</filename> file. This file |
4294 |
in the <filename>/etc/hast.conf</filename> file. This file |
| 4295 |
should be the same on both nodes. The simplest configuration |
4295 |
should be the same on both nodes. The simplest configuration |
| 4296 |
possible is following:</para> |
4296 |
possible is:</para> |
| 4297 |
|
4297 |
|
| 4298 |
<programlisting>resource test { |
4298 |
<programlisting>resource test { |
| 4299 |
on hasta { |
4299 |
on hasta { |
|
Lines 4317-4325
|
| 4317 |
alternatively in the local <acronym>DNS</acronym>.</para> |
4317 |
alternatively in the local <acronym>DNS</acronym>.</para> |
| 4318 |
</tip> |
4318 |
</tip> |
| 4319 |
|
4319 |
|
| 4320 |
<para>Now that the configuration exists on both nodes, it is |
4320 |
<para>Now that the configuration exists on both nodes, |
| 4321 |
possible to create the <acronym>HAST</acronym> pool. Run the |
4321 |
the <acronym>HAST</acronym> pool can be created. Run these |
| 4322 |
following commands on both nodes to place the initial metadata |
4322 |
commands on both nodes to place the initial metadata |
| 4323 |
onto the local disk, and start the &man.hastd.8; daemon:</para> |
4323 |
onto the local disk, and start the &man.hastd.8; daemon:</para> |
| 4324 |
|
4324 |
|
| 4325 |
<screen>&prompt.root; <userinput>hastctl create test</userinput> |
4325 |
<screen>&prompt.root; <userinput>hastctl create test</userinput> |
|
Lines 4334-4385
|
| 4334 |
available.</para> |
4334 |
available.</para> |
| 4335 |
</note> |
4335 |
</note> |
| 4336 |
|
4336 |
|
| 4337 |
<para>HAST is not responsible for selecting node's role |
4337 |
<para>A <acronym>HAST</acronym> node's role (<literal>primary</literal> or |
| 4338 |
(<literal>primary</literal> or <literal>secondary</literal>). |
4338 |
<literal>secondary</literal>) is selected by an administrator |
| 4339 |
Node's role has to be configured by an administrator or other |
4339 |
or other |
| 4340 |
software like <application>Heartbeat</application> using the |
4340 |
software like <application>Heartbeat</application> using the |
| 4341 |
&man.hastctl.8; utility. Move to the primary node |
4341 |
&man.hastctl.8; utility. Move to the primary node |
| 4342 |
(<literal><replaceable>hasta</replaceable></literal>) and |
4342 |
(<literal><replaceable>hasta</replaceable></literal>) and |
| 4343 |
issue the following command:</para> |
4343 |
issue this command:</para> |
| 4344 |
|
4344 |
|
| 4345 |
<screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen> |
4345 |
<screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen> |
| 4346 |
|
4346 |
|
| 4347 |
<para>Similarly, run the following command on the secondary node |
4347 |
<para>Similarly, run this command on the secondary node |
| 4348 |
(<literal><replaceable>hastb</replaceable></literal>):</para> |
4348 |
(<literal><replaceable>hastb</replaceable></literal>):</para> |
| 4349 |
|
4349 |
|
| 4350 |
<screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen> |
4350 |
<screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen> |
| 4351 |
|
4351 |
|
| 4352 |
<caution> |
4352 |
<caution> |
| 4353 |
<para>It may happen that both of the nodes are not able to |
4353 |
<para>When the nodes are unable to |
| 4354 |
communicate with each other and both are configured as |
4354 |
communicate with each other, and both are configured as |
| 4355 |
primary nodes; the consequence of this condition is called |
4355 |
primary nodes, the condition is called |
| 4356 |
<literal>split-brain</literal>. In order to troubleshoot |
4356 |
<literal>split-brain</literal>. To troubleshoot |
| 4357 |
this situation, follow the steps described in <xref |
4357 |
this situation, follow the steps described in <xref |
| 4358 |
linkend="disks-hast-sb">.</para> |
4358 |
linkend="disks-hast-sb">.</para> |
| 4359 |
</caution> |
4359 |
</caution> |
| 4360 |
|
4360 |
|
| 4361 |
<para>It is possible to verify the result with the |
4361 |
<para>Verify the result with the |
| 4362 |
&man.hastctl.8; utility on each node:</para> |
4362 |
&man.hastctl.8; utility on each node:</para> |
| 4363 |
|
4363 |
|
| 4364 |
<screen>&prompt.root; <userinput>hastctl status test</userinput></screen> |
4364 |
<screen>&prompt.root; <userinput>hastctl status test</userinput></screen> |
| 4365 |
|
4365 |
|
| 4366 |
<para>The important text is the <literal>status</literal> line |
4366 |
<para>The important text is the <literal>status</literal> line, |
| 4367 |
from its output and it should say <literal>complete</literal> |
4367 |
which should say <literal>complete</literal> |
| 4368 |
on each of the nodes. If it says <literal>degraded</literal>, |
4368 |
on each of the nodes. If it says <literal>degraded</literal>, |
| 4369 |
something went wrong. At this point, the synchronization |
4369 |
something went wrong. At this point, the synchronization |
| 4370 |
between the nodes has already started. The synchronization |
4370 |
between the nodes has already started. The synchronization |
| 4371 |
completes when the <command>hastctl status</command> command |
4371 |
completes when <command>hastctl status</command> |
| 4372 |
reports 0 bytes of <literal>dirty</literal> extents.</para> |
4372 |
reports 0 bytes of <literal>dirty</literal> extents.</para> |
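The completion check described above can be scripted. The following is a minimal sketch, not part of the patch: it assumes that the status output contains a line shaped like "dirty: 0 (0B)", which may differ between hastctl versions, and the helper name hast_dirty_bytes is hypothetical.

```shell
#!/bin/sh
# hast_dirty_bytes: read hastctl-status-style text on stdin and print the
# number that follows "dirty:" on the first matching line.
# ASSUMPTION: the output has a line such as "  dirty: 0 (0B)"; adjust the
# pattern if the local hastctl prints a different layout.
hast_dirty_bytes() {
    awk '$1 == "dirty:" { print $2; exit }'
}

# Hypothetical use on a live primary node (not executed here):
#   while [ "$(hastctl status test | hast_dirty_bytes)" != "0" ]; do
#       sleep 10
#   done
```

The helper only parses text, so it can be tried against saved output before it is pointed at a running node.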
| 4373 |
|
4373 |
|
| 4374 |
|
4374 |
|
| 4375 |
<para>The last step is to create a filesystem on the |
4375 |
<para>The next step is to create a file system on the |
| 4376 |
<devicename>/dev/hast/<replaceable>test</replaceable></devicename> |
4376 |
<devicename>/dev/hast/<replaceable>test</replaceable></devicename> |
| 4377 |
GEOM provider and mount it. This has to be done on the |
4377 |
GEOM provider and mount it. This must be done on the |
| 4378 |
<literal>primary</literal> node (as the |
4378 |
<literal>primary</literal> node, as |
| 4379 |
<filename>/dev/hast/<replaceable>test</replaceable></filename> |
4379 |
<filename>/dev/hast/<replaceable>test</replaceable></filename> |
| 4380 |
appears only on the <literal>primary</literal> node), and |
4380 |
appears only on the <literal>primary</literal> node. |
| 4381 |
it can take a few minutes depending on the size of the hard |
4381 |
Creating the file system can take a few minutes, depending on the |
| 4382 |
drive:</para> |
4382 |
size of the hard drive:</para> |
| 4383 |
|
4383 |
|
| 4384 |
<screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput> |
4384 |
<screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput> |
| 4385 |
&prompt.root; <userinput>mkdir /hast/test</userinput> |
4385 |
&prompt.root; <userinput>mkdir /hast/test</userinput> |
|
Lines 4387-4395
|
| 4387 |
|
4387 |
|
| 4388 |
<para>Once the <acronym>HAST</acronym> framework is configured |
4388 |
<para>Once the <acronym>HAST</acronym> framework is configured |
| 4389 |
properly, the final step is to make sure that |
4389 |
properly, the final step is to make sure that |
| 4390 |
<acronym>HAST</acronym> is started during the system boot time |
4390 |
<acronym>HAST</acronym> is started automatically during the system |
| 4391 |
automatically. The following line should be added to the |
4391 |
boot. Add this line to |
| 4392 |
<filename>/etc/rc.conf</filename> file:</para> |
4392 |
<filename>/etc/rc.conf</filename>:</para> |
| 4393 |
|
4393 |
|
| 4394 |
<programlisting>hastd_enable="YES"</programlisting> |
4394 |
<programlisting>hastd_enable="YES"</programlisting> |
| 4395 |
|
4395 |
|
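When this step is repeated on several nodes, it helps to add the line only if it is missing. The sketch below is an illustration rather than part of the patch: the function name enable_hastd is hypothetical, and it takes the file path as an argument so it can be tried on a scratch copy before touching /etc/rc.conf.

```shell
#!/bin/sh
# enable_hastd FILE: append hastd_enable="YES" to FILE unless FILE already
# contains a line starting with hastd_enable= (an existing value is left
# untouched, even if it is not "YES").
enable_hastd() {
    rc_file="$1"
    if ! grep -q '^hastd_enable=' "$rc_file" 2>/dev/null; then
        printf 'hastd_enable="YES"\n' >> "$rc_file"
    fi
}

# Hypothetical use as root on each node (not executed here):
#   enable_hastd /etc/rc.conf
```

Running the function twice leaves a single entry, so it is safe to call from a provisioning script.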
|
Lines 4397-4422
|
| 4397 |
<title>Failover Configuration</title> |
4397 |
<title>Failover Configuration</title> |
| 4398 |
|
4398 |
|
| 4399 |
<para>The goal of this example is to build a robust storage |
4399 |
<para>The goal of this example is to build a robust storage |
| 4400 |
system which is resistant from the failures of any given node. |
4400 |
system which is resistant to the failure of any given node. |
| 4401 |
The key task here is to remedy a scenario when a |
4401 |
The scenario is that a |
| 4402 |
<literal>primary</literal> node of the cluster fails. Should |
4402 |
<literal>primary</literal> node of the cluster fails. If |
| 4403 |
it happen, the <literal>secondary</literal> node is there to |
4403 |
this happens, the <literal>secondary</literal> node is there to |
| 4404 |
take over seamlessly, check and mount the file system, and |
4404 |
take over seamlessly, check and mount the file system, and |
| 4405 |
continue to work without missing a single bit of data.</para> |
4405 |
continue to work without missing a single bit of data.</para> |
| 4406 |
|
4406 |
|
| 4407 |
<para>In order to accomplish this task, it will be required to |
4407 |
<para>To accomplish this task, another &os; feature provides |
| 4408 |
utilize another feature available under &os; which provides |
|
|
| 4409 |
for automatic failover on the IP layer — |
4408 |
for automatic failover on the IP layer — |
| 4410 |
<acronym>CARP</acronym>. <acronym>CARP</acronym> stands for |
4409 |
<acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address |
| 4411 |
Common Address Redundancy Protocol and allows multiple hosts |
4410 |
Redundancy Protocol) allows multiple hosts |
| 4412 |
on the same network segment to share an IP address. Set up |
4411 |
on the same network segment to share an IP address. Set up |
| 4413 |
<acronym>CARP</acronym> on both nodes of the cluster according |
4412 |
<acronym>CARP</acronym> on both nodes of the cluster according |
| 4414 |
to the documentation available in <xref linkend="carp">. |
4413 |
to the documentation available in <xref linkend="carp">. |
| 4415 |
After completing this task, each node should have its own |
4414 |
After setup, each node will have its own |
| 4416 |
<devicename>carp0</devicename> interface with a shared IP |
4415 |
<devicename>carp0</devicename> interface with a shared IP |
| 4417 |
address <replaceable>172.16.0.254</replaceable>. |
4416 |
address <replaceable>172.16.0.254</replaceable>. |
| 4418 |
Obviously, the primary <acronym>HAST</acronym> node of the |
4417 |
The primary <acronym>HAST</acronym> node of the |
| 4419 |
cluster has to be the master <acronym>CARP</acronym> |
4418 |
cluster must be the master <acronym>CARP</acronym> |
| 4420 |
node.</para> |
4419 |
node.</para> |
| 4421 |
|
4420 |
|
| 4422 |
<para>The <acronym>HAST</acronym> pool created in the previous |
4421 |
<para>The <acronym>HAST</acronym> pool created in the previous |
|
Lines 4430-4446
|
| 4430 |
|
4429 |
|
| 4431 |
<para>In the event of <acronym>CARP</acronym> interfaces going |
4430 |
<para>In the event of <acronym>CARP</acronym> interfaces going |
| 4432 |
up or down, the &os; operating system generates a &man.devd.8; |
4431 |
up or down, the &os; operating system generates a &man.devd.8; |
| 4433 |
event, which makes it possible to watch for the state changes |
4432 |
event, making it possible to watch for the state changes |
| 4434 |
on the <acronym>CARP</acronym> interfaces. A state change on |
4433 |
on the <acronym>CARP</acronym> interfaces. A state change on |
| 4435 |
the <acronym>CARP</acronym> interface is an indication that |
4434 |
the <acronym>CARP</acronym> interface is an indication that |
| 4436 |
one of the nodes failed or came back online. In such a case, |
4435 |
one of the nodes failed or came back online. These state change |
| 4437 |
it is possible to run a particular script which will |
4436 |
events make it possible to run a script which will |
| 4438 |
automatically handle the failover.</para> |
4437 |
automatically handle the <acronym>HAST</acronym> failover.</para> |
| 4439 |
|
4438 |
|
| 4440 |
<para>To be able to catch the state changes on the |
4439 |
<para>To be able to catch state changes on the |
| 4441 |
<acronym>CARP</acronym> interfaces, the following |
4440 |
<acronym>CARP</acronym> interfaces, add this |
| 4442 |
configuration has to be added to the |
4441 |
configuration to |
| 4443 |
<filename>/etc/devd.conf</filename> file on each node:</para> |
4442 |
<filename>/etc/devd.conf</filename> on each node:</para> |
| 4444 |
|
4443 |
|
| 4445 |
<programlisting>notify 30 { |
4444 |
<programlisting>notify 30 { |
| 4446 |
match "system" "IFNET"; |
4445 |
match "system" "IFNET"; |
|
Lines 4456-4467
|
| 4456 |
action "/usr/local/sbin/carp-hast-switch slave"; |
4455 |
action "/usr/local/sbin/carp-hast-switch slave"; |
| 4457 |
};</programlisting> |
4456 |
};</programlisting> |
| 4458 |
|
4457 |
|
| 4459 |
<para>To put the new configuration into effect, run the |
4458 |
<para>Restart &man.devd.8; on both nodes to put the new configuration |
| 4460 |
following command on both nodes:</para> |
4459 |
into effect:</para> |
| 4461 |
|
4460 |
|
| 4462 |
<screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen> |
4461 |
<screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen> |
| 4463 |
|
4462 |
|
| 4464 |
<para>In the event that the <devicename>carp0</devicename> |
4463 |
<para>When the <devicename>carp0</devicename> |
| 4465 |
interface goes up or down (i.e. the interface state changes), |
4464 |
interface goes up or down (i.e., the interface state changes), |
| 4466 |
the system generates a notification, allowing the &man.devd.8; |
4465 |
the system generates a notification, allowing the &man.devd.8; |
| 4467 |
subsystem to run an arbitrary script, in this case |
4466 |
subsystem to run an arbitrary script, in this case |
|
Lines 4471-4477
|
| 4471 |
&man.devd.8; configuration, please consult the |
4470 |
&man.devd.8; configuration, please consult the |
| 4472 |
&man.devd.conf.5; manual page.</para> |
4471 |
&man.devd.conf.5; manual page.</para> |
| 4473 |
|
4472 |
|
| 4474 |
<para>An example of such a script could be following:</para> |
4473 |
<para>An example of such a script could be:</para> |
| 4475 |
|
4474 |
|
| 4476 |
<programlisting>#!/bin/sh |
4475 |
<programlisting>#!/bin/sh |
| 4477 |
|
4476 |
|
|
Lines 4557-4569
|
| 4557 |
;; |
4556 |
;; |
| 4558 |
esac</programlisting> |
4557 |
esac</programlisting> |
| 4559 |
|
4558 |
|
| 4560 |
<para>In a nutshell, the script does the following when a node |
4559 |
<para>In a nutshell, the script takes these actions when a node |
| 4561 |
becomes <literal>master</literal> / |
4560 |
becomes <literal>master</literal> / |
| 4562 |
<literal>primary</literal>:</para> |
4561 |
<literal>primary</literal>:</para> |
| 4563 |
|
4562 |
|
| 4564 |
<itemizedlist> |
4563 |
<itemizedlist> |
| 4565 |
<listitem> |
4564 |
<listitem> |
| 4566 |
<para>Promotes the <acronym>HAST</acronym> pools as |
4565 |
<para>Promotes the <acronym>HAST</acronym> pools to |
| 4567 |
primary on a given node.</para> |
4566 |
primary on a given node.</para> |
| 4568 |
</listitem> |
4567 |
</listitem> |
| 4569 |
<listitem> |
4568 |
<listitem> |
|
Lines 4571-4577
|
| 4571 |
<acronym>HAST</acronym> pool.</para> |
4570 |
<acronym>HAST</acronym> pool.</para> |
| 4572 |
</listitem> |
4571 |
</listitem> |
| 4573 |
<listitem> |
4572 |
<listitem> |
| 4574 |
<para>Mounts the pools at appropriate place.</para> |
4573 |
<para>Mounts the pools at an appropriate place.</para> |
| 4575 |
</listitem> |
4574 |
</listitem> |
| 4576 |
</itemizedlist> |
4575 |
</itemizedlist> |
| 4577 |
|
4576 |
|
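The promotion steps listed above can be previewed as a dry run. This sketch is not the script from the patch: promote_plan is a hypothetical helper that only prints the commands, the fsck flags are an assumption, and the /hast/NAME mount point follows the earlier mkdir example.

```shell
#!/bin/sh
# promote_plan RESOURCE...: print, one per line, the commands a promotion to
# primary would run for each named resource. Nothing is executed.
# ASSUMPTIONS: UFS file systems, mount points under /hast/, and fsck flags
# chosen for an unattended check.
promote_plan() {
    for res in "$@"; do
        echo "hastctl role primary ${res}"
        echo "fsck -p -y -t ufs /dev/hast/${res}"
        echo "mount /dev/hast/${res} /hast/${res}"
    done
}

# Example: show the plan for the resource used throughout this section.
promote_plan test
```

Printing the plan first makes it easier to review the order of operations before wiring real commands into a devd action.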
|
Lines 4590-4604
|
| 4590 |
|
4589 |
|
| 4591 |
<caution> |
4590 |
<caution> |
| 4592 |
<para>Keep in mind that this is just an example script which |
4591 |
<para>Keep in mind that this is just an example script which |
| 4593 |
should serve as a proof of concept solution. It does not |
4592 |
should serve as a proof of concept. It does not |
| 4594 |
handle all the possible scenarios and can be extended or |
4593 |
handle all the possible scenarios and can be extended or |
| 4595 |
altered in any way, for example it can start/stop required |
4594 |
altered in any way; for example, it can start/stop required |
| 4596 |
services etc.</para> |
4595 |
services, etc.</para> |
| 4597 |
</caution> |
4596 |
</caution> |
| 4598 |
|
4597 |
|
| 4599 |
<tip> |
4598 |
<tip> |
| 4600 |
<para>For the purpose of this example we used a standard UFS |
4599 |
<para>For this example, we used a standard UFS |
| 4601 |
file system. In order to reduce the time needed for |
4600 |
file system. To reduce the time needed for |
| 4602 |
recovery, a journal-enabled UFS or ZFS file system can |
4601 |
recovery, a journal-enabled UFS or ZFS file system can |
| 4603 |
be used.</para> |
4602 |
be used.</para> |
| 4604 |
</tip> |
4603 |
</tip> |
|
Lines 4615-4655
|
| 4615 |
<sect3> |
4614 |
<sect3> |
| 4616 |
<title>General Troubleshooting Tips</title> |
4615 |
<title>General Troubleshooting Tips</title> |
| 4617 |
|
4616 |
|
| 4618 |
<para><acronym>HAST</acronym> should be generally working |
4617 |
<para><acronym>HAST</acronym> should generally work |
| 4619 |
without any issues, however as with any other software |
4618 |
without issues. However, as with any other software |
| 4620 |
product, there may be times when it does not work as |
4619 |
product, there may be times when it does not work as |
| 4621 |
supposed. The sources of the problems may be different, but |
4620 |
expected. The sources of the problems may vary, but |
| 4622 |
the rule of thumb is to ensure that the time is synchronized |
4621 |
the rule of thumb is to ensure that the time is synchronized |
| 4623 |
between all nodes of the cluster.</para> |
4622 |
between all nodes of the cluster.</para> |
| 4624 |
|
4623 |
|
| 4625 |
<para>The debugging level of the &man.hastd.8; should be |
4624 |
<para>When troubleshooting <acronym>HAST</acronym> problems, |
| 4626 |
increased when troubleshooting <acronym>HAST</acronym> |
4625 |
the debugging level of &man.hastd.8; should be increased |
| 4627 |
problems. This can be accomplished by starting the |
4626 |
by starting the |
| 4628 |
&man.hastd.8; daemon with the <literal>-d</literal> |
4627 |
&man.hastd.8; daemon with the <literal>-d</literal> |
| 4629 |
argument. Note, that this argument may be specified |
4628 |
argument. Note that this argument may be specified |
| 4630 |
multiple times to further increase the debugging level. A |
4629 |
multiple times to further increase the debugging level. A |
| 4631 |
lot of useful information may be obtained this way. It |
4630 |
lot of useful information may be obtained this way. Consider |
| 4632 |
should be also considered to use <literal>-F</literal> |
4631 |
also using the <literal>-F</literal> |
| 4633 |
argument, which will start the &man.hastd.8; daemon in |
4632 |
argument, which starts the &man.hastd.8; daemon in the |
| 4634 |
foreground.</para> |
4633 |
foreground.</para> |
| 4635 |
</sect3> |
4634 |
</sect3> |
| 4636 |
|
4635 |
|
| 4637 |
<sect3 id="disks-hast-sb"> |
4636 |
<sect3 id="disks-hast-sb"> |
| 4638 |
<title>Recovering from the Split-brain Condition</title> |
4637 |
<title>Recovering from the Split-brain Condition</title> |
| 4639 |
|
4638 |
|
| 4640 |
<para>The consequence of a situation when both nodes of the |
4639 |
<para><literal>Split-brain</literal> is when the nodes of the |
| 4641 |
cluster are not able to communicate with each other and both |
4640 |
cluster are unable to communicate with each other, and both |
| 4642 |
are configured as primary nodes is called |
4641 |
are configured as primary. This is a dangerous |
| 4643 |
<literal>split-brain</literal>. This is a dangerous |
|
|
| 4644 |
condition because it allows both nodes to make incompatible |
4642 |
condition because it allows both nodes to make incompatible |
| 4645 |
changes to the data. This situation has to be handled by |
4643 |
changes to the data. This problem must be corrected |
| 4646 |
the system administrator manually.</para> |
4644 |
manually by the system administrator.</para> |
| 4647 |
|
4645 |
|
| 4648 |
<para>In order to fix this situation the administrator has to |
4646 |
<para>The administrator must |
| 4649 |
decide which node has more important changes (or merge them |
4647 |
decide which node has more important changes (or merge them |
| 4650 |
manually) and let the <acronym>HAST</acronym> perform |
4648 |
manually) and let <acronym>HAST</acronym> perform |
| 4651 |
the full synchronization of the node which has the broken |
4649 |
full synchronization of the node which has the broken |
| 4652 |
data. To do this, issue the following commands on the node |
4650 |
data. To do this, issue these commands on the node |
| 4653 |
which needs to be resynchronized:</para> |
4651 |
which needs to be resynchronized:</para> |
| 4654 |
|
4652 |
|
| 4655 |
<screen>&prompt.root; <userinput>hastctl role init <resource></userinput> |
4653 |
<screen>&prompt.root; <userinput>hastctl role init <resource></userinput> |