FreeBSD Bugzilla – Attachment 117764 Details for Bug 159897: [handbook] [patch] improve HAST section of Handbook
Description: file.diff
Filename: file.diff
MIME Type: text/plain
Creator: Warren Block
Created: 2011-08-19 00:00:21 UTC
Size: 14.96 KB
Flags: patch, obsolete
>--- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig 2011-08-18 15:22:56.000000000 -0600
>+++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml 2011-08-18 16:35:46.000000000 -0600
>@@ -4038,7 +4038,7 @@
> <sect2>
> <title>Synopsis</title>
>
>- <para>High-availability is one of the main requirements in serious
>+ <para>High availability is one of the main requirements in serious
> business applications and highly-available storage is a key
> component in such environments. Highly Available STorage, or
> <acronym>HAST<remark role="acronym">Highly Available
>@@ -4109,7 +4109,7 @@
> drives.</para>
> </listitem>
> <listitem>
>- <para>File system agnostic, thus allowing to use any file
>+ <para>File system agnostic, thus allowing use of any file
> system supported by &os;.</para>
> </listitem>
> <listitem>
>@@ -4152,7 +4152,7 @@
> total.</para>
> </note>
>
>- <para>Since the <acronym>HAST</acronym> works in
>+ <para>Since <acronym>HAST</acronym> works in
> primary-secondary configuration, it allows only one of the
> cluster nodes to be active at any given time. The
> <literal>primary</literal> node, also called
>@@ -4175,7 +4175,7 @@
> </itemizedlist>
>
> <para><acronym>HAST</acronym> operates synchronously on a block
>- level, which makes it transparent for file systems and
>+ level, making it transparent to file systems and
> applications. <acronym>HAST</acronym> provides regular GEOM
> providers in <filename class="directory">/dev/hast/</filename>
> directory for use by other tools or applications, thus there is
>@@ -4252,7 +4252,7 @@
> For stripped-down systems, make sure this module is available.
> Alternatively, it is possible to build
> <literal>GEOM_GATE</literal> support into the kernel
>- statically, by adding the following line to the custom kernel
>+ statically, by adding this line to the custom kernel
> configuration file:</para>
>
> <programlisting>options GEOM_GATE</programlisting>
>@@ -4290,10 +4290,10 @@
> class="directory">/dev/hast/</filename>) will be called
> <filename><replaceable>test</replaceable></filename>.</para>
>
>- <para>The configuration of <acronym>HAST</acronym> is being done
>+ <para>Configuration of <acronym>HAST</acronym> is done
> in the <filename>/etc/hast.conf</filename> file. This file
> should be the same on both nodes. The simplest configuration
>- possible is following:</para>
>+ possible is:</para>
>
> <programlisting>resource test {
> on hasta {
>@@ -4317,9 +4317,9 @@
> alternatively in the local <acronym>DNS</acronym>.</para>
> </tip>
>
>- <para>Now that the configuration exists on both nodes, it is
>- possible to create the <acronym>HAST</acronym> pool. Run the
>- following commands on both nodes to place the initial metadata
>+ <para>Now that the configuration exists on both nodes,
>+ the <acronym>HAST</acronym> pool can be created. Run these
>+ commands on both nodes to place the initial metadata
> onto the local disk, and start the &man.hastd.8; daemon:</para>
>
> <screen>&prompt.root; <userinput>hastctl create test</userinput>
>@@ -4334,51 +4334,51 @@
> available.</para>
> </note>
>
>- <para>HAST is not responsible for selecting node's role
>- (<literal>primary</literal> or <literal>secondary</literal>).
>- Node's role has to be configured by an administrator or other
>- software like <application>Heartbeat</application> using the
>+ <para>A HAST node's role (<literal>primary</literal> or
>+ <literal>secondary</literal>) is selected by an administrator
>+ or other
>+ software like <application>Heartbeat</application> using the
> &man.hastctl.8; utility. Move to the primary node
> (<literal><replaceable>hasta</replaceable></literal>) and
>- issue the following command:</para>
>+ issue this command:</para>
>
> <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
>
>- <para>Similarly, run the following command on the secondary node
>+ <para>Similarly, run this command on the secondary node
> (<literal><replaceable>hastb</replaceable></literal>):</para>
>
> <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
>
> <caution>
>- <para>It may happen that both of the nodes are not able to
>- communicate with each other and both are configured as
>- primary nodes; the consequence of this condition is called
>- <literal>split-brain</literal>. In order to troubleshoot
>+ <para>When the nodes are unable to
>+ communicate with each other, and both are configured as
>+ primary nodes, the condition is called
>+ <literal>split-brain</literal>. To troubleshoot
> this situation, follow the steps described in <xref
> linkend="disks-hast-sb">.</para>
> </caution>
>
>- <para>It is possible to verify the result with the
>+ <para>Verify the result with the
> &man.hastctl.8; utility on each node:</para>
>
> <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
>
>- <para>The important text is the <literal>status</literal> line
>- from its output and it should say <literal>complete</literal>
>+ <para>The important text is the <literal>status</literal> line,
>+ which should say <literal>complete</literal>
> on each of the nodes. If it says <literal>degraded</literal>,
> something went wrong. At this point, the synchronization
> between the nodes has already started. The synchronization
>- completes when the <command>hastctl status</command> command
>+ completes when <command>hastctl status</command>
> reports 0 bytes of <literal>dirty</literal> extents.</para>
>
>
>- <para>The last step is to create a filesystem on the
>+ <para>The next step is to create a filesystem on the
> <devicename>/dev/hast/<replaceable>test</replaceable></devicename>
>- GEOM provider and mount it. This has to be done on the
>- <literal>primary</literal> node (as the
>+ GEOM provider and mount it. This must be done on the
>+ <literal>primary</literal> node, as
> <filename>/dev/hast/<replaceable>test</replaceable></filename>
>- appears only on the <literal>primary</literal> node), and
>- it can take a few minutes depending on the size of the hard
>+ appears only on the <literal>primary</literal> node.
>+ It can take a few minutes depending on the size of the hard
> drive:</para>
>
> <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
>@@ -4387,9 +4387,9 @@
>
> <para>Once the <acronym>HAST</acronym> framework is configured
> properly, the final step is to make sure that
>- <acronym>HAST</acronym> is started during the system boot time
>- automatically. The following line should be added to the
>- <filename>/etc/rc.conf</filename> file:</para>
>+ <acronym>HAST</acronym> is started automatically during the system
>+ boot. This line is added to
>+ <filename>/etc/rc.conf</filename>:</para>
>
> <programlisting>hastd_enable="YES"</programlisting>
>
>@@ -4397,26 +4397,25 @@
> <title>Failover Configuration</title>
>
> <para>The goal of this example is to build a robust storage
>- system which is resistant from the failures of any given node.
>- The key task here is to remedy a scenario when a
>- <literal>primary</literal> node of the cluster fails. Should
>- it happen, the <literal>secondary</literal> node is there to
>+ system which is resistant to failures of any given node.
>+ The scenario is that a
>+ <literal>primary</literal> node of the cluster fails. If
>+ this happens, the <literal>secondary</literal> node is there to
> take over seamlessly, check and mount the file system, and
> continue to work without missing a single bit of data.</para>
>
>- <para>In order to accomplish this task, it will be required to
>- utilize another feature available under &os; which provides
>+ <para>To accomplish this task, another &os; feature provides
> for automatic failover on the IP layer —
>- <acronym>CARP</acronym>. <acronym>CARP</acronym> stands for
>- Common Address Redundancy Protocol and allows multiple hosts
>+ <acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address
>+ Redundancy Protocol) allows multiple hosts
> on the same network segment to share an IP address. Set up
> <acronym>CARP</acronym> on both nodes of the cluster according
> to the documentation available in <xref linkend="carp">.
>- After completing this task, each node should have its own
>+ After setup, each node will have its own
> <devicename>carp0</devicename> interface with a shared IP
> address <replaceable>172.16.0.254</replaceable>.
>- Obviously, the primary <acronym>HAST</acronym> node of the
>- cluster has to be the master <acronym>CARP</acronym>
>+ The primary <acronym>HAST</acronym> node of the
>+ cluster must be the master <acronym>CARP</acronym>
> node.</para>
>
> <para>The <acronym>HAST</acronym> pool created in the previous
>@@ -4430,17 +4429,17 @@
>
> <para>In the event of <acronym>CARP</acronym> interfaces going
> up or down, the &os; operating system generates a &man.devd.8;
>- event, which makes it possible to watch for the state changes
>+ event, making it possible to watch for the state changes
> on the <acronym>CARP</acronym> interfaces. A state change on
> the <acronym>CARP</acronym> interface is an indication that
>- one of the nodes failed or came back online. In such a case,
>- it is possible to run a particular script which will
>+ one of the nodes failed or came back online. These state change
>+ events make it possible to run a script which will
> automatically handle the failover.</para>
>
>- <para>To be able to catch the state changes on the
>- <acronym>CARP</acronym> interfaces, the following
>- configuration has to be added to the
>- <filename>/etc/devd.conf</filename> file on each node:</para>
>+ <para>To be able to catch state changes on the
>+ <acronym>CARP</acronym> interfaces, add this
>+ configuration to
>+ <filename>/etc/devd.conf</filename> on each node:</para>
>
> <programlisting>notify 30 {
> match "system" "IFNET";
>@@ -4456,12 +4455,12 @@
> action "/usr/local/sbin/carp-hast-switch slave";
> };</programlisting>
>
>- <para>To put the new configuration into effect, run the
>- following command on both nodes:</para>
>+ <para>Restart &man.devd.8; on both nodes to put the new configuration
>+ into effect:</para>
>
> <screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
>
>- <para>In the event that the <devicename>carp0</devicename>
>+ <para>When the <devicename>carp0</devicename>
> interface goes up or down (i.e. the interface state changes),
> the system generates a notification, allowing the &man.devd.8;
> subsystem to run an arbitrary script, in this case
>@@ -4471,7 +4470,7 @@
> &man.devd.8; configuration, please consult the
> &man.devd.conf.5; manual page.</para>
>
>- <para>An example of such a script could be following:</para>
>+ <para>An example of such a script could be:</para>
>
> <programlisting>#!/bin/sh
>
>@@ -4557,13 +4556,13 @@
> ;;
> esac</programlisting>
>
>- <para>In a nutshell, the script does the following when a node
>+ <para>In a nutshell, the script takes these actions when a node
> becomes <literal>master</literal> /
> <literal>primary</literal>:</para>
>
> <itemizedlist>
> <listitem>
>- <para>Promotes the <acronym>HAST</acronym> pools as
>+ <para>Promotes the <acronym>HAST</acronym> pools to
> primary on a given node.</para>
> </listitem>
> <listitem>
>@@ -4571,7 +4570,7 @@
> <acronym>HAST</acronym> pool.</para>
> </listitem>
> <listitem>
>- <para>Mounts the pools at appropriate place.</para>
>+ <para>Mounts the pools at an appropriate place.</para>
> </listitem>
> </itemizedlist>
>
>@@ -4590,15 +4589,15 @@
>
> <caution>
> <para>Keep in mind that this is just an example script which
>- should serve as a proof of concept solution. It does not
>+ should serve as a proof of concept. It does not
> handle all the possible scenarios and can be extended or
> altered in any way, for example it can start/stop required
>- services etc.</para>
>+ services, etc.</para>
> </caution>
>
> <tip>
>- <para>For the purpose of this example we used a standard UFS
>- file system. In order to reduce the time needed for
>+ <para>For this example, we used a standard UFS
>+ file system. To reduce the time needed for
> recovery, a journal-enabled UFS or ZFS file system can
> be used.</para>
> </tip>
>@@ -4615,41 +4614,40 @@
> <sect3>
> <title>General Troubleshooting Tips</title>
>
>- <para><acronym>HAST</acronym> should be generally working
>- without any issues, however as with any other software
>+ <para><acronym>HAST</acronym> should generally work
>+ without issues. However, as with any other software
> product, there may be times when it does not work as
> supposed. The sources of the problems may be different, but
> the rule of thumb is to ensure that the time is synchronized
> between all nodes of the cluster.</para>
>
>- <para>The debugging level of the &man.hastd.8; should be
>- increased when troubleshooting <acronym>HAST</acronym>
>- problems. This can be accomplished by starting the
>+ <para>When troubleshooting <acronym>HAST</acronym> problems,
>+ the debugging level of &man.hastd.8; should be increased
>+ by starting the
> &man.hastd.8; daemon with the <literal>-d</literal>
>- argument. Note, that this argument may be specified
>+ argument. Note that this argument may be specified
> multiple times to further increase the debugging level. A
>- lot of useful information may be obtained this way. It
>- should be also considered to use <literal>-F</literal>
>- argument, which will start the &man.hastd.8; daemon in
>+ lot of useful information may be obtained this way. Consider
>+ also using the <literal>-F</literal>
>+ argument, which starts the &man.hastd.8; daemon in the
> foreground.</para>
> </sect3>
>
> <sect3 id="disks-hast-sb">
> <title>Recovering from the Split-brain Condition</title>
>
>- <para>The consequence of a situation when both nodes of the
>- cluster are not able to communicate with each other and both
>- are configured as primary nodes is called
>- <literal>split-brain</literal>. This is a dangerous
>+ <para><literal>Split-brain</literal> is when the nodes of the
>+ cluster are unable to communicate with each other, and both
>+ are configured as primary. This is a dangerous
> condition because it allows both nodes to make incompatible
>- changes to the data. This situation has to be handled by
>- the system administrator manually.</para>
>+ changes to the data. This problem must be corrected
>+ manually by the system administrator.</para>
>
>- <para>In order to fix this situation the administrator has to
>+ <para>The administrator must
> decide which node has more important changes (or merge them
>- manually) and let the <acronym>HAST</acronym> perform
>+ manually) and let <acronym>HAST</acronym> perform
> the full synchronization of the node which has the broken
>- data. To do this, issue the following commands on the node
>+ data. To do this, issue these commands on the node
> which needs to be resynchronized:</para>
>
> <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
Attachments on bug 159897: 117764 | 117765 | 117766 | 117767