Bug 191173 - [xen] Poor file write performance on xen blockfront
Summary: [xen] Poor file write performance on xen blockfront
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-RELEASE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Roger Pau Monné
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-18 21:01 UTC by Toby Karyadi
Modified: 2014-08-18 08:51 UTC
CC: 2 users

See Also:


Attachments
Patch to add unmapped IO support to blkfront (2.76 KB, patch)
2014-07-04 12:58 UTC, Roger Pau Monné
no flags

Description Toby Karyadi 2014-06-18 21:01:46 UTC
This bug report merely confirms the thorough benchmarking done by Sydney Meyer, summarized in his conversation with Roger Pau Monné here: http://lists.freebsd.org/pipermail/freebsd-virtualization/2014-January/001951.html

In my case the dom0 is NetBSD 6.1.4 with Xen 4.2. I noticed a few things that seem to be consistent with Sydney's observations:
- With FBSD 10, dd-ing from /dev/zero into a file is at least 50% slower than dd-ing directly to a disk. In my case dd to a file was ~25MB/sec at best, while dd to a disk was ~60MB/sec.
- With a NetBSD domU running under HVM, dd-ing into a file was ~58MB/sec and dd-ing onto a partition was ~35MB/sec. Under a NetBSD PV domU it was 125+MB/sec.
- In the NetBSD dom0, dd-ing into a file was ~100MB/sec.

In all cases, each of the domU disks was backed by an LVM logical volume.

NetBSD has (or had) an issue with poor domU disk write performance; basically, AFAIU, it's due to improper reordering of write requests. This email contains a patch that solved the problem (for NetBSD domUs only, obviously), but you can read the whole thread. The NetBSD domU kernel that I use has this patch. I'm not a kernel programmer, so most of this stuff goes over my head, but maybe the FreeBSD blockfront is suffering from the same issue?

Building an FBSD 10 kernel without PVHVM may give more data points, especially if the disk performs faster with the standard ide/sata/scsi driver (as Sydney's email also shows for FBSD 9.2). I just don't have the time ATM.
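In case someone else wants to try, something along these lines should do it (untested sketch; NO_PVHVM is just a placeholder name, and the XENHVM/xenpci lines are the PVHVM glue in the stock amd64 GENERIC config of 10.0):

# cd /usr/src
# cp sys/amd64/conf/GENERIC sys/amd64/conf/NO_PVHVM
# # Edit NO_PVHVM and comment out the PVHVM glue, i.e. the lines
# #   options XENHVM
# #   device  xenpci
# make buildkernel KERNCONF=NO_PVHVM
# make installkernel KERNCONF=NO_PVHVM
# shutdown -r now

With those options removed the disk should attach through the emulated controller instead of blkfront.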

Btw, the quickest way to reproduce the problem is to just use the installation CD, as you can get into the shell and mess around.

DETAILS:
* How the domU disk was setup:
# gpart create -s gpt ada0
# # Now the boot partition is created at offset 128k with size 128k. I suppose you can
# # skip this for testing purposes, but it's here for illustration.
# # In my case it is important because the dom0 disk is a raidframe
# # (software raid) raid5 with 32k blocks, so it's best to start partitions
# # at a multiple of 32k to avoid excessive read-modify-write.
# gpart add -b 128k -s 128k -t freebsd-boot ada0
# gpart bootcode -p /boot/gptboot -i 1 ada0
# # Below is creation of ada0p2
# gpart add -b 1M -s 2047M -t freebsd-ufs ada0
# newfs -b 32k /dev/ada0p2

* How the dd into file was done
# mount /dev/ada0p2 /mnt
# cd /mnt
# # Now write 1GB
# dd if=/dev/zero of=big.img bs=32k count=$((1024*32))

* How the dd into disk was done
# # Btw, I have a second disk /dev/ada1. I haven't tried this, but creating
# # another partition might show the performance difference as well.
# dd if=/dev/zero of=/dev/ada1 bs=32k count=$((1024*32))

Toby
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2014-06-26 01:57:23 UTC
Over to maintainers.
Comment 2 Mark Felder freebsd_committer freebsd_triage 2014-06-26 20:34:16 UTC
Please add details about the hardware in question to the bug report. I've also seen what I believe could be similar behavior on FreeBSD 10 VMs, but only on one cluster with older CPUs.

The more data points we have the more likely this can be narrowed down.

Thanks!
Comment 3 Roger Pau Monné freebsd_committer freebsd_triage 2014-07-04 12:58:06 UTC
Created attachment 144403 [details]
Patch to add unmapped IO support to blkfront

Hello,

I've been doing some tests in order to try to figure out what's going on. The workload I've used as a benchmark is a buildkernel with -j12 on an 8 vCPU FreeBSD guest. Here are the results:

PVHVM
     3165.84 real      6354.17 user      4483.32 sys
HVM
     2494.73 real      4975.33 user      3344.52 sys
PVHVM with emulated disk
     2193.31 real      4619.76 user      3028.79 sys
PVHVM with unmapped IO
     2099.46 real      4624.52 user      2967.38 sys

I'm attaching a patch that allows blkfront to use unmapped IO, which makes a difference especially on systems without HAP. This is because TLB flushes and range invalidations are much more expensive when using shadow page tables. Unmapped IO also avoids sending IPIs to every CPU, which is expensive in virtualized environments.
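For reference, the general pattern for unmapped IO support in a FreeBSD disk driver is small: announce DISKFLAG_UNMAPPED_BIO on the disk, and then handle bios that carry an array of vm_page_t instead of a mapped kernel buffer. The fragment below is only an illustrative sketch of that pattern (the mydriver_* names are placeholders), not the attached patch itself:

/* Illustrative sketch only, not the actual blkfront change. */
#include <sys/param.h>
#include <sys/bio.h>
#include <geom/geom_disk.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

static void
mydriver_create_disk(struct disk *dp)
{
	/* Tell GEOM this driver can consume bios without a kernel mapping. */
	dp->d_flags |= DISKFLAG_UNMAPPED_BIO;
}

static void
mydriver_queue_bio(struct bio *bp)
{
	vm_paddr_t pa;
	int i;

	if ((bp->bio_flags & BIO_UNMAPPED) != 0) {
		/*
		 * Unmapped bio: the data arrives as an array of vm_page_t,
		 * so the pages can be handed to the backend directly, without
		 * creating (and later shooting down) a KVA mapping.
		 */
		for (i = 0; i < bp->bio_ma_n; i++) {
			pa = VM_PAGE_TO_PHYS(bp->bio_ma[i]);
			/* ... add 'pa' to the ring request ... */
		}
	} else {
		/* Mapped bio: classic contiguous buffer at bp->bio_data. */
		/* ... existing mapped path ... */
	}
}

The real change to blkfront is in the attached patch.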

Roger.
Comment 4 Bartek Rutkowski freebsd_committer freebsd_triage 2014-07-30 12:06:51 UTC
Below are my tests, done the same way as Roger's, that is make -j12 for buildworld and buildkernel, on an 8 vCPU PVHVM FreeBSD 11-CURRENT guest (src rev 269300) hosted on XenServer 6.2 with the latest patches applied:

Emulated disk:
--------------------------------------------------------------
>>> World build completed on Wed Jul 30 13:00:26 CEST 2014
--------------------------------------------------------------
     1933.84 real     11310.08 user      1550.49 sys

--------------------------------------------------------------
>>> Kernel build for POUDRIERE completed on Wed Jul 30 13:10:03 CEST 2014
--------------------------------------------------------------
      335.96 real      1615.87 user       287.34 sys


Patched unmapped IO:
--------------------------------------------------------------
>>> World build completed on Wed Jul 30 13:54:34 CEST 2014
--------------------------------------------------------------
     1858.90 real     11145.81 user      1221.92 sys

--------------------------------------------------------------
>>> Kernel build for POUDRIERE completed on Wed Jul 30 14:01:17 CEST 2014
--------------------------------------------------------------
      319.78 real      1602.64 user       234.32 sys


This is in line with Roger's results, and I have a feeling the OS was a bit more responsive under the heaviest loads, but that said, this is something I don't know how to measure accurately.
Comment 5 commit-hook freebsd_committer freebsd_triage 2014-08-11 15:37:14 UTC
A commit references this bug:

Author: royger
Date: Mon Aug 11 15:37:02 UTC 2014
New revision: 269814
URL: http://svnweb.freebsd.org/changeset/base/269814

Log:
  blkfront: add support for unmapped IO

  Using unmapped IO is really beneficial when running inside of a VM,
  since it avoids IPIs to other vCPUs in order to invalidate the
  mappings.

  This patch adds unmapped IO support to blkfront. The following tests
  results have been obtained when running on a Xen host without HAP:

  PVHVM
       3165.84 real      6354.17 user      4483.32 sys
  PVHVM with unmapped IO
       2099.46 real      4624.52 user      2967.38 sys

  This is because when running using shadow page tables TLB flushes and
  range invalidations are much more expensive, so using unmapped IO
  provides a very important performance boost.

  Sponsored by:	Citrix Systems R&D
  Tested by:	robak
  MFC after:	1 week
  PR:		191173

  dev/xen/blkfront/blkfront.c:
   - Add and announce support for unmapped IO.

Changes:
  head/sys/dev/xen/blkfront/blkfront.c
Comment 6 commit-hook freebsd_committer freebsd_triage 2014-08-18 08:50:54 UTC
A commit references this bug:

Author: royger
Date: Mon Aug 18 08:50:05 UTC 2014
New revision: 270130
URL: http://svnweb.freebsd.org/changeset/base/270130

Log:
  MFC r269814:

  blkfront: add support for unmapped IO

  Sponsored by:	Citrix Systems R&D
  Tested by:	robak
  PR:		191173

Changes:
  stable/10/sys/dev/xen/blkfront/blkfront.c