summaryrefslogtreecommitdiff
path: root/system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch
diff options
context:
space:
mode:
Diffstat (limited to 'system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch')
-rw-r--r--system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch378
1 files changed, 378 insertions, 0 deletions
diff --git a/system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch b/system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch
new file mode 100644
index 0000000000..ef390e2b13
--- /dev/null
+++ b/system/xen/xsa/xsa299-4.12-0007-x86-mm-Always-retain-a-general-ref-on-partial.patch
@@ -0,0 +1,378 @@
+From 51fe4e67d954649fcf103116be6206a769f0db1e Mon Sep 17 00:00:00 2001
+From: George Dunlap <george.dunlap@citrix.com>
+Date: Thu, 10 Oct 2019 17:57:49 +0100
+Subject: [PATCH 07/11] x86/mm: Always retain a general ref on partial
+
+In order to allow recursive pagetable promotions and demotions to be
+interrupted, Xen must keep track of the state of the sub-pages
+promoted or demoted. This is stored in two elements in the page struct:
+nr_entries_validated and partial_flags.
+
+The rule is that entries [0, nr_entries_validated) should always be
+validated and hold a general reference count. If partial_flags is
+zero, then [nr_entries_validated] is not validated and no reference
+count is held. If PTF_partial_set is set, then [nr_entries_validated]
+is partially validated.
+
+At the moment, a distinction is made between promotion and demotion
+with regard to whether the entry itself "holds" a general reference
+count: when entry promotion is interrupted (i.e., returns -ERESTART),
+the entry is not considered to hold a reference; when entry demotion
+is interrupted, the entry is still considered to hold a general
+reference.
+
+PTF_partial_general_ref is used to distinguish between these cases.
+If clear, it's a partial promotion => no general reference count held
+by the entry; if set, it's partial demotion, so a general reference
+count held. Because promotions and demotions can be interleaved, this
+value is passed to get_page_and_type_from_mfn and put_page_from_l*e,
+to be able to properly handle reference counts.
+
+Unfortunately, because a refcount is not held, it is possible to
+engineer a situation where PFT_partial_set is set but the page in
+question has been assigned to another domain. A sketch is provided in
+the appendix.
+
+Fix this by having the parent page table entry hold a general
+reference count whenever PFT_partial_set is set. (For clarity of
+change, keep two separate flags. These will be collapsed in a
+subsequent changeset.)
+
+This has two basic implications. On the put_page_from_lNe() side,
+this mean that the (partial_set && !partial_ref) case can never happen,
+and no longer needs to be special-cased.
+
+Secondly, because both flags are set together, there's no need to carry over
+existing bits from partial_pte.
+
+(NB there is still another issue with calling _put_page_type() on a
+page which had PGT_partial set; that will be handled in a subsequent
+patch.)
+
+On the get_page_and_type_from_mfn() side, we need to distinguish
+between callers which hold a reference on partial (i.e.,
+alloc_lN_table()), and those which do not (new_cr3, PIN_LN_TABLE, and
+so on): pass a flag if the type should be retained on interruption.
+
+NB that since l1 promotion can't be preempted, that get_page_from_l2e
+can't return -ERESTART.
+
+This is part of XSA-299.
+
+Reported-by: George Dunlap <george.dunlap@citrix.com>
+Signed-off-by: George Dunlap <george.dunlap@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+-----
+* Appendix: Engineering PTF_partial_set while a page belongs to a
+ foreign domain
+
+Suppose A is a page which can be promoted to an l3, and B is a page
+which can be promoted to an l2, and A[x] points to B. B has
+PGC_allocated set but no other general references.
+
+V1: PIN_L3 A.
+ A is validated, B is validated.
+ A.type_count = 1 | PGT_validated | PGT_pinned
+ B.type_count = 1 | PGT_validated
+ B.count = 2 | PGC_allocated (A[x] holds a general ref)
+
+V1: UNPIN A.
+ A begins de-validation.
+ Arrange to be interrupted when i < x
+ V1->old_guest_table = A
+ V1->old_guest_table_ref_held = false
+ A.type_count = 1 | PGT_partial
+ A.nr_validated_entries = i < x
+ B.type_count = 0
+ B.count = 1 | PGC_allocated
+
+V2: MOD_L4_ENTRY to point some l4e to A.
+ Picks up re-validation of A.
+ Arrange to be interrupted halfway through B's validation
+ B.type_count = 1 | PGT_partial
+ B.count = 2 | PGC_allocated (PGT_partial holds a general ref)
+ A.type_count = 1 | PGT_partial
+ A.nr_validated_entries = x
+ A.partial_pte = PTF_partial_set
+
+V3: MOD_L3_ENTRY to point some other l3e (not in A) to B.
+ Validates B.
+ B.type_count = 1 | PGT_validated
+ B.count = 2 | PGC_allocated ("other l3e" holds a general ref)
+
+V3: MOD_L3_ENTRY to clear l3e pointing to B.
+ Devalidates B.
+ B.type_count = 0
+ B.count = 1 | PGC_allocated
+
+V3: decrease_reservation(B)
+ Clears PGC_allocated
+ B.count = 0 => B is freed
+
+B gets assigned to a different domain
+
+V1: Restarts UNPIN of A
+ put_old_guest_table(A)
+ ...
+ free_l3_table(A)
+
+Now since A.partial_flags has PTF_partial_set, free_l3_table() will
+call put_page_from_l3e() on A[x], which points to B, while B is owned
+by another domain.
+
+If A[x] held a general refcount for B on partial validation, as it does
+for partial de-validation, then B would still have a reference count of
+1 after PGC_allocated was freed; so B wouldn't be freed until after
+put_page_from_l3e() had happend on A[x].
+---
+ xen/arch/x86/mm.c | 84 +++++++++++++++++++++++-----------------
+ xen/include/asm-x86/mm.h | 15 ++++---
+ 2 files changed, 58 insertions(+), 41 deletions(-)
+
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index bbd29a68f4..4d3ebf341d 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -1102,10 +1102,11 @@ get_page_from_l1e(
+ * page->pte[page->nr_validated_entries]. See the comment in mm.h for
+ * more information.
+ */
+-#define PTF_partial_set (1 << 0)
+-#define PTF_partial_general_ref (1 << 1)
+-#define PTF_preemptible (1 << 2)
+-#define PTF_defer (1 << 3)
++#define PTF_partial_set (1 << 0)
++#define PTF_partial_general_ref (1 << 1)
++#define PTF_preemptible (1 << 2)
++#define PTF_defer (1 << 3)
++#define PTF_retain_ref_on_restart (1 << 4)
+
+ static int get_page_and_type_from_mfn(
+ mfn_t mfn, unsigned long type, struct domain *d,
+@@ -1114,7 +1115,11 @@ static int get_page_and_type_from_mfn(
+ struct page_info *page = mfn_to_page(mfn);
+ int rc;
+ bool preemptible = flags & PTF_preemptible,
+- partial_ref = flags & PTF_partial_general_ref;
++ partial_ref = flags & PTF_partial_general_ref,
++ partial_set = flags & PTF_partial_set,
++ retain_ref = flags & PTF_retain_ref_on_restart;
++
++ ASSERT(partial_ref == partial_set);
+
+ if ( likely(!partial_ref) &&
+ unlikely(!get_page_from_mfn(mfn, d)) )
+@@ -1127,13 +1132,15 @@ static int get_page_and_type_from_mfn(
+ * - page is fully validated (rc == 0)
+ * - page is not validated (rc < 0) but:
+ * - We came in with a reference (partial_ref)
++ * - page is partially validated (rc == -ERESTART), and the
++ * caller has asked the ref to be retained in that case
+ * - page is partially validated but there's been an error
+ * (page == current->arch.old_guest_table)
+ *
+ * The partial_ref-on-error clause is worth an explanation. There
+ * are two scenarios where partial_ref might be true coming in:
+- * - mfn has been partially demoted as type `type`; i.e. has
+- * PGT_partial set
++ * - mfn has been partially promoted / demoted as type `type`;
++ * i.e. has PGT_partial set
+ * - mfn has been partially demoted as L(type+1) (i.e., a linear
+ * page; e.g. we're being called from get_page_from_l2e with
+ * type == PGT_l1_table, but the mfn is PGT_l2_table)
+@@ -1156,7 +1163,8 @@ static int get_page_and_type_from_mfn(
+ */
+ if ( likely(!rc) || partial_ref )
+ /* nothing */;
+- else if ( page == current->arch.old_guest_table )
++ else if ( page == current->arch.old_guest_table ||
++ (retain_ref && rc == -ERESTART) )
+ ASSERT(preemptible);
+ else
+ put_page(page);
+@@ -1354,8 +1362,8 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn,
+ if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
+ PTF_partial_set )
+ {
+- ASSERT(!(flags & PTF_defer));
+- rc = _put_page_type(pg, PTF_preemptible, ptpg);
++ /* partial_set should always imply partial_ref */
++ BUG();
+ }
+ else if ( flags & PTF_defer )
+ {
+@@ -1400,8 +1408,8 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+ if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
+ PTF_partial_set )
+ {
+- ASSERT(!(flags & PTF_defer));
+- return _put_page_type(pg, PTF_preemptible, mfn_to_page(_mfn(pfn)));
++ /* partial_set should always imply partial_ref */
++ BUG();
+ }
+
+ if ( flags & PTF_defer )
+@@ -1431,8 +1439,8 @@ static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+ if ( (flags & (PTF_partial_set | PTF_partial_general_ref)) ==
+ PTF_partial_set )
+ {
+- ASSERT(!(flags & PTF_defer));
+- return _put_page_type(pg, PTF_preemptible, mfn_to_page(_mfn(pfn)));
++ /* partial_set should always imply partial_ref */
++ BUG();
+ }
+
+ if ( flags & PTF_defer )
+@@ -1569,13 +1577,22 @@ static int alloc_l2_table(struct page_info *page, unsigned long type)
+ else
+ rc = get_page_from_l2e(l2e, pfn, d, partial_flags);
+
+- if ( rc == -ERESTART )
+- {
+- page->nr_validated_ptes = i;
+- /* Set 'set', retain 'general ref' */
+- page->partial_flags = partial_flags | PTF_partial_set;
+- }
+- else if ( rc == -EINTR && i )
++ /*
++ * It shouldn't be possible for get_page_from_l2e to return
++ * -ERESTART, since we never call this with PTF_preemptible.
++ * (alloc_l1_table may return -EINTR on an L1TF-vulnerable
++ * entry.)
++ *
++ * NB that while on a "clean" promotion, we can never get
++ * PGT_partial. It is possible to arrange for an l2e to
++ * contain a partially-devalidated l2; but in that case, both
++ * of the following functions will fail anyway (the first
++ * because the page in question is not an l1; the second
++ * because the page is not fully validated).
++ */
++ ASSERT(rc != -ERESTART);
++
++ if ( rc == -EINTR && i )
+ {
+ page->nr_validated_ptes = i;
+ page->partial_flags = 0;
+@@ -1584,6 +1601,7 @@ static int alloc_l2_table(struct page_info *page, unsigned long type)
+ else if ( rc < 0 && rc != -EINTR )
+ {
+ gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
++ ASSERT(current->arch.old_guest_table == NULL);
+ if ( i )
+ {
+ page->nr_validated_ptes = i;
+@@ -1642,7 +1660,7 @@ static int alloc_l3_table(struct page_info *page)
+ rc = get_page_and_type_from_mfn(
+ l3e_get_mfn(l3e),
+ PGT_l2_page_table | PGT_pae_xen_l2, d,
+- partial_flags | PTF_preemptible);
++ partial_flags | PTF_preemptible | PTF_retain_ref_on_restart);
+ }
+ else if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+ {
+@@ -1651,13 +1669,14 @@ static int alloc_l3_table(struct page_info *page)
+ rc = -EINTR;
+ }
+ else
+- rc = get_page_from_l3e(l3e, pfn, d, partial_flags);
++ rc = get_page_from_l3e(l3e, pfn, d,
++ partial_flags | PTF_retain_ref_on_restart);
+
+ if ( rc == -ERESTART )
+ {
+ page->nr_validated_ptes = i;
+ /* Set 'set', leave 'general ref' set if this entry was set */
+- page->partial_flags = partial_flags | PTF_partial_set;
++ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
+ }
+ else if ( rc == -EINTR && i )
+ {
+@@ -1833,13 +1852,14 @@ static int alloc_l4_table(struct page_info *page)
+ rc = -EINTR;
+ }
+ else
+- rc = get_page_from_l4e(l4e, pfn, d, partial_flags);
++ rc = get_page_from_l4e(l4e, pfn, d,
++ partial_flags | PTF_retain_ref_on_restart);
+
+ if ( rc == -ERESTART )
+ {
+ page->nr_validated_ptes = i;
+ /* Set 'set', leave 'general ref' set if this entry was set */
+- page->partial_flags = partial_flags | PTF_partial_set;
++ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
+ }
+ else if ( rc < 0 )
+ {
+@@ -1936,9 +1956,7 @@ static int free_l2_table(struct page_info *page)
+ else if ( rc == -ERESTART )
+ {
+ page->nr_validated_ptes = i;
+- page->partial_flags = (partial_flags & PTF_partial_set) ?
+- partial_flags :
+- (PTF_partial_set | PTF_partial_general_ref);
++ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
+ }
+ else if ( rc == -EINTR && i < L2_PAGETABLE_ENTRIES - 1 )
+ {
+@@ -1986,9 +2004,7 @@ static int free_l3_table(struct page_info *page)
+ if ( rc == -ERESTART )
+ {
+ page->nr_validated_ptes = i;
+- page->partial_flags = (partial_flags & PTF_partial_set) ?
+- partial_flags :
+- (PTF_partial_set | PTF_partial_general_ref);
++ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
+ }
+ else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
+ {
+@@ -2019,9 +2035,7 @@ static int free_l4_table(struct page_info *page)
+ if ( rc == -ERESTART )
+ {
+ page->nr_validated_ptes = i;
+- page->partial_flags = (partial_flags & PTF_partial_set) ?
+- partial_flags :
+- (PTF_partial_set | PTF_partial_general_ref);
++ page->partial_flags = PTF_partial_set | PTF_partial_general_ref;
+ }
+ else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
+ {
+diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
+index 8406ac3c37..02079e1324 100644
+--- a/xen/include/asm-x86/mm.h
++++ b/xen/include/asm-x86/mm.h
+@@ -238,22 +238,25 @@ struct page_info
+ * page.
+ *
+ * This happens:
+- * - During de-validation, if de-validation of the page was
++ * - During validation or de-validation, if the operation was
+ * interrupted
+ * - During validation, if an invalid entry is encountered and
+ * validation is preemptible
+ * - During validation, if PTF_partial_general_ref was set on
+- * this entry to begin with (perhaps because we're picking
+- * up from a partial de-validation).
++ * this entry to begin with (perhaps because it picked up a
++ * previous operation)
+ *
+- * When resuming validation, if PTF_partial_general_ref is clear,
+- * then a general reference must be re-acquired; if it is set, no
+- * reference should be acquired.
++ * When resuming validation, if PTF_partial_general_ref is
++ * clear, then a general reference must be re-acquired; if it
++ * is set, no reference should be acquired.
+ *
+ * When resuming de-validation, if PTF_partial_general_ref is
+ * clear, no reference should be dropped; if it is set, a
+ * reference should be dropped.
+ *
++ * NB at the moment, PTF_partial_set should be set if and only if
++ * PTF_partial_general_ref is set.
++ *
+ * NB that PTF_partial_set and PTF_partial_general_ref are
+ * defined in mm.c, the only place where they are used.
+ *
+--
+2.23.0
+