This patch set implements Cavium ThunderX SoC support. It is rebased onto 4.2-rc5 and supersedes patch set v0, so please drop those patches before applying this one.
If you hit any build or run-time issues, please let me know.
Al Stone (2): Fix arm64 compilation error in PNP code clocksource: arm_arch_timer: fix system hang
Andre Przywara (14): KVM: arm/arm64: VGIC: don't track used LRs in the distributor KVM: arm/arm64: add emulation model specific destroy function KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities KVM: arm/arm64: make GIC frame address initialization model specific KVM: arm64: Introduce new MMIO region for the ITS base address KVM: arm64: handle ITS related GICv3 redistributor registers KVM: arm64: introduce ITS emulation file with stub functions KVM: arm64: implement basic ITS register handlers KVM: arm64: add data structures to model ITS interrupt translation KVM: arm64: handle pending bit for LPIs in ITS emulation KVM: arm64: sync LPI configuration and pending tables KVM: arm64: implement ITS command queue command handlers KVM: arm64: implement MSI injection in ITS emulation KVM: arm64: enable ITS emulation as a virtual MSI controller
Andrew Pinski (5): ARM64:VDSO: Improve gettimeofday, don't use udiv ARM64:VDSO: Improve __do_get_tspec, don't use udiv ARM64:Improve ticked spinlocks for high core count. ARM64:spinlocks: Fix up for WFE and improve performance slightly. ARM64: Improve copy_page for 128 cache line sizes.
Craig Magina (1): arm64: optimized copy_to_user and copy_from_user assembly code, part 2
David Daney (5): pci: Add is_pcierc element to struct pci_bus gic-its: Allow pci_requester_id to be overridden. arm64, pci: Allow RC drivers to supply pcibios_add_device() implementation. irqchip: gic-v3: Add gic_get_irq_domain() to get the irqdomain of the GIC. net/mlx4: Remove improper usage of dma_alloc_coherent().
Eric Auger (7): KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry KVM: irqchip: convey devid to kvm_set_msi KVM: arm/arm64: enable irqchip routing KVM: arm/arm64: build a default routing table KVM: arm/arm64: enable MSI routing KVM: arm: implement kvm_set_msi by gsi direct mapping
Feng Kan (1): arm64: optimized copy_to_user and copy_from_user assembly code
Graeme Gregory (3): Juno / net: smsc911x add support for probing from ACPI net: smc91x: add ACPI probing support. virtio-mmio: add ACPI probing
Naresh Bhat (1): mfd: vexpress-sysreg Add ACPI support for probing to driver
Narinder (1): Fixes to get ACPI based kernel booting. Temporary fix to get going.
Radha Mohan Chintakuntla (4): net: mdio-octeon: Modify driver to work on both ThunderX and Octeon net: mdio-octeon: Fix octeon_mdiobus_probe function for return values net: thunderx: Select CONFIG_MDIO_OCTEON for ThunderX NIC arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
Robert Richter (12): Revert "acpi, thuderx, pci: Add MCFG fixup." net: thunderx: Fixes for nicvf_set_rxfh() net: cavium: thunder_bgx/nic: Factor out DT specific code irqchip, gicv3-its: Read typer register outside the loop irqchip, gicv3: Add HW revision detection and configuration irqchip, gicv3: Implement Cavium ThunderX erratum 23154 irqchip, gicv3-its: Implement Cavium ThunderX errata 22375, 24313 arm64: gicv3: its: Add range check for number of allocated pages Revert "mfd: vexpress: Remove non-DT code" net: thunderx: acpi: Get mac address from acpi table acpi, property: Fix EXPORT_SYMBOL_GPL() for acpi_dev_prop_read_single() arm64: topology: Use acpi_disabled for ACPI check
Sunil Goutham (2): net: thunderx: Receive hashing HW offload support net: thunderx: Add receive error stats reporting via ethtool
TIRUMALESH CHALAMARLA (1): arm64: Increase the max granular size
Tirumalesh Chalamarla (3): PCI_ Add host drivers for Cavium ThunderX processors arm64: KVM: Enable minimalistic support for Thunder KVM: extend struct kvm_msi to hold a 32-bit device ID
Tomasz Nowicki (24): arm64, acpi: Implement new "GIC version" field of MADT GIC entry. ACPI, GICv3: Allow to map irq for non-hierarchical doamin. GICv3: Refactor gic_of_init() of GICv3 driver to allow for FDT and ACPI initialization. ACPI, GICV3+: Add support for GICv3+ initialization. GICv3, ITS: Isolate FDT related code, extract common functions. ACPI, GICv3, ITS: Add support for ACPI ITS binding. x86, acpi, pci: Reorder logic of pci_mmconfig_insert() function x86, acpi, pci: Move arch-agnostic MMCFG code out of arch/x86/ directory x86, acpi, pci: Move PCI config space accessors. x86, acpi, pci: mmconfig_{32,64}.c code refactoring - remove code duplication. x86, acpi, pci: mmconfig_64.c becomes default implementation for arch agnostic low-level direct PCI config space accessors via MMCONFIG. pci, acpi: Share ACPI PCI config space accessors. arm64, pci, acpi: Let ARM64 to use MMCONFIG PCI config space accessors. arm64, pci: Add PCI ACPI probing for ARM64 net, phy, apci: Allow to initialize Marvell phy in the ACPI way. net, mdio, acpi: Add support for ACPI binding. net, thunder, bgx: Rework driver to support ACPI binding. arm64/acpi/pci: provide hook for MCFG fixups acpi, property: Export acpi_dev_prop_read call to be usable for kernel modules. ARM64 / ACPI: Point KVM to the virtual timer interrupt when booting with ACPI arm64, acpi, pci: Omit OF related IRQ parsing when running with ACPI kernel. pci, acpi, dma: Unify coherency checking logic for PCI devices. ARM64, ACPI, PCI, MSI: I/O Remapping Table (IORT) initial support. Compiler bug workaround!!!
Vadim Lomovtsev (1): PCI: ThunderX: fix build issue
Documentation/virtual/kvm/api.txt | 46 +- Documentation/virtual/kvm/devices/arm-vgic.txt | 9 + arch/arm/include/asm/kvm_host.h | 4 +- arch/arm/kvm/Kconfig | 3 + arch/arm/kvm/Makefile | 2 +- arch/arm/kvm/arm.c | 2 +- arch/arm64/Kconfig | 4 + arch/arm64/include/asm/acpi.h | 2 + arch/arm64/include/asm/cache.h | 2 +- arch/arm64/include/asm/cputype.h | 3 + arch/arm64/include/asm/kvm_host.h | 3 +- arch/arm64/include/asm/pci.h | 47 + arch/arm64/include/asm/spinlock.h | 36 +- arch/arm64/include/uapi/asm/kvm.h | 5 +- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/acpi.c | 33 +- arch/arm64/kernel/pci-acpi.c | 362 +++++++ arch/arm64/kernel/pci.c | 35 +- arch/arm64/kernel/topology.c | 8 + arch/arm64/kernel/vdso/gettimeofday.S | 47 +- arch/arm64/kvm/Kconfig | 3 + arch/arm64/kvm/Makefile | 3 +- arch/arm64/kvm/guest.c | 6 + arch/arm64/kvm/reset.c | 8 +- arch/arm64/kvm/sys_regs_generic_v8.c | 2 + arch/arm64/lib/copy_from_user.S | 87 +- arch/arm64/lib/copy_page.S | 32 + arch/arm64/lib/copy_template.S | 212 ++++ arch/arm64/lib/copy_to_user.S | 57 +- arch/x86/include/asm/pci.h | 42 + arch/x86/include/asm/pci_x86.h | 72 -- arch/x86/pci/Makefile | 5 +- arch/x86/pci/acpi.c | 1 + arch/x86/pci/init.c | 1 + arch/x86/pci/mmconfig-shared.c | 242 +---- arch/x86/pci/mmconfig_32.c | 11 +- arch/x86/pci/mmconfig_64.c | 153 --- drivers/acpi/Kconfig | 3 + drivers/acpi/Makefile | 2 + drivers/acpi/bus.c | 1 + drivers/acpi/iort.c | 272 +++++ drivers/acpi/mmconfig.c | 437 ++++++++ drivers/acpi/property.c | 1 + drivers/clocksource/arm_arch_timer.c | 9 +- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/qp.c | 2 +- drivers/infiniband/hw/mlx4/srq.c | 3 +- drivers/irqchip/Kconfig | 1 + drivers/irqchip/irq-gic-common.c | 11 + drivers/irqchip/irq-gic-common.h | 9 + drivers/irqchip/irq-gic-v3-its.c | 241 +++-- drivers/irqchip/irq-gic-v3.c | 373 ++++++- drivers/mfd/vexpress-sysreg.c | 133 ++- drivers/net/ethernet/cavium/Kconfig | 2 + drivers/net/ethernet/cavium/thunder/nic.h | 36 +- .../net/ethernet/cavium/thunder/nicvf_ethtool.c | 50 +- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 62 +- drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 86 +- drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 41 - drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 175 ++- drivers/net/ethernet/mellanox/mlx4/alloc.c | 104 +- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 9 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +- drivers/net/ethernet/mellanox/mlx4/en_resources.c | 32 - drivers/net/ethernet/mellanox/mlx4/en_rx.c | 11 +- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 14 +- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 - drivers/net/ethernet/mellanox/mlx4/mr.c | 5 +- drivers/net/ethernet/smsc/smc91x.c | 11 +- drivers/net/ethernet/smsc/smsc911x.c | 38 + drivers/net/phy/Kconfig | 9 +- drivers/net/phy/marvell.c | 118 +- drivers/net/phy/mdio-octeon.c | 198 +++- drivers/pci/host/Kconfig | 12 + drivers/pci/host/Makefile | 2 + drivers/pci/host/pcie-thunder-pem.c | 462 ++++++++ drivers/pci/host/pcie-thunder.c | 335 ++++++ drivers/pci/pci-acpi.c | 13 + drivers/pci/pci.c | 98 +- drivers/pci/probe.c | 4 +- drivers/pnp/resource.c | 2 + drivers/tty/n_tty.c | 3 +- drivers/virtio/virtio_mmio.c | 12 +- include/acpi/actbl1.h | 12 +- include/asm-generic/vmlinux.lds.h | 7 + include/kvm/arm_vgic.h | 39 +- include/linux/iort.h | 39 + include/linux/irqchip/arm-gic-acpi.h | 3 + include/linux/irqchip/arm-gic-v3.h | 22 +- include/linux/kvm_host.h | 7 +- include/linux/mlx4/device.h | 11 +- include/linux/mmconfig.h | 86 ++ 
include/linux/pci-acpi.h | 2 + include/linux/pci.h | 11 +- include/uapi/linux/kvm.h | 11 +- virt/kvm/arm/arch_timer.c | 76 +- virt/kvm/arm/its-emul.c | 1141 ++++++++++++++++++++ virt/kvm/arm/its-emul.h | 55 + virt/kvm/arm/vgic-v2-emul.c | 15 + virt/kvm/arm/vgic-v2.c | 1 + virt/kvm/arm/vgic-v3-emul.c | 105 +- virt/kvm/arm/vgic-v3.c | 1 + virt/kvm/arm/vgic.c | 375 +++++-- virt/kvm/arm/vgic.h | 5 + virt/kvm/eventfd.c | 6 +- virt/kvm/irqchip.c | 12 +- 106 files changed, 5824 insertions(+), 1257 deletions(-) create mode 100644 arch/arm64/kernel/pci-acpi.c create mode 100644 arch/arm64/lib/copy_template.S delete mode 100644 arch/x86/pci/mmconfig_64.c create mode 100644 drivers/acpi/iort.c create mode 100644 drivers/acpi/mmconfig.c create mode 100644 drivers/pci/host/pcie-thunder-pem.c create mode 100644 drivers/pci/host/pcie-thunder.c create mode 100644 include/linux/iort.h create mode 100644 include/linux/mmconfig.h create mode 100644 virt/kvm/arm/its-emul.c create mode 100644 virt/kvm/arm/its-emul.h
From: David Daney <david.daney@cavium.com>
... and use it to force only_one_child() to return true.
Needed because the ThunderX PCIe RC cannot be identified by existing methods.
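For illustration, a host controller driver that builds its root bus by hand would set the new flag right after pci_create_root_bus(), which makes only_one_child() stop the core from probing more than one device directly below the RC. A minimal sketch (hypothetical driver, not part of this patch):

#include <linux/pci.h>

static int example_rc_create_bus(struct device *dev, struct pci_ops *ops,
				 void *sysdata, struct list_head *resources)
{
	struct pci_bus *bus;

	bus = pci_create_root_bus(dev, 0, ops, sysdata, resources);
	if (!bus)
		return -ENODEV;

	/* This RC cannot be recognized via pci_is_pcie(), so mark it here. */
	bus->is_pcierc = 1;

	pci_scan_child_bus(bus);
	pci_bus_add_devices(bus);
	return 0;
}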
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/pci/probe.c | 2 ++
 include/linux/pci.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index cefd636..11ec2e7 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1643,6 +1643,8 @@ static int only_one_child(struct pci_bus *bus) { struct pci_dev *parent = bus->self;
+ if (bus->is_pcierc) + return 1; if (!parent || !pci_is_pcie(parent)) return 0; if (pci_pcie_type(parent) == PCI_EXP_TYPE_ROOT_PORT) diff --git a/include/linux/pci.h b/include/linux/pci.h index 8a0321a..1f1ce73 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -473,6 +473,7 @@ struct pci_bus { struct bin_attribute *legacy_io; /* legacy I/O for this bus */ struct bin_attribute *legacy_mem; /* legacy mem */ unsigned int is_added:1; + unsigned int is_pcierc:1; };
#define to_pci_bus(n) container_of(n, struct pci_bus, dev)
From: David Daney <david.daney@cavium.com>
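This adds a hook that lets platform PCI code substitute its own DevID for the default bus/devfn alias when the ITS sets up MSIs; the ThunderX PCI driver later in this series registers such an override. A minimal, simplified sketch of a caller (the encoding shown is only a placeholder):

#include <linux/pci.h>
#include <linux/irqchip/arm-gic-v3.h>

static u32 example_pci_requester_id(struct pci_dev *pdev, u16 alias)
{
	/* Placeholder: prepend the PCI domain to the bus/devfn alias. */
	return ((u32)pci_domain_nr(pdev->bus) << 16) | alias;
}

static void example_register_override(void)
{
	set_its_pci_requester_id(example_pci_requester_id);
}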
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/irqchip/irq-gic-v3-its.c   | 14 +++++++++++++-
 include/linux/irqchip/arm-gic-v3.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index c00e2db..83204f4 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1225,11 +1225,23 @@ static int its_pci_msi_vec_count(struct pci_dev *pdev) return max(msi, msix); }
+static u32 its_dflt_pci_requester_id(struct pci_dev *pdev, u16 alias) +{ + return alias; +} + +static its_pci_requester_id_t its_pci_requester_id = its_dflt_pci_requester_id; +void set_its_pci_requester_id(its_pci_requester_id_t fn) +{ + its_pci_requester_id = fn; +} +EXPORT_SYMBOL(set_its_pci_requester_id); + static int its_get_pci_alias(struct pci_dev *pdev, u16 alias, void *data) { struct its_pci_alias *dev_alias = data;
- dev_alias->dev_id = alias; + dev_alias->dev_id = its_pci_requester_id(pdev, alias); if (pdev != dev_alias->pdev) dev_alias->count += its_pci_msi_vec_count(dev_alias->pdev);
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index ffbc034..18e3757 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -389,6 +389,8 @@ int its_cpu_init(void); int its_init(struct device_node *node, struct rdists *rdists, struct irq_domain *domain);
+typedef u32 (*its_pci_requester_id_t)(struct pci_dev *, u16); +void set_its_pci_requester_id(its_pci_requester_id_t fn); #endif
#endif
From: David Daney <david.daney@cavium.com>
The default is to keep doing what we have done before, but add a hook so that this behaviour can be overridden.
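For illustration, an RC driver would install its implementation once during probe; roughly like this (the example_rc_map_irq() helper is hypothetical and stands in for whatever RC-specific interrupt routing the driver knows about):

#include <linux/pci.h>
#include <linux/platform_device.h>

static int example_rc_map_irq(struct pci_dev *dev, u8 pin)
{
	return 0;	/* hypothetical RC-specific INTx routing */
}

static int example_rc_add_device(struct pci_dev *dev)
{
	u8 pin;

	pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
	dev->irq = pin ? example_rc_map_irq(dev, pin) : 0;
	pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq);
	return 0;
}

static int example_rc_probe(struct platform_device *pdev)
{
	set_pcibios_add_device(example_rc_add_device);
	/* ... remaining probe work ... */
	return 0;
}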
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 arch/arm64/include/asm/pci.h |  3 +++
 arch/arm64/kernel/pci.c      | 10 ++++++++++
 2 files changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h index b008a72..ad3fb18 100644 --- a/arch/arm64/include/asm/pci.h +++ b/arch/arm64/include/asm/pci.h @@ -37,6 +37,9 @@ static inline int pci_proc_domain(struct pci_bus *bus) { return 1; } + +void set_pcibios_add_device(int (*arg)(struct pci_dev *)); + #endif /* CONFIG_PCI */
#endif /* __KERNEL__ */ diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index 4095379..3356023 100644 --- a/arch/arm64/kernel/pci.c +++ b/arch/arm64/kernel/pci.c @@ -38,11 +38,21 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res, return res->start; }
+static int (*pcibios_add_device_impl)(struct pci_dev *); + +void set_pcibios_add_device(int (*arg)(struct pci_dev *)) +{ + pcibios_add_device_impl = arg; +} + /* * Try to assign the IRQ number from DT when adding a new device */ int pcibios_add_device(struct pci_dev *dev) { + if (pcibios_add_device_impl) + return pcibios_add_device_impl(dev); + dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
return 0;
From: David Daney <david.daney@cavium.com>
Needed to map SPI interrupt sources.
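For illustration, a driver that knows the hardwired SPI number of its interrupt (and has no DT specifier to hand to the core) can now resolve it through the GIC's domain. A minimal sketch; the trigger type and hwirq number are placeholders:

#include <linux/irq.h>
#include <linux/irqdomain.h>
#include <linux/irqchip/arm-gic-v3.h>

static int example_map_wired_spi(u32 spi_hwirq)
{
	int virq = irq_create_mapping(gic_get_irq_domain(), spi_hwirq);

	if (!virq)
		return -EINVAL;

	irq_set_irq_type(virq, IRQ_TYPE_LEVEL_HIGH);
	return virq;
}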
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/irqchip/irq-gic-v3.c       | 5 +++++
 include/linux/irqchip/arm-gic-v3.h | 1 +
 2 files changed, 6 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index c52f7ba..0019fed 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -58,6 +58,11 @@ static struct gic_chip_data gic_data __read_mostly; /* Our default, arbitrary priority value. Linux only uses one anyway. */ #define DEFAULT_PMR_VALUE 0xf0
+struct irq_domain *gic_get_irq_domain(void) +{ + return gic_data.domain; +} + static inline unsigned int gic_irq(struct irq_data *d) { return d->hwirq; diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 18e3757..5992224 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -391,6 +391,7 @@ int its_init(struct device_node *node, struct rdists *rdists,
typedef u32 (*its_pci_requester_id_t)(struct pci_dev *, u16); void set_its_pci_requester_id(its_pci_requester_id_t fn); +struct irq_domain *gic_get_irq_domain(void); #endif
#endif
From: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/pci/host/Kconfig            |  12 +
 drivers/pci/host/Makefile           |   2 +
 drivers/pci/host/pcie-thunder-pem.c | 462 ++++++++++++++++++++++++++++++++++++
 drivers/pci/host/pcie-thunder.c     | 423 +++++++++++++++++++++++++++++++++
 4 files changed, 899 insertions(+)
 create mode 100644 drivers/pci/host/pcie-thunder-pem.c
 create mode 100644 drivers/pci/host/pcie-thunder.c
diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig index c132bdd..06e26ad 100644 --- a/drivers/pci/host/Kconfig +++ b/drivers/pci/host/Kconfig @@ -145,4 +145,16 @@ config PCIE_IPROC_BCMA Say Y here if you want to use the Broadcom iProc PCIe controller through the BCMA bus interface
+config PCI_THUNDER_PEM + bool + +config PCI_THUNDER + bool "Thunder PCIe host controller" + depends on ARM64 || COMPILE_TEST + depends on OF_PCI + select PCI_MSI + select PCI_THUNDER_PEM + help + Say Y here if you want internal PCI support on Thunder SoC. + endmenu diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile index 140d66f..a355155 100644 --- a/drivers/pci/host/Makefile +++ b/drivers/pci/host/Makefile @@ -17,3 +17,5 @@ obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o obj-$(CONFIG_PCIE_IPROC) += pcie-iproc.o obj-$(CONFIG_PCIE_IPROC_PLATFORM) += pcie-iproc-platform.o obj-$(CONFIG_PCIE_IPROC_BCMA) += pcie-iproc-bcma.o +obj-$(CONFIG_PCI_THUNDER) += pcie-thunder.o +obj-$(CONFIG_PCI_THUNDER_PEM) += pcie-thunder-pem.o diff --git a/drivers/pci/host/pcie-thunder-pem.c b/drivers/pci/host/pcie-thunder-pem.c new file mode 100644 index 0000000..7861a8a --- /dev/null +++ b/drivers/pci/host/pcie-thunder-pem.c @@ -0,0 +1,462 @@ +/* + * PCIe host controller driver for Cavium Thunder SOC + * + * Copyright (C) 2014,2015 Cavium Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + */ +/* #define DEBUG 1 */ +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/delay.h> +#include <linux/irq.h> +#include <linux/pci.h> +#include <linux/irqdomain.h> +#include <linux/msi.h> +#include <linux/irqchip/arm-gic-v3.h> + +#define THUNDER_SLI_S2M_REG_ACC_BASE 0x874001000000ull + +#define THUNDER_GIC 0x801000000000ull +#define THUNDER_GICD_SETSPI_NSR 0x801000000040ull +#define THUNDER_GICD_CLRSPI_NSR 0x801000000048ull + +#define THUNDER_GSER_PCIE_MASK 0x01 + +#define PEM_CTL_STATUS 0x000 +#define PEM_RD_CFG 0x030 +#define P2N_BAR0_START 0x080 +#define P2N_BAR1_START 0x088 +#define P2N_BAR2_START 0x090 +#define BAR_CTL 0x0a8 +#define BAR2_MASK 0x0b0 +#define BAR1_INDEX 0x100 +#define PEM_CFG 0x410 +#define PEM_ON 0x420 + +struct thunder_pem { + struct list_head list; /* on thunder_pem_buses */ + bool connected; + unsigned int id; + unsigned int sli; + unsigned int sli_group; + unsigned int node; + u64 sli_window_base; + void __iomem *bar0; + void __iomem *bar4; + void __iomem *sli_s2m; + void __iomem *cfgregion; + struct pci_bus *bus; + int vwire_irqs[4]; + u32 vwire_data[4]; +}; + +static LIST_HEAD(thunder_pem_buses); + +static struct pci_device_id thunder_pem_pci_table[] = { + {PCI_VENDOR_ID_CAVIUM, 0xa020, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, + {0,} +}; +MODULE_DEVICE_TABLE(pci, thunder_pem_pci_table); + +enum slix_s2m_ctype { + CTYPE_MEMORY = 0, + CTYPE_CONFIG = 1, + CTYPE_IO = 2 +}; + +static u64 slix_s2m_reg_val(unsigned mac, enum slix_s2m_ctype ctype, + bool merge, bool relaxed, bool snoop, u32 ba_msb) +{ + u64 v; + + v = (u64)(mac % 3) << 49; + v |= (u64)ctype << 53; + if (!merge) + v |= 1ull << 48; + if (relaxed) + v |= 5ull << 40; + if (!snoop) + v |= 5ull << 41; + v |= (u64)ba_msb; + + return v; +} + +static u32 thunder_pcierc_config_read(struct thunder_pem *pem, u32 reg, int size) +{ + unsigned int val; + + writeq(reg & ~3u, pem->bar0 + PEM_RD_CFG); + val = readq(pem->bar0 + PEM_RD_CFG) >> 32; + + if (size == 1) + val = (val >> (8 * (reg & 3))) & 0xff; + else if (size == 2) + val = (val >> (8 * (reg & 3))) & 0xffff; + + return val; +} + +static int thunder_pem_read_config(struct pci_bus *bus, unsigned int devfn, + int reg, int size, u32 *val) +{ + void __iomem *addr; + 
struct thunder_pem *pem = bus->sysdata; + unsigned int busnr = bus->number; + + if (busnr > 255 || devfn > 255 || reg > 4095) + return PCIBIOS_DEVICE_NOT_FOUND; + + addr = pem->cfgregion + ((busnr << 24) | (devfn << 16) | reg); + + switch (size) { + case 1: + *val = readb(addr); + break; + case 2: + *val = readw(addr); + break; + case 4: + *val = readl(addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static int thunder_pem_write_config(struct pci_bus *bus, unsigned int devfn, + int reg, int size, u32 val) +{ + void __iomem *addr; + struct thunder_pem *pem = bus->sysdata; + unsigned int busnr = bus->number; + + if (busnr > 255 || devfn > 255 || reg > 4095) + return PCIBIOS_DEVICE_NOT_FOUND; + + addr = pem->cfgregion + ((busnr << 24) | (devfn << 16) | reg); + + switch (size) { + case 1: + writeb(val, addr); + break; + case 2: + writew(val, addr); + break; + case 4: + writel(val, addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static struct pci_ops thunder_pem_ops = { + .read = thunder_pem_read_config, + .write = thunder_pem_write_config, +}; + +static struct thunder_pem *thunder_pem_from_dev(struct pci_dev *dev) +{ + struct thunder_pem *pem; + struct pci_bus *bus = dev->bus; + + while (!pci_is_root_bus(bus)) + bus = bus->parent; + + list_for_each_entry(pem, &thunder_pem_buses, list) { + if (pem->bus == bus) + return pem; + } + return NULL; +} + +int thunder_pem_requester_id(struct pci_dev *dev) +{ + struct thunder_pem *pem = thunder_pem_from_dev(dev); + + if (!pem) + return -ENODEV; + + if (pem->id < 3) + return ((1 << 16) | + ((dev)->bus->number << 8) | + (dev)->devfn); + + if (pem->id < 6) + return ((3 << 16) | + ((dev)->bus->number << 8) | + (dev)->devfn); + + if (pem->id < 9) + return ((1 << 19) | (1 << 16) | + ((dev)->bus->number << 8) | + (dev)->devfn); + + if (pem->id < 12) + return ((1 << 19) | + (3 << 16) | + ((dev)->bus->number << 8) | + (dev)->devfn); + return -ENODEV; +} + +static int thunder_pem_pcibios_add_device(struct pci_dev *dev) +{ + struct thunder_pem *pem; + u8 pin; + + pem = thunder_pem_from_dev(dev); + if (!pem) + return 0; + + pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin); + + /* Cope with illegal. */ + if (pin > 4) + pin = 1; + + dev->irq = pin > 0 ? 
pem->vwire_irqs[pin - 1] : 0; + + if (pin) + dev_dbg(&dev->dev, "assigning IRQ %02d\n", dev->irq); + + pci_write_config_byte(dev, PCI_INTERRUPT_LINE, dev->irq); + + return 0; +} + +static int thunder_pem_pci_probe(struct pci_dev *pdev, + const struct pci_device_id *ent) +{ + struct thunder_pem *pem; + resource_size_t bar0_start; + u64 regval; + u64 sliaddr, pciaddr; + u32 cfgval; + int primary_bus; + int i; + int ret = 0; + struct resource *res; + LIST_HEAD(resources); + + set_pcibios_add_device(thunder_pem_pcibios_add_device); + + pem = devm_kzalloc(&pdev->dev, sizeof(*pem), GFP_KERNEL); + if (!pem) + return -ENOMEM; + + pci_set_drvdata(pdev, pem); + + bar0_start = pci_resource_start(pdev, 0); + pem->node = (bar0_start >> 44) & 3; + pem->id = ((bar0_start >> 24) & 7) + (6 * pem->node); + pem->sli = pem->id % 3; + pem->sli_group = (pem->id / 3) % 2; + pem->sli_window_base = 0x880000000000ull | (((u64)pem->node) << 44) | ((u64)pem->sli_group << 40); + pem->sli_window_base += 0x4000000000 * pem->sli; + + ret = pci_enable_device_mem(pdev); + if (ret) + goto out; + + pem->bar0 = pcim_iomap(pdev, 0, 0x100000); + if (!pem->bar0) { + ret = -ENOMEM; + goto out; + } + + pem->bar4 = pcim_iomap(pdev, 4, 0x100000); + if (!pem->bar0) { + ret = -ENOMEM; + goto out; + } + + sliaddr = THUNDER_SLI_S2M_REG_ACC_BASE | ((u64)pem->node << 44) | ((u64)pem->sli_group << 36); + + regval = readq(pem->bar0 + PEM_ON); + if (!(regval & 1)) { + dev_notice(&pdev->dev, "PEM%u_ON not set, skipping...\n", pem->id); + goto out; + } + + regval = readq(pem->bar0 + PEM_CTL_STATUS); + regval |= 0x10; /* Set Link Enable bit */ + writeq(regval, pem->bar0 + PEM_CTL_STATUS); + + udelay(1000); + + cfgval = thunder_pcierc_config_read(pem, 32 * 4, 4); /* PCIERC_CFG032 */ + + if (((cfgval >> 29 & 0x1) == 0x0) || ((cfgval >> 27 & 0x1) == 0x1)) { + dev_notice(&pdev->dev, "PEM%u Link Timeout, skipping...\n", pem->id); + goto out; + } + + pem->sli_s2m = devm_ioremap(&pdev->dev, sliaddr, 0x1000); + if (!pem->sli_s2m) { + ret = -ENOMEM; + goto out; + } + + pem->cfgregion = devm_ioremap(&pdev->dev, pem->sli_window_base, 0x100000000ull); + if (!pem->cfgregion) { + ret = -ENOMEM; + goto out; + } + regval = slix_s2m_reg_val(pem->sli, CTYPE_CONFIG, false, false, false, 0); + writeq(regval, pem->sli_s2m + 0x10 * ((0x40 * pem->sli) + 0)); + + cfgval = thunder_pcierc_config_read(pem, 6 * 4, 4); /* PCIERC_CFG006 */ + primary_bus = (cfgval >> 8) & 0xff; + + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + ret = -ENOMEM; + goto out; + } + res->start = primary_bus; + res->end = 255; + res->flags = IORESOURCE_BUS; + pci_add_resource(&resources, res); + + + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + ret = -ENOMEM; + goto out; + } + res->start = 0x100000 * pem->id; + res->end = res->start + 0x100000 - 1; + res->flags = IORESOURCE_IO; + pci_add_resource(&resources, res); + regval = slix_s2m_reg_val(pem->sli, CTYPE_IO, false, false, false, 0); + writeq(regval, pem->sli_s2m + 0x10 * ((0x40 * pem->sli) + 1)); + + res = kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + ret = -ENOMEM; + goto out; + } + pciaddr = 0x10000000ull; + res->start = pem->sli_window_base + 0x1000000000ull + pciaddr; + res->end = res->start + 0x1000000000ull - pciaddr - 1; + res->flags = IORESOURCE_MEM; + pci_add_resource_offset(&resources, res, res->start - pciaddr); + for (i = 0; i < 16; i++) { + regval = slix_s2m_reg_val(pem->sli, CTYPE_MEMORY, false, false, false, i); + writeq(regval, pem->sli_s2m + 0x10 * ((0x40 * pem->sli) + (0x10 + i))); + } + + res = 
kzalloc(sizeof(*res), GFP_KERNEL); + if (!res) { + ret = -ENOMEM; + goto out; + } + pciaddr = 0x1000000000ull; + res->start = pem->sli_window_base + 0x1000000000ull + pciaddr; + res->end = res->start + 0x1000000000ull - 1; + res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH; + pci_add_resource_offset(&resources, res, res->start - pciaddr); + for (i = 0; i < 16; i++) { + regval = slix_s2m_reg_val(pem->sli, CTYPE_MEMORY, true, true, true, i + 0x10); + writeq(regval, pem->sli_s2m + 0x10 * ((0x40 * pem->sli) + (0x20 + i))); + } + + writeq(0, pem->bar0 + P2N_BAR0_START); + writeq(0, pem->bar0 + P2N_BAR1_START); + writeq(0, pem->bar0 + P2N_BAR2_START); + + regval = 0x10; /* BAR_CTL[BAR1_SIZ] = 1 (64MB) */ + regval |= 0x8; /* BAR_CTL[BAR2_ENB] = 1 */ + writeq(regval, pem->bar0 + BAR_CTL); + + /* 1st 4MB region -> GIC registers so 32-bit MSI can reach the GIC. */ + regval = (THUNDER_GIC + (((u64)pem->node) << 44)) >> 18; + /* BAR1_INDEX[ADDR_V] = 1 */ + regval |= 1; + writeq(regval, pem->bar0 + BAR1_INDEX); + /* Remaining regions linear mapping to physical address space */ + for (i = 1; i < 16; i++) { + regval = (i << 4) | 1; + writeq(regval, pem->bar0 + BAR1_INDEX + 8 * i); + } + + pem->bus = pci_create_root_bus(&pdev->dev, primary_bus, &thunder_pem_ops, pem, &resources); + if (!pem->bus) { + ret = -ENODEV; + goto err_root_bus; + } + pem->bus->is_pcierc = 1; + list_add_tail(&pem->list, &thunder_pem_buses); + + for (i = 0; i < 3; i++) { + pem->vwire_data[i] = 40 + 4 * pem->id + i; + pem->vwire_irqs[i] = irq_create_mapping(gic_get_irq_domain(), pem->vwire_data[i]); + if (!pem->vwire_irqs[i]) { + dev_err(&pdev->dev, "Error: No irq mapping for %u\n", pem->vwire_data[i]); + continue; + } + irq_set_irq_type(pem->vwire_irqs[i], IRQ_TYPE_LEVEL_HIGH); + + writeq(THUNDER_GICD_SETSPI_NSR, pem->bar4 + 0 + (i + 2) * 32); + writeq(pem->vwire_data[i], pem->bar4 + 8 + (i + 2) * 32); + writeq(THUNDER_GICD_CLRSPI_NSR, pem->bar4 + 16 + (i + 2) * 32); + writeq(pem->vwire_data[i], pem->bar4 + 24 + (i + 2) * 32); + } + ret = pci_read_config_dword(pdev, 44 * 4, &cfgval); + if (WARN_ON(ret)) + goto err_free_root_bus; + cfgval &= ~0x40000000; /* Clear FUNM */ + cfgval |= 0x80000000; /* Set MSIXEN */ + pci_write_config_dword(pdev, 44 * 4, cfgval); + pem->bus->msi = pdev->bus->msi; + + pci_scan_child_bus(pem->bus); + pci_bus_add_devices(pem->bus); + pci_assign_unassigned_root_bus_resources(pem->bus); + + return 0; + +err_free_root_bus: + pci_remove_root_bus(pem->bus); +err_root_bus: + pci_free_resource_list(&resources); +out: + return ret; +} + +static void thunder_pem_pci_remove(struct pci_dev *pdev) +{ +} + +static struct pci_driver thunder_pem_driver = { + .name = "thunder_pem", + .id_table = thunder_pem_pci_table, + .probe = thunder_pem_pci_probe, + .remove = thunder_pem_pci_remove +}; + +static int __init thunder_pcie_init(void) +{ + int ret; + + ret = pci_register_driver(&thunder_pem_driver); + + return ret; +} +module_init(thunder_pcie_init); + +static void __exit thunder_pcie_exit(void) +{ + pci_unregister_driver(&thunder_pem_driver); +} +module_exit(thunder_pcie_exit); diff --git a/drivers/pci/host/pcie-thunder.c b/drivers/pci/host/pcie-thunder.c new file mode 100644 index 0000000..cbe4b44 --- /dev/null +++ b/drivers/pci/host/pcie-thunder.c @@ -0,0 +1,423 @@ +/* + * PCIe host controller driver for Cavium Thunder SOC + * + * Copyright (C) 2014, 2015 Cavium Inc. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of + * the License, or (at your option) any later version. + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/of_irq.h> +#include <linux/of_pci.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/msi.h> +#include <linux/irqchip/arm-gic-v3.h> + +#define PCI_DEVICE_ID_THUNDER_BRIDGE 0xa002 + +#define THUNDER_PCIE_BUS_SHIFT 20 +#define THUNDER_PCIE_DEV_SHIFT 15 +#define THUNDER_PCIE_FUNC_SHIFT 12 + +#define THUNDER_ECAM0_CFG_BASE 0x848000000000 +#define THUNDER_ECAM1_CFG_BASE 0x849000000000 +#define THUNDER_ECAM2_CFG_BASE 0x84a000000000 +#define THUNDER_ECAM3_CFG_BASE 0x84b000000000 +#define THUNDER_ECAM4_CFG_BASE 0x948000000000 +#define THUNDER_ECAM5_CFG_BASE 0x949000000000 +#define THUNDER_ECAM6_CFG_BASE 0x94a000000000 +#define THUNDER_ECAM7_CFG_BASE 0x94b000000000 + +struct thunder_pcie { + struct device_node *node; + struct device *dev; + void __iomem *cfg_base; + struct msi_controller *msi; + int ecam; + bool valid; +}; + +int thunder_pem_requester_id(struct pci_dev *dev); + +static atomic_t thunder_pcie_ecam_probed; + +static u32 pci_requester_id_ecam(struct pci_dev *dev) +{ + return (((pci_domain_nr(dev->bus) >> 2) << 19) | + ((pci_domain_nr(dev->bus) % 4) << 16) | + (dev->bus->number << 8) | dev->devfn); +} + +static u32 thunder_pci_requester_id(struct pci_dev *dev, u16 alias) +{ + int ret; + + ret = thunder_pem_requester_id(dev); + if (ret >= 0) + return (u32)ret; + + return pci_requester_id_ecam(dev); +} + +/* + * This bridge is just for the sake of supporting ARI for + * downstream devices. No resources are attached to it. + * Copy upstream root bus resources to bridge which aide in + * resource claiming for downstream devices + */ +static void pci_bridge_resource_fixup(struct pci_dev *dev) +{ + struct pci_bus *bus; + int resno; + + bus = dev->subordinate; + for (resno = 0; resno < PCI_BRIDGE_RESOURCE_NUM; resno++) { + bus->resource[resno] = pci_bus_resource_n(bus->parent, + PCI_BRIDGE_RESOURCE_NUM + resno); + } + + for (resno = PCI_BRIDGE_RESOURCES; + resno <= PCI_BRIDGE_RESOURCE_END; resno++) { + dev->resource[resno].start = dev->resource[resno].end = 0; + dev->resource[resno].flags = 0; + } +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, PCI_DEVICE_ID_THUNDER_BRIDGE, + pci_bridge_resource_fixup); + +/* + * All PCIe devices in Thunder have fixed resources, shouldn't be reassigned. + * Also claim the device's valid resources to set 'res->parent' hierarchy. + */ +static void pci_dev_resource_fixup(struct pci_dev *dev) +{ + struct resource *res; + int resno; + + /* + * If the ECAM is not yet probed, we must be in a virtual + * machine. 
In that case, don't mark things as + * IORESOURCE_PCI_FIXED + */ + if (!atomic_read(&thunder_pcie_ecam_probed)) + return; + + for (resno = 0; resno < PCI_NUM_RESOURCES; resno++) + dev->resource[resno].flags |= IORESOURCE_PCI_FIXED; + + for (resno = 0; resno < PCI_BRIDGE_RESOURCES; resno++) { + res = &dev->resource[resno]; + if (res->parent || !(res->flags & IORESOURCE_MEM)) + continue; + pci_claim_resource(dev, resno); + } +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, PCI_ANY_ID, + pci_dev_resource_fixup); + +static void __iomem *thunder_pcie_get_cfg_addr(struct thunder_pcie *pcie, + unsigned int busnr, + unsigned int devfn, int reg) +{ + return pcie->cfg_base + + ((busnr << THUNDER_PCIE_BUS_SHIFT) + | (PCI_SLOT(devfn) << THUNDER_PCIE_DEV_SHIFT) + | (PCI_FUNC(devfn) << THUNDER_PCIE_FUNC_SHIFT)) + reg; +} + +static int thunder_pcie_read_config(struct pci_bus *bus, unsigned int devfn, + int reg, int size, u32 *val) +{ + struct thunder_pcie *pcie = bus->sysdata; + void __iomem *addr; + unsigned int busnr = bus->number; + + if (busnr > 255 || devfn > 255 || reg > 4095) + return PCIBIOS_DEVICE_NOT_FOUND; + + addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); + + switch (size) { + case 1: + *val = readb(addr); + break; + case 2: + *val = readw(addr); + break; + case 4: + *val = readl(addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static int thunder_pcie_write_config(struct pci_bus *bus, unsigned int devfn, + int reg, int size, u32 val) +{ + struct thunder_pcie *pcie = bus->sysdata; + void __iomem *addr; + unsigned int busnr = bus->number; + + if (busnr > 255 || devfn > 255 || reg > 4095) + return PCIBIOS_DEVICE_NOT_FOUND; + + addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); + + switch (size) { + case 1: + writeb(val, addr); + break; + case 2: + writew(val, addr); + break; + case 4: + writel(val, addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static struct pci_ops thunder_pcie_ops = { + .read = thunder_pcie_read_config, + .write = thunder_pcie_write_config, +}; + +static int thunder_pcie_msi_enable(struct thunder_pcie *pcie, + struct pci_bus *bus) +{ + struct device_node *msi_node; + + msi_node = of_parse_phandle(pcie->node, "msi-parent", 0); + if (!msi_node) + return -ENODEV; + + pcie->msi = of_pci_find_msi_chip_by_node(msi_node); + if (!pcie->msi) + return -ENODEV; + + pcie->msi->dev = pcie->dev; + bus->msi = pcie->msi; + + return 0; +} + +static void thunder_pcie_config(struct thunder_pcie *pcie, u64 addr) +{ + atomic_set(&thunder_pcie_ecam_probed, 1); + set_its_pci_requester_id(thunder_pci_requester_id); + + pcie->valid = true; + + switch (addr) { + case THUNDER_ECAM0_CFG_BASE: + pcie->ecam = 0; + break; + case THUNDER_ECAM1_CFG_BASE: + pcie->ecam = 1; + break; + case THUNDER_ECAM2_CFG_BASE: + pcie->ecam = 2; + break; + case THUNDER_ECAM3_CFG_BASE: + pcie->ecam = 3; + break; + case THUNDER_ECAM4_CFG_BASE: + pcie->ecam = 4; + break; + case THUNDER_ECAM5_CFG_BASE: + pcie->ecam = 5; + break; + case THUNDER_ECAM6_CFG_BASE: + pcie->ecam = 6; + break; + case THUNDER_ECAM7_CFG_BASE: + pcie->ecam = 7; + break; + default: + pcie->valid = false; + break; + } +} + +static int thunder_pcie_probe(struct platform_device *pdev) +{ + struct thunder_pcie *pcie; + struct resource *cfg_base; + struct pci_bus *bus; + int ret = 0; + LIST_HEAD(res); + + pcie = devm_kzalloc(&pdev->dev, sizeof(*pcie), GFP_KERNEL); + if (!pcie) + return -ENOMEM; + + pcie->node = 
of_node_get(pdev->dev.of_node); + pcie->dev = &pdev->dev; + + /* Get controller's configuration space range */ + cfg_base = platform_get_resource(pdev, IORESOURCE_MEM, 0); + + thunder_pcie_config(pcie, cfg_base->start); + + pcie->cfg_base = devm_ioremap_resource(&pdev->dev, cfg_base); + if (IS_ERR(pcie->cfg_base)) { + ret = PTR_ERR(pcie->cfg_base); + goto err_ioremap; + } + + dev_info(&pdev->dev, "ECAM%d CFG BASE 0x%llx\n", + pcie->ecam, (u64)cfg_base->start); + + ret = of_pci_get_host_bridge_resources(pdev->dev.of_node, + 0, 255, &res, NULL); + if (ret) + goto err_root_bus; + + bus = pci_create_root_bus(&pdev->dev, 0, &thunder_pcie_ops, pcie, &res); + if (!bus) { + ret = -ENODEV; + goto err_root_bus; + } + + /* Set reference to MSI chip */ + ret = thunder_pcie_msi_enable(pcie, bus); + if (ret) { + dev_err(&pdev->dev, + "Unable to set reference to MSI chip: ret=%d\n", ret); + goto err_msi; + } + + platform_set_drvdata(pdev, pcie); + + pci_scan_child_bus(bus); + pci_bus_add_devices(bus); + + return 0; +err_msi: + pci_remove_root_bus(bus); +err_root_bus: + pci_free_resource_list(&res); +err_ioremap: + of_node_put(pcie->node); + return ret; +} + +static const struct of_device_id thunder_pcie_of_match[] = { + { .compatible = "cavium,thunder-pcie", }, + {}, +}; +MODULE_DEVICE_TABLE(of, thunder_pcie_of_match); + +static struct platform_driver thunder_pcie_driver = { + .driver = { + .name = "thunder-pcie", + .owner = THIS_MODULE, + .of_match_table = thunder_pcie_of_match, + }, + .probe = thunder_pcie_probe, +}; +module_platform_driver(thunder_pcie_driver); + +#ifdef CONFIG_ACPI + +static int +thunder_mmcfg_read_config(struct pci_mmcfg_region *cfg, unsigned int busnr, + unsigned int devfn, int reg, int len, u32 *value) +{ + struct thunder_pcie *pcie = cfg->data; + void __iomem *addr; + + if (!pcie->valid) { + /* Not support for now */ + pr_err("RC PEM not supported !!!\n"); + return PCIBIOS_DEVICE_NOT_FOUND; + } + + addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); + + switch (len) { + case 1: + *value = readb(addr); + break; + case 2: + *value = readw(addr); + break; + case 4: + *value = readl(addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static int thunder_mmcfg_write_config(struct pci_mmcfg_region *cfg, + unsigned int busnr, unsigned int devfn, int reg, int len, + u32 value) { + struct thunder_pcie *pcie = cfg->data; + void __iomem *addr; + + if (!pcie->valid) { + /* Not support for now */ + pr_err("RC PEM not supported !!!\n"); + return PCIBIOS_DEVICE_NOT_FOUND; + } + + addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); + + switch (len) { + case 1: + writeb(value, addr); + break; + case 2: + writew(value, addr); + break; + case 4: + writel(value, addr); + break; + default: + return PCIBIOS_BAD_REGISTER_NUMBER; + } + + return PCIBIOS_SUCCESSFUL; +} + +static int thunder_acpi_mcfg_fixup(struct acpi_pci_root *root, + struct pci_mmcfg_region *cfg) +{ + struct thunder_pcie *pcie; + + pcie = kzalloc(sizeof(*pcie), GFP_KERNEL); + if (!pcie) + return -ENOMEM; + + pcie->dev = &root->device->dev; + + thunder_pcie_config(pcie, cfg->address); + + pcie->cfg_base = cfg->virt; + cfg->data = pcie; + cfg->read = thunder_mmcfg_read_config; + cfg->write = thunder_mmcfg_write_config; + + return 0; +} +DECLARE_ACPI_MCFG_FIXUP("CAVIUM", "THUNDERX", thunder_acpi_mcfg_fixup); +#endif + +MODULE_AUTHOR("Sunil Goutham"); +MODULE_DESCRIPTION("Cavium Thunder ECAM host controller driver"); +MODULE_LICENSE("GPL v2"); +
From: Robert Richter <rrichter@cavium.com>
This reverts commit 22b25883dd8632f5f84bc627eeb9ca81ee8f6377.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/pci/host/pcie-thunder.c | 88 -----------------------------------------
 1 file changed, 88 deletions(-)
diff --git a/drivers/pci/host/pcie-thunder.c b/drivers/pci/host/pcie-thunder.c index cbe4b44..7428401 100644 --- a/drivers/pci/host/pcie-thunder.c +++ b/drivers/pci/host/pcie-thunder.c @@ -329,94 +329,6 @@ static struct platform_driver thunder_pcie_driver = { }; module_platform_driver(thunder_pcie_driver);
-#ifdef CONFIG_ACPI - -static int -thunder_mmcfg_read_config(struct pci_mmcfg_region *cfg, unsigned int busnr, - unsigned int devfn, int reg, int len, u32 *value) -{ - struct thunder_pcie *pcie = cfg->data; - void __iomem *addr; - - if (!pcie->valid) { - /* Not support for now */ - pr_err("RC PEM not supported !!!\n"); - return PCIBIOS_DEVICE_NOT_FOUND; - } - - addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); - - switch (len) { - case 1: - *value = readb(addr); - break; - case 2: - *value = readw(addr); - break; - case 4: - *value = readl(addr); - break; - default: - return PCIBIOS_BAD_REGISTER_NUMBER; - } - - return PCIBIOS_SUCCESSFUL; -} - -static int thunder_mmcfg_write_config(struct pci_mmcfg_region *cfg, - unsigned int busnr, unsigned int devfn, int reg, int len, - u32 value) { - struct thunder_pcie *pcie = cfg->data; - void __iomem *addr; - - if (!pcie->valid) { - /* Not support for now */ - pr_err("RC PEM not supported !!!\n"); - return PCIBIOS_DEVICE_NOT_FOUND; - } - - addr = thunder_pcie_get_cfg_addr(pcie, busnr, devfn, reg); - - switch (len) { - case 1: - writeb(value, addr); - break; - case 2: - writew(value, addr); - break; - case 4: - writel(value, addr); - break; - default: - return PCIBIOS_BAD_REGISTER_NUMBER; - } - - return PCIBIOS_SUCCESSFUL; -} - -static int thunder_acpi_mcfg_fixup(struct acpi_pci_root *root, - struct pci_mmcfg_region *cfg) -{ - struct thunder_pcie *pcie; - - pcie = kzalloc(sizeof(*pcie), GFP_KERNEL); - if (!pcie) - return -ENOMEM; - - pcie->dev = &root->device->dev; - - thunder_pcie_config(pcie, cfg->address); - - pcie->cfg_base = cfg->virt; - cfg->data = pcie; - cfg->read = thunder_mmcfg_read_config; - cfg->write = thunder_mmcfg_write_config; - - return 0; -} -DECLARE_ACPI_MCFG_FIXUP("CAVIUM", "THUNDERX", thunder_acpi_mcfg_fixup); -#endif - MODULE_AUTHOR("Sunil Goutham"); MODULE_DESCRIPTION("Cavium Thunder ECAM host controller driver"); MODULE_LICENSE("GPL v2");
From: Robert Richter <rrichter@cavium.com>
Small fixes:
* Change the hfunc function argument to a const type.
* Move the hfunc check to the beginning of the function so that the parameter is validated first.
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c index a4228e6..1c7a98f 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c @@ -495,22 +495,21 @@ static int nicvf_get_rxfh(struct net_device *dev, u32 *indir, u8 *hkey, }
static int nicvf_set_rxfh(struct net_device *dev, const u32 *indir, - const u8 *hkey, u8 hfunc) + const u8 *hkey, const u8 hfunc) { struct nicvf *nic = netdev_priv(dev); struct nicvf_rss_info *rss = &nic->rss_info; int idx;
+ if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) + return -EOPNOTSUPP; + if ((nic->qs->rq_cnt <= 1) || (nic->cpi_alg != CPI_ALG_NONE)) { rss->enable = false; rss->hash_bits = 0; return -EIO; }
- /* We do not allow change in unsupported parameters */ - if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) - return -EOPNOTSUPP; - rss->enable = true; if (indir) { for (idx = 0; idx < rss->rss_size; idx++)
From: Sunil Goutham <sgoutham@cavium.com>
Add support for receive hashing HW offload by using the RSS_ALG and RSS_TAG fields of the CQE_RX descriptor. Also remove the dependency on a minimum receive queue count for configuring RSS, so that a hash is always generated.
This hash is used by the RPS logic to distribute flows across multiple CPUs. The offload can be disabled via ethtool.
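In condensed form, the receive path now derives the skb hash from the CQE like this (condensed from the nicvf_set_rxhash() helper added below; surrounding code omitted):

	if (netdev->features & NETIF_F_RXHASH) {
		switch (cqe_rx->rss_alg) {
		case RSS_ALG_TCP_IP:
		case RSS_ALG_UDP_IP:
			skb_set_hash(skb, cqe_rx->rss_tag, PKT_HASH_TYPE_L4);
			break;
		case RSS_ALG_IP:
			skb_set_hash(skb, cqe_rx->rss_tag, PKT_HASH_TYPE_L3);
			break;
		default:
			skb_set_hash(skb, 0, PKT_HASH_TYPE_NONE);
		}
	}

Since the hash is gated on NETIF_F_RXHASH, it can be toggled at run time with "ethtool -K <iface> rxhash on|off".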
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c  |  7 ++---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 36 +++++++++++++++++++++-
 2 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c index 1c7a98f..1c606e7 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c @@ -504,13 +504,12 @@ static int nicvf_set_rxfh(struct net_device *dev, const u32 *indir, if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) return -EOPNOTSUPP;
- if ((nic->qs->rq_cnt <= 1) || (nic->cpi_alg != CPI_ALG_NONE)) { - rss->enable = false; - rss->hash_bits = 0; + if (!rss->enable) { + netdev_err(nic->netdev, + "RSS is disabled, cannot change settings\n"); return -EIO; }
- rss->enable = true; if (indir) { for (idx = 0; idx < rss->rss_size; idx++) rss->ind_tbl[idx] = indir[idx]; diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 3b90afb..76c6c52 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -326,7 +326,7 @@ static int nicvf_rss_init(struct nicvf *nic)
nicvf_get_rss_size(nic);
- if ((nic->qs->rq_cnt <= 1) || (cpi_alg != CPI_ALG_NONE)) { + if (cpi_alg != CPI_ALG_NONE) { rss->enable = false; rss->hash_bits = 0; return 0; @@ -429,6 +429,34 @@ static void nicvf_snd_pkt_handler(struct net_device *netdev, } }
+static inline void nicvf_set_rxhash(struct net_device *netdev, + struct cqe_rx_t *cqe_rx, + struct sk_buff *skb) +{ + u8 hash_type; + u32 hash; + + if (!(netdev->features & NETIF_F_RXHASH)) + return; + + switch (cqe_rx->rss_alg) { + case RSS_ALG_TCP_IP: + case RSS_ALG_UDP_IP: + hash_type = PKT_HASH_TYPE_L4; + hash = cqe_rx->rss_tag; + break; + case RSS_ALG_IP: + hash_type = PKT_HASH_TYPE_L3; + hash = cqe_rx->rss_tag; + break; + default: + hash_type = PKT_HASH_TYPE_NONE; + hash = 0; + } + + skb_set_hash(skb, hash, hash_type); +} + static void nicvf_rcv_pkt_handler(struct net_device *netdev, struct napi_struct *napi, struct cmp_queue *cq, @@ -458,6 +486,8 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
nicvf_set_rx_frame_cnt(nic, skb);
+ nicvf_set_rxhash(netdev, cqe_rx, skb); + skb_record_rx_queue(skb, cqe_rx->rq_idx); if (netdev->hw_features & NETIF_F_RXCSUM) { /* HW by default verifies TCP/UDP/SCTP checksums */ @@ -1291,6 +1321,10 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netdev->features |= (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO | NETIF_F_GRO); +#ifdef VNIC_RSS_SUPPORT + netdev->features |= NETIF_F_RXHASH; +#endif + netdev->hw_features = netdev->features;
netdev->netdev_ops = &nicvf_netdev_ops;
From: Sunil Goutham <sgoutham@cavium.com>
Add ethtool support to dump the receive packet error statistics reported in the CQE. Also make some small related fixes.
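In condensed form, every CQE error opcode is now folded into a named counter in struct nicvf_hw_stats, and a matching NICVF_HW_STAT() descriptor entry makes it visible through "ethtool -S" (fragments condensed from the patch below):

	/* nicvf_check_cqe_rx_errs(): one named counter per error opcode */
	switch (cqe_rx->err_opcode) {
	case CQ_RX_ERROP_RE_JABBER:
		stats->rx_jabber_errs++;
		break;
	case CQ_RX_ERROP_L4_CHK:
		stats->rx_l4_csum_errs++;
		break;
	/* ... */
	}

/* nicvf_ethtool.c: exported to user space via ethtool -S */
static const struct nicvf_stat nicvf_hw_stats[] = {
	NICVF_HW_STAT(rx_jabber_errs),
	NICVF_HW_STAT(rx_l4_csum_errs),
	/* ... */
};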
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h          | 36 +++++++--
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c    | 34 +++++++--
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 26 ++++---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 86 +++++++---------------
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 41 -----------
 5 files changed, 103 insertions(+), 120 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h index 8aee250..58adfd6 100644 --- a/drivers/net/ethernet/cavium/thunder/nic.h +++ b/drivers/net/ethernet/cavium/thunder/nic.h @@ -190,10 +190,10 @@ enum tx_stats_reg_offset { };
struct nicvf_hw_stats { - u64 rx_bytes_ok; - u64 rx_ucast_frames_ok; - u64 rx_bcast_frames_ok; - u64 rx_mcast_frames_ok; + u64 rx_bytes; + u64 rx_ucast_frames; + u64 rx_bcast_frames; + u64 rx_mcast_frames; u64 rx_fcs_errors; u64 rx_l2_errors; u64 rx_drop_red; @@ -204,6 +204,31 @@ struct nicvf_hw_stats { u64 rx_drop_mcast; u64 rx_drop_l3_bcast; u64 rx_drop_l3_mcast; + u64 rx_bgx_truncated_pkts; + u64 rx_jabber_errs; + u64 rx_fcs_errs; + u64 rx_bgx_errs; + u64 rx_prel2_errs; + u64 rx_l2_hdr_malformed; + u64 rx_oversize; + u64 rx_undersize; + u64 rx_l2_len_mismatch; + u64 rx_l2_pclp; + u64 rx_ip_ver_errs; + u64 rx_ip_csum_errs; + u64 rx_ip_hdr_malformed; + u64 rx_ip_payload_malformed; + u64 rx_ip_ttl_errs; + u64 rx_l3_pclp; + u64 rx_l4_malformed; + u64 rx_l4_csum_errs; + u64 rx_udp_len_errs; + u64 rx_l4_port_errs; + u64 rx_tcp_flag_errs; + u64 rx_tcp_offset_errs; + u64 rx_l4_pclp; + u64 rx_truncated_pkts; + u64 tx_bytes_ok; u64 tx_ucast_frames_ok; u64 tx_bcast_frames_ok; @@ -222,6 +247,7 @@ struct nicvf_drv_stats { u64 rx_frames_1518; u64 rx_frames_jumbo; u64 rx_drops; + /* Tx */ u64 tx_frames_ok; u64 tx_drops; @@ -257,7 +283,7 @@ struct nicvf { u32 cq_coalesce_usecs;
u32 msg_enable; - struct nicvf_hw_stats stats; + struct nicvf_hw_stats hw_stats; struct nicvf_drv_stats drv_stats; struct bgx_stats bgx_stats; struct work_struct reset_task; diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c index 1c606e7..816a049 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c @@ -35,10 +35,10 @@ struct nicvf_stat { }
static const struct nicvf_stat nicvf_hw_stats[] = { - NICVF_HW_STAT(rx_bytes_ok), - NICVF_HW_STAT(rx_ucast_frames_ok), - NICVF_HW_STAT(rx_bcast_frames_ok), - NICVF_HW_STAT(rx_mcast_frames_ok), + NICVF_HW_STAT(rx_bytes), + NICVF_HW_STAT(rx_ucast_frames), + NICVF_HW_STAT(rx_bcast_frames), + NICVF_HW_STAT(rx_mcast_frames), NICVF_HW_STAT(rx_fcs_errors), NICVF_HW_STAT(rx_l2_errors), NICVF_HW_STAT(rx_drop_red), @@ -49,6 +49,30 @@ static const struct nicvf_stat nicvf_hw_stats[] = { NICVF_HW_STAT(rx_drop_mcast), NICVF_HW_STAT(rx_drop_l3_bcast), NICVF_HW_STAT(rx_drop_l3_mcast), + NICVF_HW_STAT(rx_bgx_truncated_pkts), + NICVF_HW_STAT(rx_jabber_errs), + NICVF_HW_STAT(rx_fcs_errs), + NICVF_HW_STAT(rx_bgx_errs), + NICVF_HW_STAT(rx_prel2_errs), + NICVF_HW_STAT(rx_l2_hdr_malformed), + NICVF_HW_STAT(rx_oversize), + NICVF_HW_STAT(rx_undersize), + NICVF_HW_STAT(rx_l2_len_mismatch), + NICVF_HW_STAT(rx_l2_pclp), + NICVF_HW_STAT(rx_ip_ver_errs), + NICVF_HW_STAT(rx_ip_csum_errs), + NICVF_HW_STAT(rx_ip_hdr_malformed), + NICVF_HW_STAT(rx_ip_payload_malformed), + NICVF_HW_STAT(rx_ip_ttl_errs), + NICVF_HW_STAT(rx_l3_pclp), + NICVF_HW_STAT(rx_l4_malformed), + NICVF_HW_STAT(rx_l4_csum_errs), + NICVF_HW_STAT(rx_udp_len_errs), + NICVF_HW_STAT(rx_l4_port_errs), + NICVF_HW_STAT(rx_tcp_flag_errs), + NICVF_HW_STAT(rx_tcp_offset_errs), + NICVF_HW_STAT(rx_l4_pclp), + NICVF_HW_STAT(rx_truncated_pkts), NICVF_HW_STAT(tx_bytes_ok), NICVF_HW_STAT(tx_ucast_frames_ok), NICVF_HW_STAT(tx_bcast_frames_ok), @@ -195,7 +219,7 @@ static void nicvf_get_ethtool_stats(struct net_device *netdev, nicvf_update_lmac_stats(nic);
for (stat = 0; stat < nicvf_n_hw_stats; stat++) - *(data++) = ((u64 *)&nic->stats) + *(data++) = ((u64 *)&nic->hw_stats) [nicvf_hw_stats[stat].index]; for (stat = 0; stat < nicvf_n_drv_stats; stat++) *(data++) = ((u64 *)&nic->drv_stats) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c index 76c6c52..0e8e2ce 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -484,6 +484,12 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev, skb->data, skb->len, true); }
+ /* If error packet, drop it here */ + if (err) { + dev_kfree_skb_any(skb); + return; + } + nicvf_set_rx_frame_cnt(nic, skb);
nicvf_set_rxhash(netdev, cqe_rx, skb); @@ -1148,7 +1154,7 @@ void nicvf_update_lmac_stats(struct nicvf *nic) void nicvf_update_stats(struct nicvf *nic) { int qidx; - struct nicvf_hw_stats *stats = &nic->stats; + struct nicvf_hw_stats *stats = &nic->hw_stats; struct nicvf_drv_stats *drv_stats = &nic->drv_stats; struct queue_set *qs = nic->qs;
@@ -1157,14 +1163,16 @@ void nicvf_update_stats(struct nicvf *nic) #define GET_TX_STATS(reg) \ nicvf_reg_read(nic, NIC_VNIC_TX_STAT_0_4 | (reg << 3))
- stats->rx_bytes_ok = GET_RX_STATS(RX_OCTS); - stats->rx_ucast_frames_ok = GET_RX_STATS(RX_UCAST); - stats->rx_bcast_frames_ok = GET_RX_STATS(RX_BCAST); - stats->rx_mcast_frames_ok = GET_RX_STATS(RX_MCAST); + stats->rx_bytes = GET_RX_STATS(RX_OCTS); + stats->rx_ucast_frames = GET_RX_STATS(RX_UCAST); + stats->rx_bcast_frames = GET_RX_STATS(RX_BCAST); + stats->rx_mcast_frames = GET_RX_STATS(RX_MCAST); stats->rx_fcs_errors = GET_RX_STATS(RX_FCS); stats->rx_l2_errors = GET_RX_STATS(RX_L2ERR); stats->rx_drop_red = GET_RX_STATS(RX_RED); + stats->rx_drop_red_bytes = GET_RX_STATS(RX_RED_OCTS); stats->rx_drop_overrun = GET_RX_STATS(RX_ORUN); + stats->rx_drop_overrun_bytes = GET_RX_STATS(RX_ORUN_OCTS); stats->rx_drop_bcast = GET_RX_STATS(RX_DRP_BCAST); stats->rx_drop_mcast = GET_RX_STATS(RX_DRP_MCAST); stats->rx_drop_l3_bcast = GET_RX_STATS(RX_DRP_L3BCAST); @@ -1176,9 +1184,6 @@ void nicvf_update_stats(struct nicvf *nic) stats->tx_mcast_frames_ok = GET_TX_STATS(TX_MCAST); stats->tx_drops = GET_TX_STATS(TX_DROP);
- drv_stats->rx_frames_ok = stats->rx_ucast_frames_ok + - stats->rx_bcast_frames_ok + - stats->rx_mcast_frames_ok; drv_stats->tx_frames_ok = stats->tx_ucast_frames_ok + stats->tx_bcast_frames_ok + stats->tx_mcast_frames_ok; @@ -1197,14 +1202,15 @@ static struct rtnl_link_stats64 *nicvf_get_stats64(struct net_device *netdev, struct rtnl_link_stats64 *stats) { struct nicvf *nic = netdev_priv(netdev); - struct nicvf_hw_stats *hw_stats = &nic->stats; + struct nicvf_hw_stats *hw_stats = &nic->hw_stats; struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
nicvf_update_stats(nic);
- stats->rx_bytes = hw_stats->rx_bytes_ok; + stats->rx_bytes = hw_stats->rx_bytes; stats->rx_packets = drv_stats->rx_frames_ok; stats->rx_dropped = drv_stats->rx_drops; + stats->multicast = hw_stats->rx_mcast_frames;
stats->tx_bytes = hw_stats->tx_bytes_ok; stats->tx_packets = drv_stats->tx_frames_ok; diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index ca4240a..4fc40d83 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -1371,10 +1371,11 @@ void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx) int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cmp_queue *cq, struct cqe_rx_t *cqe_rx) { - struct cmp_queue_stats *stats = &cq->stats; + struct nicvf_hw_stats *stats = &nic->hw_stats; + struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
if (!cqe_rx->err_level && !cqe_rx->err_opcode) { - stats->rx.errop.good++; + drv_stats->rx_frames_ok++; return 0; }
@@ -1384,111 +1385,78 @@ int nicvf_check_cqe_rx_errs(struct nicvf *nic, nic->netdev->name, cqe_rx->err_level, cqe_rx->err_opcode);
- switch (cqe_rx->err_level) { - case CQ_ERRLVL_MAC: - stats->rx.errlvl.mac_errs++; - break; - case CQ_ERRLVL_L2: - stats->rx.errlvl.l2_errs++; - break; - case CQ_ERRLVL_L3: - stats->rx.errlvl.l3_errs++; - break; - case CQ_ERRLVL_L4: - stats->rx.errlvl.l4_errs++; - break; - } - switch (cqe_rx->err_opcode) { case CQ_RX_ERROP_RE_PARTIAL: - stats->rx.errop.partial_pkts++; + stats->rx_bgx_truncated_pkts++; break; case CQ_RX_ERROP_RE_JABBER: - stats->rx.errop.jabber_errs++; + stats->rx_jabber_errs++; break; case CQ_RX_ERROP_RE_FCS: - stats->rx.errop.fcs_errs++; - break; - case CQ_RX_ERROP_RE_TERMINATE: - stats->rx.errop.terminate_errs++; + stats->rx_fcs_errs++; break; case CQ_RX_ERROP_RE_RX_CTL: - stats->rx.errop.bgx_rx_errs++; + stats->rx_bgx_errs++; break; case CQ_RX_ERROP_PREL2_ERR: - stats->rx.errop.prel2_errs++; - break; - case CQ_RX_ERROP_L2_FRAGMENT: - stats->rx.errop.l2_frags++; - break; - case CQ_RX_ERROP_L2_OVERRUN: - stats->rx.errop.l2_overruns++; - break; - case CQ_RX_ERROP_L2_PFCS: - stats->rx.errop.l2_pfcs++; - break; - case CQ_RX_ERROP_L2_PUNY: - stats->rx.errop.l2_puny++; + stats->rx_prel2_errs++; break; case CQ_RX_ERROP_L2_MAL: - stats->rx.errop.l2_hdr_malformed++; + stats->rx_l2_hdr_malformed++; break; case CQ_RX_ERROP_L2_OVERSIZE: - stats->rx.errop.l2_oversize++; + stats->rx_oversize++; break; case CQ_RX_ERROP_L2_UNDERSIZE: - stats->rx.errop.l2_undersize++; + stats->rx_undersize++; break; case CQ_RX_ERROP_L2_LENMISM: - stats->rx.errop.l2_len_mismatch++; + stats->rx_l2_len_mismatch++; break; case CQ_RX_ERROP_L2_PCLP: - stats->rx.errop.l2_pclp++; + stats->rx_l2_pclp++; break; case CQ_RX_ERROP_IP_NOT: - stats->rx.errop.non_ip++; + stats->rx_ip_ver_errs++; break; case CQ_RX_ERROP_IP_CSUM_ERR: - stats->rx.errop.ip_csum_err++; + stats->rx_ip_csum_errs++; break; case CQ_RX_ERROP_IP_MAL: - stats->rx.errop.ip_hdr_malformed++; + stats->rx_ip_hdr_malformed++; break; case CQ_RX_ERROP_IP_MALD: - stats->rx.errop.ip_payload_malformed++; + stats->rx_ip_payload_malformed++; break; case CQ_RX_ERROP_IP_HOP: - stats->rx.errop.ip_hop_errs++; - break; - case CQ_RX_ERROP_L3_ICRC: - stats->rx.errop.l3_icrc_errs++; + stats->rx_ip_ttl_errs++; break; case CQ_RX_ERROP_L3_PCLP: - stats->rx.errop.l3_pclp++; + stats->rx_l3_pclp++; break; case CQ_RX_ERROP_L4_MAL: - stats->rx.errop.l4_malformed++; + stats->rx_l4_malformed++; break; case CQ_RX_ERROP_L4_CHK: - stats->rx.errop.l4_csum_errs++; + stats->rx_l4_csum_errs++; break; case CQ_RX_ERROP_UDP_LEN: - stats->rx.errop.udp_len_err++; + stats->rx_udp_len_errs++; break; case CQ_RX_ERROP_L4_PORT: - stats->rx.errop.bad_l4_port++; + stats->rx_l4_port_errs++; break; case CQ_RX_ERROP_TCP_FLAG: - stats->rx.errop.bad_tcp_flag++; + stats->rx_tcp_flag_errs++; break; case CQ_RX_ERROP_TCP_OFFSET: - stats->rx.errop.tcp_offset_errs++; + stats->rx_tcp_offset_errs++; break; case CQ_RX_ERROP_L4_PCLP: - stats->rx.errop.l4_pclp++; + stats->rx_l4_pclp++; break; case CQ_RX_ERROP_RBDR_TRUNC: - stats->rx.errop.pkt_truncated++; + stats->rx_truncated_pkts++; break; }
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h index f0937b7..dc73872 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h @@ -181,47 +181,6 @@ enum CQ_TX_ERROP_E { };
struct cmp_queue_stats { - struct rx_stats { - struct { - u64 mac_errs; - u64 l2_errs; - u64 l3_errs; - u64 l4_errs; - } errlvl; - struct { - u64 good; - u64 partial_pkts; - u64 jabber_errs; - u64 fcs_errs; - u64 terminate_errs; - u64 bgx_rx_errs; - u64 prel2_errs; - u64 l2_frags; - u64 l2_overruns; - u64 l2_pfcs; - u64 l2_puny; - u64 l2_hdr_malformed; - u64 l2_oversize; - u64 l2_undersize; - u64 l2_len_mismatch; - u64 l2_pclp; - u64 non_ip; - u64 ip_csum_err; - u64 ip_hdr_malformed; - u64 ip_payload_malformed; - u64 ip_hop_errs; - u64 l3_icrc_errs; - u64 l3_pclp; - u64 l4_malformed; - u64 l4_csum_errs; - u64 udp_len_err; - u64 bad_l4_port; - u64 bad_tcp_flag; - u64 tcp_offset_errs; - u64 l4_pclp; - u64 pkt_truncated; - } errop; - } rx; struct tx_stats { u64 good; u64 desc_fault;
From: Robert Richter rrichter@cavium.com
We need code separation for later acpi integration.
Originally based on a patch from: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 48 +++++++++++++++++------ 1 file changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index b961a89..615b2af 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -835,18 +835,28 @@ static void bgx_get_qlm_mode(struct bgx *bgx) } }
-static void bgx_init_of(struct bgx *bgx, struct device_node *np) +#if IS_ENABLED(CONFIG_OF_MDIO) + +static int bgx_init_of_phy(struct bgx *bgx) { + struct device_node *np; struct device_node *np_child; u8 lmac = 0; + char bgx_sel[5]; + const char *mac;
- for_each_child_of_node(np, np_child) { - struct device_node *phy_np; - const char *mac; + /* Get BGX node from DT */ + snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); + np = of_find_node_by_name(NULL, bgx_sel); + if (!np) + return -ENODEV;
- phy_np = of_parse_phandle(np_child, "phy-handle", 0); - if (phy_np) - bgx->lmac[lmac].phydev = of_phy_find_device(phy_np); + for_each_child_of_node(np, np_child) { + struct device_node *phy_np = of_parse_phandle(np_child, + "phy-handle", 0); + if (!phy_np) + continue; + bgx->lmac[lmac].phydev = of_phy_find_device(phy_np);
mac = of_get_mac_address(np_child); if (mac) @@ -858,6 +868,21 @@ static void bgx_init_of(struct bgx *bgx, struct device_node *np) if (lmac == MAX_LMAC_PER_BGX) break; } + return 0; +} + +#else + +static int bgx_init_of_phy(struct bgx *bgx) +{ + return -ENODEV; +} + +#endif /* CONFIG_OF_MDIO */ + +static int bgx_init_phy(struct bgx *bgx) +{ + return bgx_init_of_phy(bgx); }
static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) @@ -865,8 +890,6 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) int err; struct device *dev = &pdev->dev; struct bgx *bgx = NULL; - struct device_node *np; - char bgx_sel[5]; u8 lmac;
bgx = devm_kzalloc(dev, sizeof(*bgx), GFP_KERNEL); @@ -902,10 +925,9 @@ static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent) bgx_vnic[bgx->bgx_id] = bgx; bgx_get_qlm_mode(bgx);
- snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id); - np = of_find_node_by_name(NULL, bgx_sel); - if (np) - bgx_init_of(bgx, np); + err = bgx_init_phy(bgx); + if (err) + goto err_enable;
bgx_init_hw(bgx);
From: Radha Mohan Chintakuntla rchintakuntla@cavium.com
This patch modifies the mdio-octeon driver to work on both ThunderX and Octeon SoCs from Cavium Inc.
Signed-off-by: Sunil Goutham sgoutham@cavium.com Signed-off-by: Radha Mohan Chintakuntla rchintakuntla@cavium.com Signed-off-by: David Daney david.daney@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/phy/Kconfig | 9 ++-- drivers/net/phy/mdio-octeon.c | 122 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 111 insertions(+), 20 deletions(-)
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index cb86d7a..73edba2 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -145,13 +145,14 @@ config MDIO_GPIO will be called mdio-gpio.
config MDIO_OCTEON - tristate "Support for MDIO buses on Octeon SOCs" - depends on CAVIUM_OCTEON_SOC + tristate "Support for MDIO buses on Octeon and ThunderX SOCs" + depends on 64BIT default y help
- This module provides a driver for the Octeon MDIO busses. - It is required by the Octeon Ethernet device drivers. + This module provides a driver for the Octeon and ThunderX MDIO + busses. It is required by the Octeon and ThunderX ethernet device + drivers.
If in doubt, say Y.
diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c index c838ad6..507aade 100644 --- a/drivers/net/phy/mdio-octeon.c +++ b/drivers/net/phy/mdio-octeon.c @@ -7,6 +7,7 @@ */
#include <linux/platform_device.h> +#include <linux/of_address.h> #include <linux/of_mdio.h> #include <linux/delay.h> #include <linux/module.h> @@ -14,11 +15,12 @@ #include <linux/phy.h> #include <linux/io.h>
+#ifdef CONFIG_CAVIUM_OCTEON_SOC #include <asm/octeon/octeon.h> -#include <asm/octeon/cvmx-smix-defs.h> +#endif
-#define DRV_VERSION "1.0" -#define DRV_DESCRIPTION "Cavium Networks Octeon SMI/MDIO driver" +#define DRV_VERSION "1.1" +#define DRV_DESCRIPTION "Cavium Networks Octeon/ThunderX SMI/MDIO driver"
#define SMI_CMD 0x0 #define SMI_WR_DAT 0x8 @@ -26,6 +28,79 @@ #define SMI_CLK 0x18 #define SMI_EN 0x20
+#ifdef __BIG_ENDIAN_BITFIELD +#define OCT_MDIO_BITFIELD_FIELD(field, more) \ + field; \ + more + +#else +#define OCT_MDIO_BITFIELD_FIELD(field, more) \ + more \ + field; + +#endif + +union cvmx_smix_clk { + uint64_t u64; + struct cvmx_smix_clk_s { + OCT_MDIO_BITFIELD_FIELD(u64 reserved_25_63:39, + OCT_MDIO_BITFIELD_FIELD(u64 mode:1, + OCT_MDIO_BITFIELD_FIELD(u64 reserved_21_23:3, + OCT_MDIO_BITFIELD_FIELD(u64 sample_hi:5, + OCT_MDIO_BITFIELD_FIELD(u64 sample_mode:1, + OCT_MDIO_BITFIELD_FIELD(u64 reserved_14_14:1, + OCT_MDIO_BITFIELD_FIELD(u64 clk_idle:1, + OCT_MDIO_BITFIELD_FIELD(u64 preamble:1, + OCT_MDIO_BITFIELD_FIELD(u64 sample:4, + OCT_MDIO_BITFIELD_FIELD(u64 phase:8, + ;)))))))))) + } s; +}; + +union cvmx_smix_cmd { + uint64_t u64; + struct cvmx_smix_cmd_s { + OCT_MDIO_BITFIELD_FIELD(u64 reserved_18_63:46, + OCT_MDIO_BITFIELD_FIELD(u64 phy_op:2, + OCT_MDIO_BITFIELD_FIELD(u64 reserved_13_15:3, + OCT_MDIO_BITFIELD_FIELD(u64 phy_adr:5, + OCT_MDIO_BITFIELD_FIELD(u64 reserved_5_7:3, + OCT_MDIO_BITFIELD_FIELD(u64 reg_adr:5, + ;)))))) + } s; +}; + +union cvmx_smix_en { + uint64_t u64; + struct cvmx_smix_en_s { + OCT_MDIO_BITFIELD_FIELD(u64 reserved_1_63:63, + OCT_MDIO_BITFIELD_FIELD(u64 en:1, + ;)) + } s; +}; + +union cvmx_smix_rd_dat { + uint64_t u64; + struct cvmx_smix_rd_dat_s { + OCT_MDIO_BITFIELD_FIELD(u64 reserved_18_63:46, + OCT_MDIO_BITFIELD_FIELD(u64 pending:1, + OCT_MDIO_BITFIELD_FIELD(u64 val:1, + OCT_MDIO_BITFIELD_FIELD(u64 dat:16, + ;)))) + } s; +}; + +union cvmx_smix_wr_dat { + uint64_t u64; + struct cvmx_smix_wr_dat_s { + OCT_MDIO_BITFIELD_FIELD(u64 reserved_18_63:46, + OCT_MDIO_BITFIELD_FIELD(u64 pending:1, + OCT_MDIO_BITFIELD_FIELD(u64 val:1, + OCT_MDIO_BITFIELD_FIELD(u64 dat:16, + ;)))) + } s; +}; + enum octeon_mdiobus_mode { UNINIT = 0, C22, @@ -41,6 +116,21 @@ struct octeon_mdiobus { int phy_irq[PHY_MAX_ADDR]; };
+#ifdef CONFIG_CAVIUM_OCTEON_SOC +static void oct_mdio_writeq(u64 val, u64 addr) +{ + cvmx_write_csr(addr, val); +} + +static u64 oct_mdio_readq(u64 addr) +{ + return cvmx_read_csr(addr); +} +#else +#define oct_mdio_writeq(val, addr) writeq_relaxed(val, (void *)addr) +#define oct_mdio_readq(addr) readq_relaxed((void *)addr) +#endif + static void octeon_mdiobus_set_mode(struct octeon_mdiobus *p, enum octeon_mdiobus_mode m) { @@ -49,10 +139,10 @@ static void octeon_mdiobus_set_mode(struct octeon_mdiobus *p, if (m == p->mode) return;
- smi_clk.u64 = cvmx_read_csr(p->register_base + SMI_CLK); + smi_clk.u64 = oct_mdio_readq(p->register_base + SMI_CLK); smi_clk.s.mode = (m == C45) ? 1 : 0; smi_clk.s.preamble = 1; - cvmx_write_csr(p->register_base + SMI_CLK, smi_clk.u64); + oct_mdio_writeq(smi_clk.u64, p->register_base + SMI_CLK); p->mode = m; }
@@ -67,7 +157,7 @@ static int octeon_mdiobus_c45_addr(struct octeon_mdiobus *p,
smi_wr.u64 = 0; smi_wr.s.dat = regnum & 0xffff; - cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64); + oct_mdio_writeq(smi_wr.u64, p->register_base + SMI_WR_DAT);
regnum = (regnum >> 16) & 0x1f;
@@ -75,14 +165,14 @@ static int octeon_mdiobus_c45_addr(struct octeon_mdiobus *p, smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_45_ADDRESS */ smi_cmd.s.phy_adr = phy_id; smi_cmd.s.reg_adr = regnum; - cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64); + oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
do { /* Wait 1000 clocks so we don't saturate the RSL bus * doing reads. */ __delay(1000); - smi_wr.u64 = cvmx_read_csr(p->register_base + SMI_WR_DAT); + smi_wr.u64 = oct_mdio_readq(p->register_base + SMI_WR_DAT); } while (smi_wr.s.pending && --timeout);
if (timeout <= 0) @@ -114,14 +204,14 @@ static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum) smi_cmd.s.phy_op = op; smi_cmd.s.phy_adr = phy_id; smi_cmd.s.reg_adr = regnum; - cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64); + oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
do { /* Wait 1000 clocks so we don't saturate the RSL bus * doing reads. */ __delay(1000); - smi_rd.u64 = cvmx_read_csr(p->register_base + SMI_RD_DAT); + smi_rd.u64 = oct_mdio_readq(p->register_base + SMI_RD_DAT); } while (smi_rd.s.pending && --timeout);
if (smi_rd.s.val) @@ -153,20 +243,20 @@ static int octeon_mdiobus_write(struct mii_bus *bus, int phy_id,
smi_wr.u64 = 0; smi_wr.s.dat = val; - cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64); + oct_mdio_writeq(smi_wr.u64, p->register_base + SMI_WR_DAT);
smi_cmd.u64 = 0; smi_cmd.s.phy_op = op; smi_cmd.s.phy_adr = phy_id; smi_cmd.s.reg_adr = regnum; - cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64); + oct_mdio_writeq(smi_cmd.u64, p->register_base + SMI_CMD);
do { /* Wait 1000 clocks so we don't saturate the RSL bus * doing reads. */ __delay(1000); - smi_wr.u64 = cvmx_read_csr(p->register_base + SMI_WR_DAT); + smi_wr.u64 = oct_mdio_readq(p->register_base + SMI_WR_DAT); } while (smi_wr.s.pending && --timeout);
if (timeout <= 0) @@ -210,7 +300,7 @@ static int octeon_mdiobus_probe(struct platform_device *pdev)
smi_en.u64 = 0; smi_en.s.en = 1; - cvmx_write_csr(bus->register_base + SMI_EN, smi_en.u64); + oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
bus->mii_bus->priv = bus; bus->mii_bus->irq = bus->phy_irq; @@ -234,7 +324,7 @@ fail_register: mdiobus_free(bus->mii_bus); fail: smi_en.u64 = 0; - cvmx_write_csr(bus->register_base + SMI_EN, smi_en.u64); + oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN); return err; }
@@ -248,7 +338,7 @@ static int octeon_mdiobus_remove(struct platform_device *pdev) mdiobus_unregister(bus->mii_bus); mdiobus_free(bus->mii_bus); smi_en.u64 = 0; - cvmx_write_csr(bus->register_base + SMI_EN, smi_en.u64); + oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN); return 0; }
From: Radha Mohan Chintakuntla rchintakuntla@cavium.com
This patch fixes a possible crash in octeon_mdiobus_probe() when return values are not handled properly: the resource lookup, memory region request and ioremap results are now checked before use.
Signed-off-by: Radha Mohan Chintakuntla rchintakuntla@cavium.com Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/phy/mdio-octeon.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c index 507aade..428ae75 100644 --- a/drivers/net/phy/mdio-octeon.c +++ b/drivers/net/phy/mdio-octeon.c @@ -277,24 +277,28 @@ static int octeon_mdiobus_probe(struct platform_device *pdev) return -ENOMEM;
res_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); - if (res_mem == NULL) { dev_err(&pdev->dev, "found no memory resource\n"); - err = -ENXIO; - goto fail; + return -ENXIO; } + bus->mdio_phys = res_mem->start; bus->regsize = resource_size(res_mem); + if (!devm_request_mem_region(&pdev->dev, bus->mdio_phys, bus->regsize, res_mem->name)) { dev_err(&pdev->dev, "request_mem_region failed\n"); - goto fail; + return -ENXIO; } + bus->register_base = (u64)devm_ioremap(&pdev->dev, bus->mdio_phys, bus->regsize); + if (!bus->register_base) { + dev_err(&pdev->dev, "dev_ioremap failed\n"); + return -ENOMEM; + }
bus->mii_bus = mdiobus_alloc(); - if (!bus->mii_bus) goto fail;
From: Radha Mohan Chintakuntla rchintakuntla@cavium.com
CONFIG_MDIO_OCTEON is required so that the ThunderX NIC driver can talk to the PHY drivers.
Signed-off-by: Radha Mohan Chintakuntla rchintakuntla@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/cavium/Kconfig | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig index c4d6bbe..3584420 100644 --- a/drivers/net/ethernet/cavium/Kconfig +++ b/drivers/net/ethernet/cavium/Kconfig @@ -37,6 +37,8 @@ config THUNDER_NIC_BGX tristate "Thunder MAC interface driver (BGX)" depends on 64BIT default ARCH_THUNDER + select PHYLIB + select MDIO_OCTEON ---help--- This driver supports programming and controlling of MAC interface from NIC physical function driver.
From: David Daney david.daney@cavium.com
The dma_alloc_coherent() function returns a virtual address which can be used for coherent access to the underlying memory. On some architectures, like arm64, undefined behavior results if this memory is also accessed via virtual mappings that are not coherent. Because of their undefined nature, operations like virt_to_page() return garbage when passed virtual addresses obtained from dma_alloc_coherent(). Any subsequent mappings via vmap() of the garbage page values are unusable and result in bad things like bus errors (synchronous aborts in ARM64 speak).
The MLX4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent))
This results in an oops when the device is opened.
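As a minimal sketch of the difference (hypothetical helper, not code from this patch; the broken pattern is shown only in the comment):

    /* Hypothetical ring-buffer allocation helper, for illustration only. */
    static void *alloc_ring(struct device *dev, size_t size, dma_addr_t *map)
    {
        /*
         * Wrong on arm64: virt_to_page() on an address returned by
         * dma_alloc_coherent() yields garbage, so any vmap() alias
         * built from it is unusable:
         *
         *    buf  = dma_alloc_coherent(dev, size, map, GFP_KERNEL);
         *    page = virt_to_page(buf);
         *    ring = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
         *
         * Correct: access the coherent mapping directly and free it
         * later with dma_free_coherent(dev, size, buf, *map).
         */
        return dma_alloc_coherent(dev, size, map, GFP_KERNEL);
    }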
To fix this...
Always use result of dma_alloc_coherent() directly.
Remove 'max_direct' parameter to mlx4_buf_alloc(), as it is unused, and adjust all callers.
Remove mlx4_en_map_buffer() and mlx4_en_unmap_buffer() as they now do nothing, and adjust all callers.
Remove 'page_list' element from struct mlx4_buf as it is unused.
Signed-off-by: David Daney david.daney@cavium.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/qp.c | 2 +- drivers/infiniband/hw/mlx4/srq.c | 3 +- drivers/net/ethernet/mellanox/mlx4/alloc.c | 104 +++++----------------- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 9 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +- drivers/net/ethernet/mellanox/mlx4/en_resources.c | 32 ------- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 11 +-- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 14 +-- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 - drivers/net/ethernet/mellanox/mlx4/mr.c | 5 +- include/linux/mlx4/device.h | 11 +-- 12 files changed, 33 insertions(+), 164 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c index 36eb3d0..25e5aa5 100644 --- a/drivers/infiniband/hw/mlx4/cq.c +++ b/drivers/infiniband/hw/mlx4/cq.c @@ -102,7 +102,7 @@ static int mlx4_ib_alloc_cq_buf(struct mlx4_ib_dev *dev, struct mlx4_ib_cq_buf * int err;
err = mlx4_buf_alloc(dev->dev, nent * dev->dev->caps.cqe_size, - PAGE_SIZE * 2, &buf->buf, GFP_KERNEL); + &buf->buf, GFP_KERNEL);
if (err) goto out; diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index c5a3a5f..baf9e11 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -772,7 +772,7 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd, *qp->db.db = 0; }
- if (mlx4_buf_alloc(dev->dev, qp->buf_size, PAGE_SIZE * 2, &qp->buf, gfp)) { + if (mlx4_buf_alloc(dev->dev, qp->buf_size, &qp->buf, gfp)) { err = -ENOMEM; goto err_db; } diff --git a/drivers/infiniband/hw/mlx4/srq.c b/drivers/infiniband/hw/mlx4/srq.c index dce5dfe..121730b 100644 --- a/drivers/infiniband/hw/mlx4/srq.c +++ b/drivers/infiniband/hw/mlx4/srq.c @@ -140,8 +140,7 @@ struct ib_srq *mlx4_ib_create_srq(struct ib_pd *pd,
*srq->db.db = 0;
- if (mlx4_buf_alloc(dev->dev, buf_size, PAGE_SIZE * 2, &srq->buf, - GFP_KERNEL)) { + if (mlx4_buf_alloc(dev->dev, buf_size, &srq->buf, GFP_KERNEL)) { err = -ENOMEM; goto err_db; } diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c index 0c51c69..db6ba3e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/alloc.c +++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c @@ -576,103 +576,41 @@ out:
return res; } -/* - * Handling for queue buffers -- we allocate a bunch of memory and - * register it in a memory region at HCA virtual address 0. If the - * requested size is > max_direct, we split the allocation into - * multiple pages, so we don't require too much contiguous memory. + +/* Handling for queue buffers -- we allocate a bunch of memory and + * register it in a memory region at HCA virtual address 0. */
-int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, +int mlx4_buf_alloc(struct mlx4_dev *dev, int size, struct mlx4_buf *buf, gfp_t gfp) { dma_addr_t t;
- if (size <= max_direct) { - buf->nbufs = 1; - buf->npages = 1; - buf->page_shift = get_order(size) + PAGE_SHIFT; - buf->direct.buf = dma_alloc_coherent(&dev->persist->pdev->dev, - size, &t, gfp); - if (!buf->direct.buf) - return -ENOMEM; - - buf->direct.map = t; - - while (t & ((1 << buf->page_shift) - 1)) { - --buf->page_shift; - buf->npages *= 2; - } + buf->nbufs = 1; + buf->npages = 1; + buf->page_shift = get_order(size) + PAGE_SHIFT; + buf->direct.buf = dma_alloc_coherent(&dev->persist->pdev->dev, + size, &t, gfp); + if (!buf->direct.buf) + return -ENOMEM;
- memset(buf->direct.buf, 0, size); - } else { - int i; - - buf->direct.buf = NULL; - buf->nbufs = (size + PAGE_SIZE - 1) / PAGE_SIZE; - buf->npages = buf->nbufs; - buf->page_shift = PAGE_SHIFT; - buf->page_list = kcalloc(buf->nbufs, sizeof(*buf->page_list), - gfp); - if (!buf->page_list) - return -ENOMEM; - - for (i = 0; i < buf->nbufs; ++i) { - buf->page_list[i].buf = - dma_alloc_coherent(&dev->persist->pdev->dev, - PAGE_SIZE, - &t, gfp); - if (!buf->page_list[i].buf) - goto err_free; - - buf->page_list[i].map = t; - - memset(buf->page_list[i].buf, 0, PAGE_SIZE); - } + buf->direct.map = t;
- if (BITS_PER_LONG == 64) { - struct page **pages; - pages = kmalloc(sizeof *pages * buf->nbufs, gfp); - if (!pages) - goto err_free; - for (i = 0; i < buf->nbufs; ++i) - pages[i] = virt_to_page(buf->page_list[i].buf); - buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP, PAGE_KERNEL); - kfree(pages); - if (!buf->direct.buf) - goto err_free; - } + while (t & ((1 << buf->page_shift) - 1)) { + --buf->page_shift; + buf->npages *= 2; }
- return 0; - -err_free: - mlx4_buf_free(dev, size, buf); + memset(buf->direct.buf, 0, size);
- return -ENOMEM; + return 0; } EXPORT_SYMBOL_GPL(mlx4_buf_alloc);
void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf) { - int i; - - if (buf->nbufs == 1) - dma_free_coherent(&dev->persist->pdev->dev, size, - buf->direct.buf, - buf->direct.map); - else { - if (BITS_PER_LONG == 64) - vunmap(buf->direct.buf); - - for (i = 0; i < buf->nbufs; ++i) - if (buf->page_list[i].buf) - dma_free_coherent(&dev->persist->pdev->dev, - PAGE_SIZE, - buf->page_list[i].buf, - buf->page_list[i].map); - kfree(buf->page_list); - } + dma_free_coherent(&dev->persist->pdev->dev, size, + buf->direct.buf, buf->direct.map); } EXPORT_SYMBOL_GPL(mlx4_buf_free);
@@ -789,7 +727,7 @@ void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db) EXPORT_SYMBOL_GPL(mlx4_db_free);
int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, - int size, int max_direct) + int size) { int err;
@@ -799,7 +737,7 @@ int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
*wqres->db.db = 0;
- err = mlx4_buf_alloc(dev, size, max_direct, &wqres->buf, GFP_KERNEL); + err = mlx4_buf_alloc(dev, size, &wqres->buf, GFP_KERNEL); if (err) goto err_db;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_cq.c b/drivers/net/ethernet/mellanox/mlx4/en_cq.c index 63769df..fa0e0b1 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_cq.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_cq.c @@ -73,22 +73,16 @@ int mlx4_en_create_cq(struct mlx4_en_priv *priv, */ set_dev_node(&mdev->dev->persist->pdev->dev, node); err = mlx4_alloc_hwq_res(mdev->dev, &cq->wqres, - cq->buf_size, 2 * PAGE_SIZE); + cq->buf_size); set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node); if (err) goto err_cq;
- err = mlx4_en_map_buffer(&cq->wqres.buf); - if (err) - goto err_res; - cq->buf = (struct mlx4_cqe *)cq->wqres.buf.direct.buf; *pcq = cq;
return 0;
-err_res: - mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size); err_cq: kfree(cq); *pcq = NULL; @@ -180,7 +174,6 @@ void mlx4_en_destroy_cq(struct mlx4_en_priv *priv, struct mlx4_en_cq **pcq) struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_cq *cq = *pcq;
- mlx4_en_unmap_buffer(&cq->wqres.buf); mlx4_free_hwq_res(mdev->dev, &cq->wqres, cq->buf_size); if (mlx4_is_eq_vector_valid(mdev->dev, priv->port, cq->vector) && cq->is_tx == RX) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index e0de2fd..e2a489c 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -2895,7 +2895,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
/* Allocate page for receive rings */ err = mlx4_alloc_hwq_res(mdev->dev, &priv->res, - MLX4_EN_PAGE_SIZE, MLX4_EN_PAGE_SIZE); + MLX4_EN_PAGE_SIZE); if (err) { en_err(priv, "Failed to allocate page for rx qps\n"); goto out; diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c b/drivers/net/ethernet/mellanox/mlx4/en_resources.c index e482fa1b..e675dba 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c @@ -80,38 +80,6 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride, } }
- -int mlx4_en_map_buffer(struct mlx4_buf *buf) -{ - struct page **pages; - int i; - - if (BITS_PER_LONG == 64 || buf->nbufs == 1) - return 0; - - pages = kmalloc(sizeof *pages * buf->nbufs, GFP_KERNEL); - if (!pages) - return -ENOMEM; - - for (i = 0; i < buf->nbufs; ++i) - pages[i] = virt_to_page(buf->page_list[i].buf); - - buf->direct.buf = vmap(pages, buf->nbufs, VM_MAP, PAGE_KERNEL); - kfree(pages); - if (!buf->direct.buf) - return -ENOMEM; - - return 0; -} - -void mlx4_en_unmap_buffer(struct mlx4_buf *buf) -{ - if (BITS_PER_LONG == 64 || buf->nbufs == 1) - return; - - vunmap(buf->direct.buf); -} - void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event) { return; diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index 9c145dd..e36f3c6 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -391,17 +391,11 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
/* Allocate HW buffers on provided NUMA node */ set_dev_node(&mdev->dev->persist->pdev->dev, node); - err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, - ring->buf_size, 2 * PAGE_SIZE); + err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size); set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node); if (err) goto err_info;
- err = mlx4_en_map_buffer(&ring->wqres.buf); - if (err) { - en_err(priv, "Failed to map RX buffer\n"); - goto err_hwq; - } ring->buf = ring->wqres.buf.direct.buf;
ring->hwtstamp_rx_filter = priv->hwtstamp_config.rx_filter; @@ -409,8 +403,6 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv, *pring = ring; return 0;
-err_hwq: - mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size); err_info: vfree(ring->rx_info); ring->rx_info = NULL; @@ -514,7 +506,6 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv, struct mlx4_en_dev *mdev = priv->mdev; struct mlx4_en_rx_ring *ring = *pring;
- mlx4_en_unmap_buffer(&ring->wqres.buf); mlx4_free_hwq_res(mdev->dev, &ring->wqres, size * stride + TXBB_SIZE); vfree(ring->rx_info); ring->rx_info = NULL; diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index c10d98f..47dd7a0 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -93,20 +93,13 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
/* Allocate HW buffers on provided NUMA node */ set_dev_node(&mdev->dev->persist->pdev->dev, node); - err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size, - 2 * PAGE_SIZE); + err = mlx4_alloc_hwq_res(mdev->dev, &ring->wqres, ring->buf_size); set_dev_node(&mdev->dev->persist->pdev->dev, mdev->dev->numa_node); if (err) { en_err(priv, "Failed allocating hwq resources\n"); goto err_bounce; }
- err = mlx4_en_map_buffer(&ring->wqres.buf); - if (err) { - en_err(priv, "Failed to map TX buffer\n"); - goto err_hwq_res; - } - ring->buf = ring->wqres.buf.direct.buf;
en_dbg(DRV, priv, "Allocated TX ring (addr:%p) - buf:%p size:%d buf_size:%d dma:%llx\n", @@ -117,7 +110,7 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv, MLX4_RESERVE_ETH_BF_QP); if (err) { en_err(priv, "failed reserving qp for TX ring\n"); - goto err_map; + goto err_hwq_res; }
err = mlx4_qp_alloc(mdev->dev, ring->qpn, &ring->qp, GFP_KERNEL); @@ -154,8 +147,6 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
err_reserve: mlx4_qp_release_range(mdev->dev, ring->qpn, 1); -err_map: - mlx4_en_unmap_buffer(&ring->wqres.buf); err_hwq_res: mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size); err_bounce: @@ -182,7 +173,6 @@ void mlx4_en_destroy_tx_ring(struct mlx4_en_priv *priv, mlx4_qp_remove(mdev->dev, &ring->qp); mlx4_qp_free(mdev->dev, &ring->qp); mlx4_qp_release_range(priv->mdev->dev, ring->qpn, 1); - mlx4_en_unmap_buffer(&ring->wqres.buf); mlx4_free_hwq_res(mdev->dev, &ring->wqres, ring->buf_size); kfree(ring->bounce_buf); ring->bounce_buf = NULL; diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index 666d166..d34e785 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -795,8 +795,6 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int size, int stride, int is_tx, int rss, int qpn, int cqn, int user_prio, struct mlx4_qp_context *context); void mlx4_en_sqp_event(struct mlx4_qp *qp, enum mlx4_event event); -int mlx4_en_map_buffer(struct mlx4_buf *buf); -void mlx4_en_unmap_buffer(struct mlx4_buf *buf);
void mlx4_en_calc_rx_buf(struct net_device *dev); int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv); diff --git a/drivers/net/ethernet/mellanox/mlx4/mr.c b/drivers/net/ethernet/mellanox/mlx4/mr.c index 78f51e1..095f3ca 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mr.c +++ b/drivers/net/ethernet/mellanox/mlx4/mr.c @@ -802,10 +802,7 @@ int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt, return -ENOMEM;
for (i = 0; i < buf->npages; ++i) - if (buf->nbufs == 1) - page_list[i] = buf->direct.map + (i << buf->page_shift); - else - page_list[i] = buf->page_list[i].map; + page_list[i] = buf->direct.map + (i << buf->page_shift);
err = mlx4_write_mtt(dev, mtt, 0, buf->npages, page_list);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h index fd13c1c..3d33739 100644 --- a/include/linux/mlx4/device.h +++ b/include/linux/mlx4/device.h @@ -595,7 +595,6 @@ struct mlx4_buf_list {
struct mlx4_buf { struct mlx4_buf_list direct; - struct mlx4_buf_list *page_list; int nbufs; int npages; int page_shift; @@ -1024,16 +1023,12 @@ static inline int mlx4_is_eth(struct mlx4_dev *dev, int port) return dev->caps.port_type[port] == MLX4_PORT_TYPE_IB ? 0 : 1; }
-int mlx4_buf_alloc(struct mlx4_dev *dev, int size, int max_direct, +int mlx4_buf_alloc(struct mlx4_dev *dev, int size, struct mlx4_buf *buf, gfp_t gfp); void mlx4_buf_free(struct mlx4_dev *dev, int size, struct mlx4_buf *buf); static inline void *mlx4_buf_offset(struct mlx4_buf *buf, int offset) { - if (BITS_PER_LONG == 64 || buf->nbufs == 1) - return buf->direct.buf + offset; - else - return buf->page_list[offset >> PAGE_SHIFT].buf + - (offset & (PAGE_SIZE - 1)); + return buf->direct.buf + offset; }
int mlx4_pd_alloc(struct mlx4_dev *dev, u32 *pdn); @@ -1069,7 +1064,7 @@ int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order, void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db);
int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres, - int size, int max_direct); + int size); void mlx4_free_hwq_res(struct mlx4_dev *mdev, struct mlx4_hwq_resources *wqres, int size);
From: Robert Richter rrichter@cavium.com
No need to read the typer register inside the loop; its value does not change.
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3-its.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 83204f4..3542c75 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -838,6 +838,8 @@ static int its_alloc_tables(struct its_node *its) int psz = SZ_64K; u64 shr = GITS_BASER_InnerShareable; u64 cache = GITS_BASER_WaWb; + u64 typer = readq_relaxed(its->base + GITS_TYPER); + u32 ids = GITS_TYPER_DEVBITS(typer);
for (i = 0; i < GITS_BASER_NR_REGS; i++) { u64 val = readq_relaxed(its->base + GITS_BASER + i * 8); @@ -860,9 +862,6 @@ static int its_alloc_tables(struct its_node *its) * For other tables, only allocate a single page. */ if (type == GITS_BASER_TYPE_DEVICE) { - u64 typer = readq_relaxed(its->base + GITS_TYPER); - u32 ids = GITS_TYPER_DEVBITS(typer); - /* * 'order' was initialized earlier to the default page * granule of the the ITS. We can't have an allocation
From: Robert Richter rrichter@cavium.com
Some GIC revisions require individual configuration, especially to add workarounds for hardware bugs. This patch implements generic code that parses the hardware revision provided by an IIDR register value and runs version-specific code if the hardware matches. Helper functions read the IIDR registers for the GICv3 distributor and the ITS (GICD_IIDR/GITS_IIDR) and then walk a list of init functions to be called for matching versions.
This is needed to implement workarounds for hardware errata in Cavium's ThunderX GICv3.
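As a usage sketch (the entry below is hypothetical; the real erratum entries are added by the follow-on patches in this series):

    static void gic_enable_example_quirk(void *data)
    {
        /* e.g. set a flag in the driver data passed in here */
    }

    static const struct gic_capabilities example_errata[] = {
        {
            .desc = "GIC: example vendor erratum",
            .id   = 0xa100034c,   /* example: ThunderX pass 1.x, as in the patches below */
            .mask = 0xffff0fff,   /* bits of the IIDR that must match */
            .init = gic_enable_example_quirk,
        },
        {
            /* NULL .desc terminates the table */
        }
    };

    /* at probe time; iidr is read from GICD_IIDR or GITS_IIDR,
     * driver_data is handed to each matching .init callback
     */
    gic_check_capabilities(iidr, example_errata, driver_data);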
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-common.c | 11 +++++++++++ drivers/irqchip/irq-gic-common.h | 9 +++++++++ drivers/irqchip/irq-gic-v3-its.c | 15 +++++++++++++++ drivers/irqchip/irq-gic-v3.c | 14 ++++++++++++++ 4 files changed, 49 insertions(+)
diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c index 9448e39..a55a609 100644 --- a/drivers/irqchip/irq-gic-common.c +++ b/drivers/irqchip/irq-gic-common.c @@ -21,6 +21,17 @@
#include "irq-gic-common.h"
+void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap, + void *data) +{ + for (; cap->desc; cap++) { + if ((iidr & cap->mask) != cap->id) + continue; + cap->init(data); + pr_info("%s\n", cap->desc); + } +} + int gic_configure_irq(unsigned int irq, unsigned int type, void __iomem *base, void (*sync_access)(void)) { diff --git a/drivers/irqchip/irq-gic-common.h b/drivers/irqchip/irq-gic-common.h index 35a9884..90d55b9 100644 --- a/drivers/irqchip/irq-gic-common.h +++ b/drivers/irqchip/irq-gic-common.h @@ -20,10 +20,19 @@ #include <linux/of.h> #include <linux/irqdomain.h>
+struct gic_capabilities { + const char *desc; + void (*init)(void *data); + u32 id; + u32 mask; +}; + int gic_configure_irq(unsigned int irq, unsigned int type, void __iomem *base, void (*sync_access)(void)); void gic_dist_config(void __iomem *base, int gic_irqs, void (*sync_access)(void)); void gic_cpu_config(void __iomem *base, void (*sync_access)(void)); +void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap, + void *data);
#endif /* _IRQ_GIC_COMMON_H */ diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 3542c75..47a9595 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -36,6 +36,7 @@ #include <asm/cputype.h> #include <asm/exception.h>
+#include "irq-gic-common.h" #include "irqchip.h"
#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1 << 0) @@ -1432,6 +1433,18 @@ static int its_force_quiescent(void __iomem *base) } }
+static const struct gic_capabilities its_errata[] = { + { + } +}; + +static void its_check_capabilities(struct its_node *its) +{ + u32 iidr = readl_relaxed(its->base + GITS_IIDR); + + gic_check_capabilities(iidr, its_errata, its); +} + static int its_probe(struct device_node *node, struct irq_domain *parent) { struct resource res; @@ -1490,6 +1503,8 @@ static int its_probe(struct device_node *node, struct irq_domain *parent) } its->cmd_write = its->cmd_base;
+ its_check_capabilities(its); + err = its_alloc_tables(its); if (err) goto out_free_cmd; diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 0019fed..7857943 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -771,6 +771,18 @@ static const struct irq_domain_ops gic_irq_domain_ops = { .free = gic_irq_domain_free, };
+static const struct gic_capabilities gicv3_errata[] = { + { + } +}; + +static void gicv3_check_capabilities(void) +{ + u32 iidr = readl_relaxed(gic_data.dist_base + GICD_IIDR); + + gic_check_capabilities(iidr, gicv3_errata, NULL); +} + static int __init gic_of_init(struct device_node *node, struct device_node *parent) { void __iomem *dist_base; @@ -830,6 +842,8 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare gic_data.nr_redist_regions = nr_redist_regions; gic_data.redist_stride = redist_stride;
+ gicv3_check_capabilities(); + /* * Find out how many interrupts are supported. * The GIC only supports up to 1020 interrupt sources (SGI+PPI+SPI)
From: Robert Richter rrichter@cavium.com
This patch implements a workaround for Cavium ThunderX erratum 23154.
The GICv3 of ThunderX requires a modified sequence for reading the IAR status to ensure data synchronization. Since this is in the fast path and called for each interrupt, the code is patched at runtime via jump labels, so the overhead is a single patched no-op when the workaround is not needed. This is the same technique as used for tracepoints.
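Condensed, the fast-path pattern from the hunk below looks like this (the branch is only taken once the static key has been enabled on affected hardware):

    struct static_key is_cavium_thunderx = STATIC_KEY_INIT_FALSE;

    static u64 gic_read_iar(void)
    {
        /* patched no-op until the key is enabled, so unaffected
         * hardware pays nothing for the erratum handling
         */
        if (static_key_false(&is_cavium_thunderx))
            return gic_read_iar_cavium_thunderx();
        return gic_read_iar_common();
    }

    /* erratum init hook, run once when the matching IIDR is found */
    static void gicv3_enable_cavium_thunderx(void *data)
    {
        static_key_slow_inc(&is_cavium_thunderx);
    }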
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3.c | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 7857943..9c67bda 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -112,14 +112,38 @@ static void gic_redist_wait_for_rwp(void) }
/* Low level accessors */ -static u64 __maybe_unused gic_read_iar(void) +static u64 gic_read_iar_common(void) +{ + u64 irqstat; + + asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat)); + return irqstat; +} + +/* Cavium ThunderX erratum 23154 */ +static u64 gic_read_iar_cavium_thunderx(void) { u64 irqstat;
+ asm volatile("nop;nop;nop;nop;"); + asm volatile("nop;nop;nop;nop;"); asm volatile("mrs_s %0, " __stringify(ICC_IAR1_EL1) : "=r" (irqstat)); + asm volatile("nop;nop;nop;nop;"); + mb(); + return irqstat; }
+struct static_key is_cavium_thunderx = STATIC_KEY_INIT_FALSE; + +static u64 __maybe_unused gic_read_iar(void) +{ + if (static_key_false(&is_cavium_thunderx)) + return gic_read_iar_cavium_thunderx(); + else + return gic_read_iar_common(); +} + static void __maybe_unused gic_write_pmr(u64 val) { asm volatile("msr_s " __stringify(ICC_PMR_EL1) ", %0" : : "r" (val)); @@ -771,8 +795,19 @@ static const struct irq_domain_ops gic_irq_domain_ops = { .free = gic_irq_domain_free, };
+static void gicv3_enable_cavium_thunderx(void *data) +{ + static_key_slow_inc(&is_cavium_thunderx); +} + static const struct gic_capabilities gicv3_errata[] = { { + .desc = "GIC: Cavium erratum 23154", + .id = 0xa100034c, /* ThunderX pass 1.x */ + .mask = 0xffff0fff, + .init = gicv3_enable_cavium_thunderx, + }, + { } };
From: Robert Richter rrichter@cavium.com
This implements workarounds for two GICv3 ITS errata on ThunderX. Both have a small impact, affecting only ITS table allocation.
erratum 22375: only alloc 8MB table size
erratum 24313: ignore memory access type
The fixes are applied during ITS initialization and basically ignore the memory access type and the table size advertised by the TYPER and BASER registers.
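For scale, and assuming 8-byte device table entries (the actual entry size is reported by GITS_BASER): a 20-bit DeviceID space needs 2^20 * 8 bytes = 8 MB, which is where the 8MB table size named in erratum 22375 comes from.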
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3-its.c | 35 +++++++++++++++++++++++++++++++---- 1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 47a9595..d22bc39 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -39,7 +39,8 @@ #include "irq-gic-common.h" #include "irqchip.h"
-#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1 << 0) +#define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0) +#define ITS_FLAGS_CAVIUM_THUNDERX (1ULL << 1)
#define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
@@ -838,9 +839,22 @@ static int its_alloc_tables(struct its_node *its) int i; int psz = SZ_64K; u64 shr = GITS_BASER_InnerShareable; - u64 cache = GITS_BASER_WaWb; - u64 typer = readq_relaxed(its->base + GITS_TYPER); - u32 ids = GITS_TYPER_DEVBITS(typer); + u64 cache; + u64 typer; + u32 ids; + + if (its->flags & ITS_FLAGS_CAVIUM_THUNDERX) { + /* + * erratum 22375: only alloc 8MB table size + * erratum 24313: ignore memory access type + */ + cache = 0; + ids = 0x13; /* 20 bits, 8MB */ + } else { + cache = GITS_BASER_WaWb; + typer = readq_relaxed(its->base + GITS_TYPER); + ids = GITS_TYPER_DEVBITS(typer); + }
for (i = 0; i < GITS_BASER_NR_REGS; i++) { u64 val = readq_relaxed(its->base + GITS_BASER + i * 8); @@ -1433,8 +1447,21 @@ static int its_force_quiescent(void __iomem *base) } }
+static void its_enable_cavium_thunderx(void *data) +{ + struct its_node *its = data; + + its->flags |= ITS_FLAGS_CAVIUM_THUNDERX; +} + static const struct gic_capabilities its_errata[] = { { + .desc = "ITS: Cavium errata 22375, 24313", + .id = 0xa100034c, /* ThunderX pass 1.x */ + .mask = 0xffff0fff, + .init = its_enable_cavium_thunderx, + }, + { } };
From: Robert Richter rrichter@cavium.com
The number of pages for the ITS table may exceed the maximum of 256. Add a range check and limit the number to that maximum.
Based on a patch from Tirumalesh Chalamarla tchalamarla@cavium.com.
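For context: the Size field of GITS_BASER is 8 bits wide and encodes the number of pages minus one, so at most 256 pages can be programmed per table; with the 64KB ITS page granule used as the default here, that caps a single table at 256 * 64KB = 16MB, hence the clamp below.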
Cc: Tirumalesh Chalamarla tchalamarla@cavium.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3-its.c | 11 ++++++++++- include/linux/irqchip/arm-gic-v3.h | 1 + 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index d22bc39..cb7f33d 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -862,6 +862,7 @@ static int its_alloc_tables(struct its_node *its) u64 entry_size = GITS_BASER_ENTRY_SIZE(val); int order = get_order(psz); int alloc_size; + int alloc_pages; u64 tmp; void *base;
@@ -893,6 +894,14 @@ static int its_alloc_tables(struct its_node *its) }
alloc_size = (1 << order) * PAGE_SIZE; + alloc_pages = (alloc_size / psz); + if (alloc_pages > GITS_BASER_PAGES_MAX) { + alloc_pages = GITS_BASER_PAGES_MAX; + order = get_order(GITS_BASER_PAGES_MAX * psz); + pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n", + its->msi_chip.of_node->full_name, order, alloc_pages); + } + base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order); if (!base) { err = -ENOMEM; @@ -921,7 +930,7 @@ retry_baser: break; }
- val |= (alloc_size / psz) - 1; + val |= alloc_pages - 1;
writeq_relaxed(val, its->base + GITS_BASER + i * 8); tmp = readq_relaxed(its->base + GITS_BASER + i * 8); diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 5992224..6949da7 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -229,6 +229,7 @@ #define GITS_BASER_PAGE_SIZE_16K (1UL << GITS_BASER_PAGE_SIZE_SHIFT) #define GITS_BASER_PAGE_SIZE_64K (2UL << GITS_BASER_PAGE_SIZE_SHIFT) #define GITS_BASER_PAGE_SIZE_MASK (3UL << GITS_BASER_PAGE_SIZE_SHIFT) +#define GITS_BASER_PAGES_MAX 256
#define GITS_BASER_TYPE_NONE 0 #define GITS_BASER_TYPE_DEVICE 1
From: Radha Mohan Chintakuntla rchintakuntla@cavium.com
On ARCH_THUNDER there is a need to allocate a GICv3 ITS table that is bigger than the default maximum allocation order allows, so raise FORCE_MAX_ZONEORDER, but only for the 4KB page size configuration.
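For reference (with this kernel's buddy allocator, where the largest allocation is 2^(MAX_ORDER - 1) pages): the default of 11 with 4KB pages allows at most 2^10 * 4KB = 4MB of physically contiguous memory, which is too small for the 8MB ITS device table on ThunderX; a value of 13 raises the limit to 2^12 * 4KB = 16MB.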
Signed-off-by: Radha Mohan Chintakuntla rchintakuntla@cavium.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 318175f..e32e427 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -595,6 +595,7 @@ config XEN config FORCE_MAX_ZONEORDER int default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE) + default "13" if (ARCH_THUNDER && !ARM64_64K_PAGES) default "11"
menuconfig ARMV8_DEPRECATED
From: Tirumalesh Chalamarla tchalamarla@caviumnetworks.com
In order to allow KVM to run on Thunder implementations, add the minimal support required.
Signed-off-by: Tirumalesh Chalamarla tchalamarla@caviumnetworks.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/cputype.h | 3 +++ arch/arm64/include/uapi/asm/kvm.h | 3 ++- arch/arm64/kvm/guest.c | 6 ++++++ arch/arm64/kvm/sys_regs_generic_v8.c | 2 ++ 4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index a84ec60..f603dcd 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -63,6 +63,7 @@ ((partnum) << MIDR_PARTNUM_SHIFT))
#define ARM_CPU_IMP_ARM 0x41 +#define ARM_CPU_IMP_CAVIUM 0x43 #define ARM_CPU_IMP_APM 0x50
#define ARM_CPU_PART_AEM_V8 0xD0F @@ -72,6 +73,8 @@
#define APM_CPU_PART_POTENZA 0x000
+#define ARM_CPU_PART_THUNDER 0x0A1 + #define ID_AA64MMFR0_BIGENDEL0_SHIFT 16 #define ID_AA64MMFR0_BIGENDEL0_MASK (0xf << ID_AA64MMFR0_BIGENDEL0_SHIFT) #define ID_AA64MMFR0_BIGENDEL0(mmfr0) \ diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index d268320..6c4c556 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -59,8 +59,9 @@ struct kvm_regs { #define KVM_ARM_TARGET_CORTEX_A57 2 #define KVM_ARM_TARGET_XGENE_POTENZA 3 #define KVM_ARM_TARGET_CORTEX_A53 4 +#define KVM_ARM_TARGET_CAVIUM_THUNDER 5
-#define KVM_ARM_NUM_TARGETS 5 +#define KVM_ARM_NUM_TARGETS 6
/* KVM_ARM_SET_DEVICE_ADDR ioctl id encoding */ #define KVM_ARM_DEVICE_TYPE_SHIFT 0 diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 9535bd5..3751f37 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -291,6 +291,12 @@ int __attribute_const__ kvm_target_cpu(void) return KVM_ARM_TARGET_XGENE_POTENZA; }; break; + case ARM_CPU_IMP_CAVIUM: + switch (part_number) { + case ARM_CPU_PART_THUNDER: + return KVM_ARM_TARGET_CAVIUM_THUNDER; + }; + break; };
return -EINVAL; diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c b/arch/arm64/kvm/sys_regs_generic_v8.c index 475fd29..0e48ee8 100644 --- a/arch/arm64/kvm/sys_regs_generic_v8.c +++ b/arch/arm64/kvm/sys_regs_generic_v8.c @@ -94,6 +94,8 @@ static int __init sys_reg_genericv8_init(void) &genericv8_target_table); kvm_register_target_sys_reg_table(KVM_ARM_TARGET_XGENE_POTENZA, &genericv8_target_table); + kvm_register_target_sys_reg_table(KVM_ARM_TARGET_CAVIUM_THUNDER, + &genericv8_target_table);
return 0; }
From: Andre Przywara andre.przywara@arm.com
Currently we track which IRQ has been mapped to which VGIC list register and also have to synchronize both. We used to do this to hold some extra state (for instance the active bit). It turns out that this extra state in the LRs is no longer needed and this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of quite some code on the way.
On a guest exit we pick up all still pending IRQs from the LRs and put them back in the distributor. We don't care about active-only IRQs, so we keep them in the LRs. They will be retired either by our vgic_process_maintenance() routine or by the GIC hardware in case of edge triggered interrupts.
In places where we scan LRs we now use our shadow copy of the ELRSR register directly.
This code change means we lose the "piggy-back" optimization, which would re-use an active-only LR to inject the pending state on top of it. Tracing with various workloads shows that this actually occurred very rarely; the ballpark figure is about once every 10,000 exits in a disk I/O heavy workload. Also the list registers don't seem to be as scarce as assumed, with all 4 LRs on the popular implementations used less than once every 100,000 exits.
This has been briefly tested on Midway, Juno and the model (the latter both with GICv2 and GICv3 guests).
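For orientation, the scan pattern used throughout the patch below, where a clear ELRSR bit marks an occupied LR:

    u64 elrsr = vgic_get_elrsr(vcpu);
    unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
    int lr;

    for_each_clear_bit(lr, elrsr_ptr, vgic_cpu->nr_lr) {
        struct vgic_lr vlr = vgic_get_lr(vcpu, lr);

        /* vlr describes a pending and/or active interrupt */
    }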
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 6 --- virt/kvm/arm/vgic-v2.c | 1 + virt/kvm/arm/vgic-v3.c | 1 + virt/kvm/arm/vgic.c | 143 ++++++++++++++++++++++--------------------------- 4 files changed, 66 insertions(+), 85 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 133ea00..2ccfa9a 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -279,9 +279,6 @@ struct vgic_v3_cpu_if { };
struct vgic_cpu { - /* per IRQ to LR mapping */ - u8 *vgic_irq_lr_map; - /* Pending/active/both interrupts on this VCPU */ DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS); DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS); @@ -292,9 +289,6 @@ struct vgic_cpu { unsigned long *active_shared; unsigned long *pend_act_shared;
- /* Bitmap of used/free list registers */ - DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS); - /* Number of list registers on this CPU */ int nr_lr;
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c index f9b9c7c..f723710 100644 --- a/virt/kvm/arm/vgic-v2.c +++ b/virt/kvm/arm/vgic-v2.c @@ -144,6 +144,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu) * anyway. */ vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0; + vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
/* Get the show on the road... */ vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN; diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c index dff0602..21e5d28 100644 --- a/virt/kvm/arm/vgic-v3.c +++ b/virt/kvm/arm/vgic-v3.c @@ -178,6 +178,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu) * anyway. */ vgic_v3->vgic_vmcr = 0; + vgic_v3->vgic_elrsr = ~0;
/* * If we are emulating a GICv3, we do it in an non-GICv2-compatible diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index bc40137..394622c 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -79,7 +79,6 @@ #include "vgic.h"
static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu); -static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu); static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr); static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
@@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio, return false; }
+static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr, + struct vgic_lr vlr) +{ + vgic_ops->sync_lr_elrsr(vcpu, lr, vlr); +} + +static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu) +{ + return vgic_ops->get_elrsr(vcpu); +} + /** * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs @@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio, void vgic_unqueue_irqs(struct kvm_vcpu *vcpu) { struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u64 elrsr = vgic_get_elrsr(vcpu); + unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr); int i;
- for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) { + for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) { struct vgic_lr lr = vgic_get_lr(vcpu, i);
/* @@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu) * Mark the LR as free for other use. */ BUG_ON(lr.state & LR_STATE_MASK); - vgic_retire_lr(i, lr.irq, vcpu); + vgic_sync_lr_elrsr(vcpu, i, lr); vgic_irq_clear_queued(vcpu, lr.irq);
/* Finally update the VGIC state. */ @@ -1011,17 +1023,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, vgic_ops->set_lr(vcpu, lr, vlr); }
-static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr, - struct vgic_lr vlr) -{ - vgic_ops->sync_lr_elrsr(vcpu, lr, vlr); -} - -static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu) -{ - return vgic_ops->get_elrsr(vcpu); -} - static inline u64 vgic_get_eisr(struct kvm_vcpu *vcpu) { return vgic_ops->get_eisr(vcpu); @@ -1062,18 +1063,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu) vgic_ops->enable(vcpu); }
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu) -{ - struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; - struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr); - - vlr.state = 0; - vgic_set_lr(vcpu, lr_nr, vlr); - clear_bit(lr_nr, vgic_cpu->lr_used); - vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY; - vgic_sync_lr_elrsr(vcpu, lr_nr, vlr); -} - /* * An interrupt may have been disabled after being made pending on the * CPU interface (the classic case is a timer running while we're @@ -1085,23 +1074,32 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu) */ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu) { - struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; + u64 elrsr = vgic_get_elrsr(vcpu); + unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr); int lr; + struct vgic_lr vlr;
- for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) { - struct vgic_lr vlr = vgic_get_lr(vcpu, lr); + for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) { + vlr = vgic_get_lr(vcpu, lr);
if (!vgic_irq_is_enabled(vcpu, vlr.irq)) { - vgic_retire_lr(lr, vlr.irq, vcpu); - if (vgic_irq_is_queued(vcpu, vlr.irq)) - vgic_irq_clear_queued(vcpu, vlr.irq); + vlr.state = 0; + vgic_set_lr(vcpu, lr, vlr); + vgic_sync_lr_elrsr(vcpu, lr, vlr); + vgic_irq_clear_queued(vcpu, vlr.irq); } } }
static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, - int lr_nr, struct vgic_lr vlr) + int lr_nr, int sgi_source_id) { + struct vgic_lr vlr; + + vlr.state = 0; + vlr.irq = irq; + vlr.source = sgi_source_id; + if (vgic_irq_is_active(vcpu, irq)) { vlr.state |= LR_STATE_ACTIVE; kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state); @@ -1126,9 +1124,9 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, */ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq) { - struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; struct vgic_dist *dist = &vcpu->kvm->arch.vgic; - struct vgic_lr vlr; + u64 elrsr = vgic_get_elrsr(vcpu); + unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr); int lr;
/* Sanitize the input... */ @@ -1138,42 +1136,20 @@ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
kvm_debug("Queue IRQ%d\n", irq);
- lr = vgic_cpu->vgic_irq_lr_map[irq]; - - /* Do we have an active interrupt for the same CPUID? */ - if (lr != LR_EMPTY) { - vlr = vgic_get_lr(vcpu, lr); - if (vlr.source == sgi_source_id) { - kvm_debug("LR%d piggyback for IRQ%d\n", lr, vlr.irq); - BUG_ON(!test_bit(lr, vgic_cpu->lr_used)); - vgic_queue_irq_to_lr(vcpu, irq, lr, vlr); - return true; - } - } + lr = find_first_bit(elrsr_ptr, vgic->nr_lr);
- /* Try to use another LR for this interrupt */ - lr = find_first_zero_bit((unsigned long *)vgic_cpu->lr_used, - vgic->nr_lr); if (lr >= vgic->nr_lr) return false;
kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id); - vgic_cpu->vgic_irq_lr_map[irq] = lr; - set_bit(lr, vgic_cpu->lr_used);
- vlr.irq = irq; - vlr.source = sgi_source_id; - vlr.state = 0; - vgic_queue_irq_to_lr(vcpu, irq, lr, vlr); + vgic_queue_irq_to_lr(vcpu, irq, lr, sgi_source_id);
return true; }
static bool vgic_queue_hwirq(struct kvm_vcpu *vcpu, int irq) { - if (!vgic_can_sample_irq(vcpu, irq)) - return true; /* level interrupt, already queued */ - if (vgic_queue_irq(vcpu, 0, irq)) { if (vgic_irq_is_edge(vcpu, irq)) { vgic_dist_irq_clear_pending(vcpu, irq); @@ -1346,29 +1322,44 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu) struct vgic_dist *dist = &vcpu->kvm->arch.vgic; u64 elrsr; unsigned long *elrsr_ptr; - int lr, pending; - bool level_pending; + struct vgic_lr vlr; + int lr_nr; + bool pending; + + pending = vgic_process_maintenance(vcpu);
- level_pending = vgic_process_maintenance(vcpu); elrsr = vgic_get_elrsr(vcpu); elrsr_ptr = u64_to_bitmask(&elrsr);
- /* Clear mappings for empty LRs */ - for_each_set_bit(lr, elrsr_ptr, vgic->nr_lr) { - struct vgic_lr vlr; + for_each_clear_bit(lr_nr, elrsr_ptr, vgic_cpu->nr_lr) { + vlr = vgic_get_lr(vcpu, lr_nr); + + BUG_ON(!(vlr.state & LR_STATE_MASK)); + pending = true;
- if (!test_and_clear_bit(lr, vgic_cpu->lr_used)) + /* Reestablish SGI source for pending and active SGIs */ + if (vlr.irq < VGIC_NR_SGIS) + add_sgi_source(vcpu, vlr.irq, vlr.source); + + /* + * If the LR holds a pure active (10) interrupt then keep it + * in the LR without mirroring this status in the emulation. + */ + if (vlr.state == LR_STATE_ACTIVE) continue;
- vlr = vgic_get_lr(vcpu, lr); + if (vlr.state & LR_STATE_PENDING) + vgic_dist_irq_set_pending(vcpu, vlr.irq);
- BUG_ON(vlr.irq >= dist->nr_irqs); - vgic_cpu->vgic_irq_lr_map[vlr.irq] = LR_EMPTY; + /* Mark this LR as empty now. */ + vlr.state = 0; + vgic_set_lr(vcpu, lr_nr, vlr); + vgic_sync_lr_elrsr(vcpu, lr_nr, vlr); } + vgic_update_state(vcpu->kvm);
- /* Check if we still have something up our sleeve... */ - pending = find_first_zero_bit(elrsr_ptr, vgic->nr_lr); - if (level_pending || pending < vgic->nr_lr) + /* vgic_update_state would not cover only-active IRQs */ + if (pending) set_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu); }
@@ -1590,11 +1581,9 @@ void kvm_vgic_vcpu_destroy(struct kvm_vcpu *vcpu) kfree(vgic_cpu->pending_shared); kfree(vgic_cpu->active_shared); kfree(vgic_cpu->pend_act_shared); - kfree(vgic_cpu->vgic_irq_lr_map); vgic_cpu->pending_shared = NULL; vgic_cpu->active_shared = NULL; vgic_cpu->pend_act_shared = NULL; - vgic_cpu->vgic_irq_lr_map = NULL; }
static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, int nr_irqs) @@ -1605,18 +1594,14 @@ static int vgic_vcpu_init_maps(struct kvm_vcpu *vcpu, int nr_irqs) vgic_cpu->pending_shared = kzalloc(sz, GFP_KERNEL); vgic_cpu->active_shared = kzalloc(sz, GFP_KERNEL); vgic_cpu->pend_act_shared = kzalloc(sz, GFP_KERNEL); - vgic_cpu->vgic_irq_lr_map = kmalloc(nr_irqs, GFP_KERNEL);
if (!vgic_cpu->pending_shared || !vgic_cpu->active_shared - || !vgic_cpu->pend_act_shared - || !vgic_cpu->vgic_irq_lr_map) { + || !vgic_cpu->pend_act_shared) { kvm_vgic_vcpu_destroy(vcpu); return -ENOMEM; }
- memset(vgic_cpu->vgic_irq_lr_map, LR_EMPTY, nr_irqs); - /* * Store the number of LRs per vcpu, so we don't have to go * all the way to the distributor structure to find out. Only
From: Tirumalesh Chalamarla tchalamarla@caviumnetworks.com
The ARM GICv3 ITS MSI controller requires a device ID to be able to assign the proper interrupt vector. On real hardware, this ID is sampled from the bus. To be able to emulate an ITS controller, extend the KVM MSI interface to let userspace provide such a device ID. For PCI devices, the device ID is simply the 16-bit bus-device-function triplet, which should be easily available to the userland tool.
There is also a new KVM capability which advertises whether the current VM requires a device ID to be set along with the MSI data. For now this capability is reported as not available everywhere; later we will enable it when ITS emulation is used.
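From the userland side the extended interface can be used roughly as follows (illustrative values and variable names; assumes <linux/kvm.h>, an open VM file descriptor and a VM that reports KVM_CAP_MSI_DEVID):

    struct kvm_msi msi = {
        .address_lo = its_translater_gpa & 0xffffffff,
        .address_hi = its_translater_gpa >> 32,
        .data       = event_id,
        .flags      = KVM_MSI_VALID_DEVID,
        /* PCI: bus in bits [15:8], device in [7:3], function in [2:0] */
        .devid      = (bus << 8) | (dev << 3) | fn,
    };

    if (ioctl(vm_fd, KVM_SIGNAL_MSI, &msi) < 0)
        perror("KVM_SIGNAL_MSI");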
Signed-off-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@linaro.org Signed-off-by: Tirumalesh Chalamarla tchalamarla@caviumnetworks.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/api.txt | 12 ++++++++++-- include/uapi/linux/kvm.h | 5 ++++- 2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index a7926a9..cb04095 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2147,10 +2147,18 @@ struct kvm_msi { __u32 address_hi; __u32 data; __u32 flags; - __u8 pad[16]; + __u32 devid; + __u8 pad[12]; };
-No flags are defined so far. The corresponding field must be 0. +flags: KVM_MSI_VALID_DEVID: devid contains a valid value +devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier + for the device that wrote the MSI message. + For PCI, this is usually a BDF identifier in the lower 16 bits. + +The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide +the device ID. If this capability is not set, userland cannot rely on +the kernel accepting the KVM_MSI_VALID_DEVID flag.
4.71 KVM_CREATE_PIT2 diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 716ad4a..1c48def 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_DISABLE_QUIRKS 116 #define KVM_CAP_X86_SMM 117 #define KVM_CAP_MULTI_ADDRESS_SPACE 118 +#define KVM_CAP_MSI_DEVID 119
#ifdef KVM_CAP_IRQ_ROUTING
@@ -968,12 +969,14 @@ struct kvm_one_reg { __u64 addr; };
+#define KVM_MSI_VALID_DEVID (1U << 0) struct kvm_msi { __u32 address_lo; __u32 address_hi; __u32 data; __u32 flags; - __u8 pad[16]; + __u32 devid; + __u8 pad[12]; };
struct kvm_arm_device_addr {
From: Andre Przywara andre.przywara@arm.com
Currently we destroy the VGIC emulation in one function that cares for all emulated models. To be on par with init_model (which is model specific), let's introduce a per-emulation-model destroy method, too. Use it already for a tiny bit of GICv3-specific code; later it will be handy for the ITS emulation.
Signed-off-by: Andre Przywara andre.przywara@arm.com Reviewed-by: Eric Auger eric.auger@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 1 + virt/kvm/arm/vgic-v3-emul.c | 9 +++++++++ virt/kvm/arm/vgic.c | 11 ++++++++++- 3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 2ccfa9a..b18e2c5 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -144,6 +144,7 @@ struct vgic_vm_ops { bool (*queue_sgi)(struct kvm_vcpu *, int irq); void (*add_sgi_source)(struct kvm_vcpu *, int irq, int source); int (*init_model)(struct kvm *); + void (*destroy_model)(struct kvm *); int (*map_resources)(struct kvm *, const struct vgic_params *); };
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index e661e7f..d2eeb20 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -862,6 +862,14 @@ static int vgic_v3_init_model(struct kvm *kvm) return 0; }
+static void vgic_v3_destroy_model(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + + kfree(dist->irq_spi_mpidr); + dist->irq_spi_mpidr = NULL; +} + /* GICv3 does not keep track of SGI sources anymore. */ static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source) { @@ -874,6 +882,7 @@ void vgic_v3_init_emulation(struct kvm *kvm) dist->vm_ops.queue_sgi = vgic_v3_queue_sgi; dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source; dist->vm_ops.init_model = vgic_v3_init_model; + dist->vm_ops.destroy_model = vgic_v3_destroy_model; dist->vm_ops.map_resources = vgic_v3_map_resources;
kvm->arch.max_vcpus = KVM_MAX_VCPUS; diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 394622c..cc8f5ed 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -100,6 +100,14 @@ int kvm_vgic_map_resources(struct kvm *kvm) return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic); }
+static void vgic_destroy_model(struct kvm *kvm) +{ + struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops; + + if (vm_ops->destroy_model) + vm_ops->destroy_model(kvm); +} + /* * struct vgic_bitmap contains a bitmap made of unsigned longs, but * extracts u32s out of them. @@ -1629,6 +1637,8 @@ void kvm_vgic_destroy(struct kvm *kvm) struct kvm_vcpu *vcpu; int i;
+ vgic_destroy_model(kvm); + kvm_for_each_vcpu(i, vcpu, kvm) kvm_vgic_vcpu_destroy(vcpu);
@@ -1645,7 +1655,6 @@ void kvm_vgic_destroy(struct kvm *kvm) } kfree(dist->irq_sgi_sources); kfree(dist->irq_spi_cpu); - kfree(dist->irq_spi_mpidr); kfree(dist->irq_spi_target); kfree(dist->irq_pending_on_cpu); kfree(dist->irq_active_on_cpu);
From: Andre Przywara andre.przywara@arm.com
KVM capabilities can be a per-VM property, though ARM/ARM64 currently does not pass on the VM pointer to the architecture-specific capability handlers. Add a "struct kvm *" parameter to those functions to later allow proper per-VM capability reporting.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm/include/asm/kvm_host.h | 2 +- arch/arm/kvm/arm.c | 2 +- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/kvm/reset.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index e896d2c..56cac05 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -213,7 +213,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr, kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr); }
-static inline int kvm_arch_dev_ioctl_check_extension(long ext) +static inline int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) { return 0; } diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index bc738d2..7c65353 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -196,7 +196,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_MAX_VCPUS; break; default: - r = kvm_arch_dev_ioctl_check_extension(ext); + r = kvm_arch_dev_ioctl_check_extension(kvm, ext); break; } return r; diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 2709db2..8d78a72 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -47,7 +47,7 @@
int __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); -int kvm_arch_dev_ioctl_check_extension(long ext); +int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
struct kvm_arch { /* The VMID generation used for the virt. memory system */ diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 0b43265..866502b 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -56,7 +56,7 @@ static bool cpu_has_32bit_el1(void) return !!(pfr0 & 0x20); }
-int kvm_arch_dev_ioctl_check_extension(long ext) +int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) { int r;
From: Andre Przywara andre.przywara@arm.com
Currently we initialize all the possible GIC frame addresses in one function, without looking at the specific GIC model we instantiate for the guest. As this gets confusing when adding another VGIC model later, let's move these initializations into the respective model's init functions.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- virt/kvm/arm/vgic-v2-emul.c | 3 +++ virt/kvm/arm/vgic-v3-emul.c | 3 +++ virt/kvm/arm/vgic.c | 3 --- 3 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c index 1390797..8faa28c 100644 --- a/virt/kvm/arm/vgic-v2-emul.c +++ b/virt/kvm/arm/vgic-v2-emul.c @@ -567,6 +567,9 @@ void vgic_v2_init_emulation(struct kvm *kvm) dist->vm_ops.init_model = vgic_v2_init_model; dist->vm_ops.map_resources = vgic_v2_map_resources;
+ dist->vgic_cpu_base = VGIC_ADDR_UNDEF; + dist->vgic_dist_base = VGIC_ADDR_UNDEF; + kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS; }
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index d2eeb20..1f42348 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -885,6 +885,9 @@ void vgic_v3_init_emulation(struct kvm *kvm) dist->vm_ops.destroy_model = vgic_v3_destroy_model; dist->vm_ops.map_resources = vgic_v3_map_resources;
+ dist->vgic_dist_base = VGIC_ADDR_UNDEF; + dist->vgic_redist_base = VGIC_ADDR_UNDEF; + kvm->arch.max_vcpus = KVM_MAX_VCPUS; }
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index cc8f5ed..59f1801 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1830,9 +1830,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type) kvm->arch.vgic.in_kernel = true; kvm->arch.vgic.vgic_model = type; kvm->arch.vgic.vctrl_base = vgic->vctrl_base; - kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF; - kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF; - kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
out_unlock: for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
From: Andre Przywara andre.przywara@arm.com
The ARM GICv3 ITS controller requires a separate register frame to cover ITS specific registers. Add a new VGIC address type and store the address in a field in the vgic_dist structure. Provide a function to check whether userland has provided the address, so ITS functionality can be guarded by that check.
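For illustration only (not part of this patch): once this address type exists, a VMM could place the ITS control frame via the existing KVM_DEV_ARM_VGIC_GRP_ADDR group as sketched below; the VGIC device fd and the 0x08080000 guest address are assumptions for the example.

  /* Hypothetical userspace sketch: tell KVM where the guest's ITS frame lives. */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int set_its_base(int vgic_v3_dev_fd)
  {
  	__u64 its_base = 0x08080000;	/* example 64K-aligned guest physical address */
  	struct kvm_device_attr attr = {
  		.group	= KVM_DEV_ARM_VGIC_GRP_ADDR,
  		.attr	= KVM_VGIC_V3_ADDR_TYPE_ITS,
  		.addr	= (__u64)(unsigned long)&its_base,
  	};

  	/* Returns -ENODEV when the kernel does not offer ITS emulation. */
  	return ioctl(vgic_v3_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
  }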
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/devices/arm-vgic.txt | 9 +++++++++ arch/arm64/include/uapi/asm/kvm.h | 2 ++ include/kvm/arm_vgic.h | 3 +++ virt/kvm/arm/vgic-v3-emul.c | 2 ++ virt/kvm/arm/vgic.c | 16 ++++++++++++++++ virt/kvm/arm/vgic.h | 1 + 6 files changed, 33 insertions(+)
diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt index 3fb9054..ec715f9e 100644 --- a/Documentation/virtual/kvm/devices/arm-vgic.txt +++ b/Documentation/virtual/kvm/devices/arm-vgic.txt @@ -39,6 +39,15 @@ Groups: Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. This address needs to be 64K aligned.
+ KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit) + Base address in the guest physical address space of the GICv3 ITS + control register frame. The ITS allows MSI(-X) interrupts to be + injected into guests. This extension is optional; if the kernel + does not support the ITS, the call returns -ENODEV. + This memory is solely for the guest to access the ITS control + registers and does not cover the ITS translation register. + Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. + This address needs to be 64K aligned and the region covers 64 KByte.
KVM_DEV_ARM_VGIC_GRP_DIST_REGS Attributes: diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index 6c4c556..8af76fd 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -82,9 +82,11 @@ struct kvm_regs { /* Supported VGICv3 address types */ #define KVM_VGIC_V3_ADDR_TYPE_DIST 2 #define KVM_VGIC_V3_ADDR_TYPE_REDIST 3 +#define KVM_VGIC_V3_ADDR_TYPE_ITS 4
#define KVM_VGIC_V3_DIST_SIZE SZ_64K #define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K) +#define KVM_VGIC_V3_ITS_SIZE SZ_64K
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */ #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */ diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index b18e2c5..3ee063b 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -178,6 +178,9 @@ struct vgic_dist { phys_addr_t vgic_redist_base; };
+ /* The base address of the ITS control register frame */ + phys_addr_t vgic_its_base; + /* Distributor enabled */ u32 enabled;
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index 1f42348..a8cf669 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -887,6 +887,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vgic_dist_base = VGIC_ADDR_UNDEF; dist->vgic_redist_base = VGIC_ADDR_UNDEF; + dist->vgic_its_base = VGIC_ADDR_UNDEF;
kvm->arch.max_vcpus = KVM_MAX_VCPUS; } @@ -1059,6 +1060,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev, return -ENXIO; case KVM_VGIC_V3_ADDR_TYPE_DIST: case KVM_VGIC_V3_ADDR_TYPE_REDIST: + case KVM_VGIC_V3_ADDR_TYPE_ITS: return 0; } break; diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 59f1801..15e447f 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -930,6 +930,16 @@ int vgic_register_kvm_io_dev(struct kvm *kvm, gpa_t base, int len, return ret; }
+bool vgic_has_its(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + + if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3) + return false; + + return !IS_VGIC_ADDR_UNDEF(dist->vgic_its_base); +} + static int vgic_nr_shared_irqs(struct vgic_dist *dist) { return dist->nr_irqs - VGIC_NR_PRIVATE_IRQS; @@ -1927,6 +1937,12 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write) block_size = KVM_VGIC_V3_REDIST_SIZE; alignment = SZ_64K; break; + case KVM_VGIC_V3_ADDR_TYPE_ITS: + type_needed = KVM_DEV_TYPE_ARM_VGIC_V3; + addr_ptr = &vgic->vgic_its_base; + block_size = KVM_VGIC_V3_ITS_SIZE; + alignment = SZ_64K; + break; #endif default: r = -ENODEV; diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h index 0df74cb..a093f5c 100644 --- a/virt/kvm/arm/vgic.h +++ b/virt/kvm/arm/vgic.h @@ -136,5 +136,6 @@ int vgic_get_common_attr(struct kvm_device *dev, struct kvm_device_attr *attr); int vgic_init(struct kvm *kvm); void vgic_v2_init_emulation(struct kvm *kvm); void vgic_v3_init_emulation(struct kvm *kvm); +bool vgic_has_its(struct kvm *kvm);
#endif
From: Andre Przywara andre.przywara@arm.com
The GICv3 redistributor has the PENDBASER and PROPBASER registers, which we did not emulate so far, as they only make sense together with an ITS. In preparation for that, emulate those MMIO accesses by storing the 64-bit value written to them in a variable, which we later read in the ITS emulation.
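As a side note, the 64-bit PROPBASER/PENDBASER values typically arrive as two 32-bit MMIO accesses; conceptually the new helper merges each half into the backing variable, roughly as in this simplified sketch (the real helper below additionally honours the read/write access mode):

  /* Simplified sketch only: merge a 32-bit write into one half of a 64-bit register. */
  static void base_reg_write_half(u64 *basereg, phys_addr_t offset, u32 val)
  {
  	if ((offset & ~3) == 0x00)	/* lower word at offset 0x0 */
  		*basereg = (*basereg & GENMASK_ULL(63, 32)) | val;
  	else				/* upper word at offset 0x4 */
  		*basereg = (*basereg & GENMASK_ULL(31, 0)) | ((u64)val << 32);
  }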
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 8 ++++++++ virt/kvm/arm/vgic-v3-emul.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/vgic.c | 35 +++++++++++++++++++++++++++++++++++ virt/kvm/arm/vgic.h | 4 ++++ 4 files changed, 91 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 3ee063b..8c6cb0e 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -256,6 +256,14 @@ struct vgic_dist { struct vgic_vm_ops vm_ops; struct vgic_io_device dist_iodev; struct vgic_io_device *redist_iodevs; + + /* Address of LPI configuration table shared by all redistributors */ + u64 propbaser; + + /* Addresses of LPI pending tables per redistributor */ + u64 *pendbaser; + + bool lpis_enabled; };
struct vgic_v2_cpu_if { diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index a8cf669..5269ad1 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -651,6 +651,38 @@ static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu, return vgic_handle_cfg_reg(reg, mmio, offset); }
+/* We don't trigger any actions here, just store the register value */ +static bool handle_mmio_propbaser_redist(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + int mode = ACCESS_READ_VALUE; + + /* Storing a value with LPIs already enabled is undefined */ + mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE; + vgic_handle_base_register(vcpu, mmio, offset, &dist->propbaser, mode); + + return false; +} + +/* We don't trigger any actions here, just store the register value */ +static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + struct kvm_vcpu *rdvcpu = mmio->private; + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + int mode = ACCESS_READ_VALUE; + + /* Storing a value with LPIs already enabled is undefined */ + mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE; + vgic_handle_base_register(vcpu, mmio, offset, + &dist->pendbaser[rdvcpu->vcpu_id], mode); + + return false; +} + #define SGI_base(x) ((x) + SZ_64K)
static const struct vgic_io_range vgic_redist_ranges[] = { @@ -679,6 +711,18 @@ static const struct vgic_io_range vgic_redist_ranges[] = { .handle_mmio = handle_mmio_raz_wi, }, { + .base = GICR_PENDBASER, + .len = 0x08, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_pendbaser_redist, + }, + { + .base = GICR_PROPBASER, + .len = 0x08, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_propbaser_redist, + }, + { .base = GICR_IDREGS, .len = 0x30, .bits_per_irq = 0, diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 15e447f..49ee92b 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -446,6 +446,41 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg, } }
+/* handle a 64-bit register access */ +void vgic_handle_base_register(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset, u64 *basereg, + int mode) +{ + u32 reg; + u64 breg; + + switch (offset & ~3) { + case 0x00: + breg = *basereg; + reg = lower_32_bits(breg); + vgic_reg_access(mmio, &reg, offset & 3, mode); + if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) { + breg &= GENMASK_ULL(63, 32); + breg |= reg; + *basereg = breg; + } + break; + case 0x04: + breg = *basereg; + reg = upper_32_bits(breg); + vgic_reg_access(mmio, &reg, offset & 3, mode); + if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) { + breg = lower_32_bits(breg); + breg |= (u64)reg << 32; + *basereg = breg; + } + break; + } +} + + + bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h index a093f5c..b2d791c 100644 --- a/virt/kvm/arm/vgic.h +++ b/virt/kvm/arm/vgic.h @@ -71,6 +71,10 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg, phys_addr_t offset, int mode); bool handle_mmio_raz_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset); +void vgic_handle_base_register(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset, u64 *basereg, + int mode);
static inline u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
From: Andre Przywara andre.przywara@arm.com
The ARM GICv3 ITS emulation code goes into a separate file, but needs to be connected to the GICv3 emulation, for which it is an optional feature. Introduce the skeleton with function stubs to be filled in later. Also introduce the basic ITS data structure and initialize it, but don't return success yet, as we are not yet ready for the show.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kvm/Makefile | 1 + include/kvm/arm_vgic.h | 6 ++ include/linux/irqchip/arm-gic-v3.h | 1 + virt/kvm/arm/its-emul.c | 125 +++++++++++++++++++++++++++++++++++++ virt/kvm/arm/its-emul.h | 35 +++++++++++ virt/kvm/arm/vgic-v3-emul.c | 24 ++++++- 6 files changed, 189 insertions(+), 3 deletions(-) create mode 100644 virt/kvm/arm/its-emul.c create mode 100644 virt/kvm/arm/its-emul.h
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index f90f4aa..9803307 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -25,5 +25,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2-emul.o kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v2-switch.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/its-emul.o kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 8c6cb0e..9e9d4aa 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -156,6 +156,11 @@ struct vgic_io_device { struct kvm_io_device dev; };
+struct vgic_its { + bool enabled; + spinlock_t lock; +}; + struct vgic_dist { spinlock_t lock; bool in_kernel; @@ -264,6 +269,7 @@ struct vgic_dist { u64 *pendbaser;
bool lpis_enabled; + struct vgic_its its; };
struct vgic_v2_cpu_if { diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 6949da7..24d156d 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -177,6 +177,7 @@ #define GITS_CWRITER 0x0088 #define GITS_CREADR 0x0090 #define GITS_BASER 0x0100 +#define GITS_IDREGS_BASE 0xffd0 #define GITS_PIDR2 GICR_PIDR2
#define GITS_TRANSLATER 0x10040 diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c new file mode 100644 index 0000000..659dd39 --- /dev/null +++ b/virt/kvm/arm/its-emul.c @@ -0,0 +1,125 @@ +/* + * GICv3 ITS emulation + * + * Copyright (C) 2015 ARM Ltd. + * Author: Andre Przywara andre.przywara@arm.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/cpu.h> +#include <linux/kvm.h> +#include <linux/kvm_host.h> +#include <linux/interrupt.h> + +#include <linux/irqchip/arm-gic-v3.h> +#include <kvm/arm_vgic.h> + +#include <asm/kvm_emulate.h> +#include <asm/kvm_arm.h> +#include <asm/kvm_mmu.h> + +#include "vgic.h" +#include "its-emul.h" + +static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_gits_cwriter(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static bool handle_mmio_gits_creadr(struct kvm_vcpu *vcpu, + struct kvm_exit_mmio *mmio, + phys_addr_t offset) +{ + return false; +} + +static const struct vgic_io_range vgicv3_its_ranges[] = { + { + .base = GITS_CTLR, + .len = 0x10, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_misc_gits, + }, + { + .base = GITS_CBASER, + .len = 0x08, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_gits_cbaser, + }, + { + .base = GITS_CWRITER, + .len = 0x08, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_gits_cwriter, + }, + { + .base = GITS_CREADR, + .len = 0x08, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_gits_creadr, + }, + { + /* We don't need any memory from the guest. */ + .base = GITS_BASER, + .len = 0x40, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_raz_wi, + }, + { + .base = GITS_IDREGS_BASE, + .len = 0x30, + .bits_per_irq = 0, + .handle_mmio = handle_mmio_gits_idregs, + }, +}; + +/* This is called on setting the LPI enable bit in the redistributor. */ +void vgic_enable_lpis(struct kvm_vcpu *vcpu) +{ +} + +int vits_init(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_its *its = &dist->its; + + spin_lock_init(&its->lock); + + its->enabled = false; + + return -ENXIO; +} diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h new file mode 100644 index 0000000..5dc8e2f --- /dev/null +++ b/virt/kvm/arm/its-emul.h @@ -0,0 +1,35 @@ +/* + * GICv3 ITS emulation definitions + * + * Copyright (C) 2015 ARM Ltd. + * Author: Andre Przywara andre.przywara@arm.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __KVM_ITS_EMUL_H__ +#define __KVM_ITS_EMUL_H__ + +#include <linux/kvm.h> +#include <linux/kvm_host.h> + +#include <asm/kvm_emulate.h> +#include <asm/kvm_arm.h> +#include <asm/kvm_mmu.h> + +#include "vgic.h" + +void vgic_enable_lpis(struct kvm_vcpu *vcpu); +int vits_init(struct kvm *kvm); + +#endif diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index 5269ad1..f5865e7 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -48,6 +48,7 @@ #include <asm/kvm_mmu.h>
#include "vgic.h" +#include "its-emul.h"
static bool handle_mmio_rao_wi(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) @@ -530,9 +531,20 @@ static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { - /* since we don't support LPIs, this register is zero for now */ - vgic_reg_access(mmio, NULL, offset, - ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + u32 reg; + + if (!vgic_has_its(vcpu->kvm)) { + vgic_reg_access(mmio, NULL, offset, + ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + return false; + } + reg = dist->lpis_enabled ? GICR_CTLR_ENABLE_LPIS : 0; + vgic_reg_access(mmio, &reg, offset, + ACCESS_READ_VALUE | ACCESS_WRITE_VALUE); + if (!dist->lpis_enabled && (reg & GICR_CTLR_ENABLE_LPIS)) { + /* Eventually do something */ + } return false; }
@@ -861,6 +873,12 @@ static int vgic_v3_map_resources(struct kvm *kvm, rdbase += GIC_V3_REDIST_SIZE; }
+ if (vgic_has_its(kvm)) { + ret = vits_init(kvm); + if (ret) + goto out_unregister; + } + dist->redist_iodevs = iodevs; dist->ready = true; goto out;
From: Andre Przywara andre.przywara@arm.com
Add emulation for some basic MMIO registers used in the ITS emulation. This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
  those implement the ITS command buffer handling
Most of the handlers are pretty straightforward, but CWRITER goes the extra mile to allow fine-grained locking. The idea here is to let only the first instance iterate through the command ring buffer; CWRITER accesses on other VCPUs in the meantime will be picked up by that first instance and handled as well. The ITS lock is thus only held for very short periods of time and is dropped before the actual command handler is called.
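Distilled to its core, the handshake looks like the sketch below (an illustration only, without the validation and error handling of the real handler below): whoever finds the queue idle becomes the single processing instance, everybody else merely advances the write pointer.

  /* Sketch of the CWRITER ownership rule; fields as in struct vgic_its. */
  static bool claim_cmd_queue(struct vgic_its *its, u32 new_cwriter)
  {
  	bool someone_else_is_processing;

  	spin_lock(&its->lock);
  	someone_else_is_processing = (its->cwriter != its->creadr);
  	its->cwriter = new_cwriter;
  	spin_unlock(&its->lock);

  	/* true: this caller must drain the queue until CREADR == CWRITER */
  	return !someone_else_is_processing;
  }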
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 3 + include/linux/irqchip/arm-gic-v3.h | 8 ++ virt/kvm/arm/its-emul.c | 205 +++++++++++++++++++++++++++++++++++++ virt/kvm/arm/its-emul.h | 1 + virt/kvm/arm/vgic-v3-emul.c | 2 + 5 files changed, 219 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 9e9d4aa..b432055 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -159,6 +159,9 @@ struct vgic_io_device { struct vgic_its { bool enabled; spinlock_t lock; + u64 cbaser; + int creadr; + int cwriter; };
struct vgic_dist { diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 24d156d..9d173d7 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -179,15 +179,23 @@ #define GITS_BASER 0x0100 #define GITS_IDREGS_BASE 0xffd0 #define GITS_PIDR2 GICR_PIDR2 +#define GITS_PIDR4 0xffd0 +#define GITS_CIDR0 0xfff0 +#define GITS_CIDR1 0xfff4 +#define GITS_CIDR2 0xfff8 +#define GITS_CIDR3 0xfffc
#define GITS_TRANSLATER 0x10040
#define GITS_CTLR_ENABLE (1U << 0) #define GITS_CTLR_QUIESCENT (1U << 31)
+#define GITS_TYPER_PLPIS (1UL << 0) +#define GITS_TYPER_IDBITS_SHIFT 8 #define GITS_TYPER_DEVBITS_SHIFT 13 #define GITS_TYPER_DEVBITS(r) ((((r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1) #define GITS_TYPER_PTA (1UL << 19) +#define GITS_TYPER_HWCOLLCNT_SHIFT 24
#define GITS_CBASER_VALID (1UL << 63) #define GITS_CBASER_nCnB (0UL << 59) diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 659dd39..b498f06 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -32,10 +32,62 @@ #include "vgic.h" #include "its-emul.h"
+#define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL) + +/* The distributor lock is held by the VGIC MMIO handler. */ static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + u32 reg; + bool was_enabled; + + switch (offset & ~3) { + case 0x00: /* GITS_CTLR */ + /* We never defer any command execution. */ + reg = GITS_CTLR_QUIESCENT; + if (its->enabled) + reg |= GITS_CTLR_ENABLE; + was_enabled = its->enabled; + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_VALUE); + its->enabled = !!(reg & GITS_CTLR_ENABLE); + return !was_enabled && its->enabled; + case 0x04: /* GITS_IIDR */ + reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0); + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); + break; + case 0x08: /* GITS_TYPER */ + /* + * We use linear CPU numbers for redistributor addressing, + * so GITS_TYPER.PTA is 0. + * To avoid memory waste on the guest side, we keep the + * number of IDBits and DevBits low for the time being. + * This could later be made configurable by userland. + * Since we have all collections in linked list, we claim + * that we can hold all of the collection tables in our + * own memory and that the ITT entry size is 1 byte (the + * smallest possible one). + */ + reg = GITS_TYPER_PLPIS; + reg |= 0xff << GITS_TYPER_HWCOLLCNT_SHIFT; + reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT; + reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT; + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); + break; + case 0x0c: + /* The upper 32bits of TYPER are all 0 for the time being. + * Should we need more than 256 collections, we can enable + * some bits in here. + */ + vgic_reg_access(mmio, NULL, offset & 3, + ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + break; + } + return false; }
@@ -43,20 +95,142 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + u32 reg = 0; + int idreg = (offset & ~3) + GITS_IDREGS_BASE; + + switch (idreg) { + case GITS_PIDR2: + reg = GIC_PIDR2_ARCH_GICv3; + break; + case GITS_PIDR4: + /* This is a 64K software visible page */ + reg = 0x40; + break; + /* Those are the ID registers for (any) GIC. */ + case GITS_CIDR0: + reg = 0x0d; + break; + case GITS_CIDR1: + reg = 0xf0; + break; + case GITS_CIDR2: + reg = 0x05; + break; + case GITS_CIDR3: + reg = 0xb1; + break; + } + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); return false; }
+static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) +{ + return -ENODEV; +} + static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + int mode = ACCESS_READ_VALUE; + + mode |= its->enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE; + + vgic_handle_base_register(vcpu, mmio, offset, &its->cbaser, mode); + + /* Writing CBASER resets the read pointer. */ + if (mmio->is_write) + its->creadr = 0; + return false; }
+static int its_cmd_buffer_size(struct kvm *kvm) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + + return ((its->cbaser & 0xff) + 1) << 12; +} + +static gpa_t its_cmd_buffer_base(struct kvm *kvm) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + + return BASER_BASE_ADDRESS(its->cbaser); +} + +/* + * By writing to CWRITER the guest announces new commands to be processed. + * Since we cannot read from guest memory inside the ITS spinlock, we + * iterate over the command buffer (with the lock dropped) until the read + * pointer matches the write pointer. Other VCPUs writing this register in the + * meantime will just update the write pointer, leaving the command + * processing to the first instance of the function. + */ static bool handle_mmio_gits_cwriter(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + gpa_t cbaser = its_cmd_buffer_base(vcpu->kvm); + u64 cmd_buf[4]; + u32 reg; + bool finished; + + /* The upper 32 bits are RES0 */ + if ((offset & ~3) == 0x04) { + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + return false; + } + + reg = its->cwriter & 0xfffe0; + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_VALUE); + if (!mmio->is_write) + return false; + + reg &= 0xfffe0; + if (reg > its_cmd_buffer_size(vcpu->kvm)) + return false; + + spin_lock(&its->lock); + + /* + * If there is still another VCPU handling commands, let this + * one pick up the new CWRITER and process our new commands as well. + */ + finished = (its->cwriter != its->creadr); + its->cwriter = reg; + + spin_unlock(&its->lock); + + while (!finished) { + int ret = kvm_read_guest(vcpu->kvm, cbaser + its->creadr, + cmd_buf, 32); + if (ret) { + /* + * Gah, we are screwed. Reset CWRITER to that command + * that we have finished processing and return. + */ + spin_lock(&its->lock); + its->cwriter = its->creadr; + spin_unlock(&its->lock); + break; + } + vits_handle_command(vcpu, cmd_buf); + + spin_lock(&its->lock); + its->creadr += 32; + if (its->creadr == its_cmd_buffer_size(vcpu->kvm)) + its->creadr = 0; + finished = (its->creadr == its->cwriter); + spin_unlock(&its->lock); + } + return false; }
@@ -64,6 +238,20 @@ static bool handle_mmio_gits_creadr(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + u32 reg; + + switch (offset & ~3) { + case 0x00: + reg = its->creadr & 0xfffe0; + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); + break; + case 0x04: + vgic_reg_access(mmio, &reg, offset & 3, + ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED); + break; + } return false; }
@@ -117,9 +305,26 @@ int vits_init(struct kvm *kvm) struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_its *its = &dist->its;
+ dist->pendbaser = kmalloc(sizeof(u64) * dist->nr_cpus, GFP_KERNEL); + if (!dist->pendbaser) + return -ENOMEM; + spin_lock_init(&its->lock);
its->enabled = false;
return -ENXIO; } + +void vits_destroy(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_its *its = &dist->its; + + if (!vgic_has_its(kvm)) + return; + + kfree(dist->pendbaser); + + its->enabled = false; +} diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h index 5dc8e2f..472a6d0 100644 --- a/virt/kvm/arm/its-emul.h +++ b/virt/kvm/arm/its-emul.h @@ -31,5 +31,6 @@
void vgic_enable_lpis(struct kvm_vcpu *vcpu); int vits_init(struct kvm *kvm); +void vits_destroy(struct kvm *kvm);
#endif diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index f5865e7..49be3c3 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -928,6 +928,8 @@ static void vgic_v3_destroy_model(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic;
+ vits_destroy(kvm); + kfree(dist->irq_spi_mpidr); dist->irq_spi_mpidr = NULL; }
From: Andre Przywara andre.przywara@arm.com
The GICv3 Interrupt Translation Service (ITS) uses tables in memory to allow sophisticated interrupt routing. It features device tables, an interrupt table per device and a table connecting "collections" to actual CPUs (a.k.a. redistributors in the GICv3 lingo). Since the interrupt numbers for the LPIs are allocated quite sparsely and the range can be quite huge (8192 LPIs being the minimum), using bitmaps or arrays for storing information is a waste of memory. We use linked lists instead, which we iterate linearly. This works very well with the actual number of LPIs/MSIs in the guest being quite low. Should the number of LPIs exceed the number where iterating through lists seems acceptable, we can later revisit this and use more efficient data structures.
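To make the data model concrete, here is an illustrative lookup (not part of the patch) that walks the lists introduced below to translate a (DeviceID, EventID) pair into an LPI number and the target VCPU; the helper name and error handling are made up for the example.

  /* Illustration only; must be called with the ITS lock held. */
  static int vits_translate(struct kvm *kvm, u32 device_id, u32 event_id,
  			  u32 *lpi, int *target_vcpu)
  {
  	struct its_device *dev;
  	struct its_itte *itte;

  	list_for_each_entry(dev, &kvm->arch.vgic.its.device_list, dev_list) {
  		if (dev->device_id != device_id)
  			continue;
  		list_for_each_entry(itte, &dev->itt, itte_list) {
  			if (itte->event_id != event_id || !itte->collection)
  				continue;
  			*lpi = itte->lpi;
  			*target_vcpu = itte->collection->target_addr;
  			return 0;
  		}
  	}
  	return -ENODEV;	/* no mapping has been established (yet) */
  }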
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 3 +++ virt/kvm/arm/its-emul.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index b432055..1648668 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -25,6 +25,7 @@ #include <linux/spinlock.h> #include <linux/types.h> #include <kvm/iodev.h> +#include <linux/list.h>
#define VGIC_NR_IRQS_LEGACY 256 #define VGIC_NR_SGIS 16 @@ -162,6 +163,8 @@ struct vgic_its { u64 cbaser; int creadr; int cwriter; + struct list_head device_list; + struct list_head collection_list; };
struct vgic_dist { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index b498f06..7f217fa 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -21,6 +21,7 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <linux/interrupt.h> +#include <linux/list.h>
#include <linux/irqchip/arm-gic-v3.h> #include <kvm/arm_vgic.h> @@ -32,6 +33,25 @@ #include "vgic.h" #include "its-emul.h"
+struct its_device { + struct list_head dev_list; + struct list_head itt; + u32 device_id; +}; + +struct its_collection { + struct list_head coll_list; + u32 collection_id; + u32 target_addr; +}; + +struct its_itte { + struct list_head itte_list; + struct its_collection *collection; + u32 lpi; + u32 event_id; +}; + #define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
/* The distributor lock is held by the VGIC MMIO handler. */ @@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm)
spin_lock_init(&its->lock);
+ INIT_LIST_HEAD(&its->device_list); + INIT_LIST_HEAD(&its->collection_list); + its->enabled = false;
return -ENXIO; @@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_its *its = &dist->its; + struct its_device *dev; + struct its_itte *itte; + struct list_head *dev_cur, *dev_temp; + struct list_head *cur, *temp;
if (!vgic_has_its(kvm)) return;
+ if (!its->device_list.next) + return; + + spin_lock(&its->lock); + list_for_each_safe(dev_cur, dev_temp, &its->device_list) { + dev = container_of(dev_cur, struct its_device, dev_list); + list_for_each_safe(cur, temp, &dev->itt) { + itte = (container_of(cur, struct its_itte, itte_list)); + list_del(cur); + kfree(itte); + } + list_del(dev_cur); + kfree(dev); + } + + list_for_each_safe(cur, temp, &its->collection_list) { + list_del(cur); + kfree(container_of(cur, struct its_collection, coll_list)); + } + kfree(dist->pendbaser);
its->enabled = false; + spin_unlock(&its->lock); }
From: Andre Przywara andre.przywara@arm.com
As the actual LPI number in a guest can be quite high, but is mostly assigned using a very sparse allocation scheme, bitmaps and arrays for storing the virtual interrupt status are a waste of memory. We use our equivalent of the "Interrupt Translation Table Entry" (ITTE) to hold this extra status information for a virtual LPI. As the normal VGIC code cannot use its fancy bitmaps to manage pending interrupts, we provide a hook in the VGIC code to let the ITS emulation handle the list register queueing itself. LPIs are located in a separate number range (>=8192), so distinguishing them is easy. With LPIs being only edge-triggered, we get away with less complex IRQ handling.
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 2 ++ virt/kvm/arm/its-emul.c | 71 ++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/its-emul.h | 3 ++ virt/kvm/arm/vgic-v3-emul.c | 2 ++ virt/kvm/arm/vgic.c | 72 ++++++++++++++++++++++++++++++++++----------- 5 files changed, 133 insertions(+), 17 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 1648668..2a67a10 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -147,6 +147,8 @@ struct vgic_vm_ops { int (*init_model)(struct kvm *); void (*destroy_model)(struct kvm *); int (*map_resources)(struct kvm *, const struct vgic_params *); + bool (*queue_lpis)(struct kvm_vcpu *); + void (*unqueue_lpi)(struct kvm_vcpu *, int irq); };
struct vgic_io_device { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 7f217fa..b9c40d7 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -50,8 +50,26 @@ struct its_itte { struct its_collection *collection; u32 lpi; u32 event_id; + bool enabled; + unsigned long *pending; };
+#define for_each_lpi(dev, itte, kvm) \ + list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \ + list_for_each_entry(itte, &(dev)->itt, itte_list) + +static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi) +{ + struct its_device *device; + struct its_itte *itte; + + for_each_lpi(device, itte, kvm) { + if (itte->lpi == lpi) + return itte; + } + return NULL; +} + #define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
/* The distributor lock is held by the VGIC MMIO handler. */ @@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, return false; }
+/* + * Find all enabled and pending LPIs and queue them into the list + * registers. + * The dist lock is held by the caller. + */ +bool vits_queue_lpis(struct kvm_vcpu *vcpu) +{ + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + struct its_device *device; + struct its_itte *itte; + bool ret = true; + + if (!vgic_has_its(vcpu->kvm)) + return true; + if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled) + return true; + + spin_lock(&its->lock); + for_each_lpi(device, itte, vcpu->kvm) { + if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending)) + continue; + + if (!itte->collection) + continue; + + if (itte->collection->target_addr != vcpu->vcpu_id) + continue; + + __clear_bit(vcpu->vcpu_id, itte->pending); + + ret &= vgic_queue_irq(vcpu, 0, itte->lpi); + } + + spin_unlock(&its->lock); + return ret; +} + +/* Called with the distributor lock held by the caller. */ +void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi) +{ + struct vgic_its *its = &vcpu->kvm->arch.vgic.its; + struct its_itte *itte; + + spin_lock(&its->lock); + + /* Find the right ITTE and put the pending state back in there */ + itte = find_itte_by_lpi(vcpu->kvm, lpi); + if (itte) + __set_bit(vcpu->vcpu_id, itte->pending); + + spin_unlock(&its->lock); +} + static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) { return -ENODEV; diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h index 472a6d0..cc5d5ff 100644 --- a/virt/kvm/arm/its-emul.h +++ b/virt/kvm/arm/its-emul.h @@ -33,4 +33,7 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu); int vits_init(struct kvm *kvm); void vits_destroy(struct kvm *kvm);
+bool vits_queue_lpis(struct kvm_vcpu *vcpu); +void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq); + #endif diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index 49be3c3..4132c26 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -948,6 +948,8 @@ void vgic_v3_init_emulation(struct kvm *kvm) dist->vm_ops.init_model = vgic_v3_init_model; dist->vm_ops.destroy_model = vgic_v3_destroy_model; dist->vm_ops.map_resources = vgic_v3_map_resources; + dist->vm_ops.queue_lpis = vits_queue_lpis; + dist->vm_ops.unqueue_lpi = vits_unqueue_lpi;
dist->vgic_dist_base = VGIC_ADDR_UNDEF; dist->vgic_redist_base = VGIC_ADDR_UNDEF; diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 49ee92b..9dfd094 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -95,6 +95,20 @@ static bool queue_sgi(struct kvm_vcpu *vcpu, int irq) return vcpu->kvm->arch.vgic.vm_ops.queue_sgi(vcpu, irq); }
+static bool vgic_queue_lpis(struct kvm_vcpu *vcpu) +{ + if (vcpu->kvm->arch.vgic.vm_ops.queue_lpis) + return vcpu->kvm->arch.vgic.vm_ops.queue_lpis(vcpu); + else + return true; +} + +static void vgic_unqueue_lpi(struct kvm_vcpu *vcpu, int irq) +{ + if (vcpu->kvm->arch.vgic.vm_ops.unqueue_lpi) + vcpu->kvm->arch.vgic.vm_ops.unqueue_lpi(vcpu, irq); +} + int kvm_vgic_map_resources(struct kvm *kvm) { return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic); @@ -1135,6 +1149,10 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu) for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) { vlr = vgic_get_lr(vcpu, lr);
+ /* We don't care about LPIs here */ + if (vlr.irq >= 8192) + continue; + if (!vgic_irq_is_enabled(vcpu, vlr.irq)) { vlr.state = 0; vgic_set_lr(vcpu, lr, vlr); @@ -1147,25 +1165,33 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu) static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, int lr_nr, int sgi_source_id) { + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; struct vgic_lr vlr;
vlr.state = 0; vlr.irq = irq; vlr.source = sgi_source_id;
- if (vgic_irq_is_active(vcpu, irq)) { - vlr.state |= LR_STATE_ACTIVE; - kvm_debug("Set active, clear distributor: 0x%x\n", vlr.state); - vgic_irq_clear_active(vcpu, irq); - vgic_update_state(vcpu->kvm); - } else if (vgic_dist_irq_is_pending(vcpu, irq)) { - vlr.state |= LR_STATE_PENDING; - kvm_debug("Set pending: 0x%x\n", vlr.state); - } - - if (!vgic_irq_is_edge(vcpu, irq)) - vlr.state |= LR_EOI_INT; + /* We care only about state for SGIs/PPIs/SPIs, not for LPIs */ + if (irq < dist->nr_irqs) { + if (vgic_irq_is_active(vcpu, irq)) { + vlr.state |= LR_STATE_ACTIVE; + kvm_debug("Set active, clear distributor: 0x%x\n", + vlr.state); + vgic_irq_clear_active(vcpu, irq); + vgic_update_state(vcpu->kvm); + } else if (vgic_dist_irq_is_pending(vcpu, irq)) { + vlr.state |= LR_STATE_PENDING; + kvm_debug("Set pending: 0x%x\n", vlr.state); + }
+ if (!vgic_irq_is_edge(vcpu, irq)) + vlr.state |= LR_EOI_INT; + } else { + /* If this is an LPI, it can only be pending */ + if (irq >= 8192) + vlr.state |= LR_STATE_PENDING; + } vgic_set_lr(vcpu, lr_nr, vlr); vgic_sync_lr_elrsr(vcpu, lr_nr, vlr); } @@ -1177,7 +1203,6 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq, */ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq) { - struct vgic_dist *dist = &vcpu->kvm->arch.vgic; u64 elrsr = vgic_get_elrsr(vcpu); unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr); int lr; @@ -1185,7 +1210,6 @@ bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq) /* Sanitize the input... */ BUG_ON(sgi_source_id & ~7); BUG_ON(sgi_source_id && irq >= VGIC_NR_SGIS); - BUG_ON(irq >= dist->nr_irqs);
kvm_debug("Queue IRQ%d\n", irq);
@@ -1265,8 +1289,12 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu) overflow = 1; }
- - + /* + * LPIs are not mapped in our bitmaps, so we leave the iteration + * to the ITS emulation code. + */ + if (!vgic_queue_lpis(vcpu)) + overflow = 1;
epilog: if (overflow) { @@ -1387,6 +1415,16 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu) for_each_clear_bit(lr_nr, elrsr_ptr, vgic_cpu->nr_lr) { vlr = vgic_get_lr(vcpu, lr_nr);
+ /* LPIs are handled separately */ + if (vlr.irq >= 8192) { + /* We just need to take care about still pending LPIs */ + if (vlr.state & LR_STATE_PENDING) { + vgic_unqueue_lpi(vcpu, vlr.irq); + pending = true; + } + continue; + } + BUG_ON(!(vlr.state & LR_STATE_MASK)); pending = true;
@@ -1411,7 +1449,7 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu) } vgic_update_state(vcpu->kvm);
- /* vgic_update_state would not cover only-active IRQs */ + /* vgic_update_state would not cover only-active IRQs or LPIs */ if (pending) set_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu); }
From: Andre Przywara andre.przywara@arm.com
The configuration and pending state of the GICv3 LPIs is held in tables in (guest) memory. To achieve reasonable performance, we cache this data in our own data structures, so we need to sync those two views from time to time. This behaviour is well described in the GICv3 spec and is also exercised by hardware, so the sync points are well known.
Provide functions that read the guest memory and store the information from the configuration and pending tables in the kernel.
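For orientation, the table layout the sync code assumes can be summarised by the following illustrative helpers (assumptions matching the indexing below, not part of the patch): the configuration table holds one byte per LPI starting at INTID 8192, while the pending table holds one bit per INTID.

  /* Illustration only: where LPI 'intid' lives in the guest's tables. */
  static gpa_t lpi_config_byte(gpa_t propbase, u32 intid)
  {
  	return propbase + (intid - 8192);	/* one byte per LPI */
  }

  static gpa_t lpi_pending_byte(gpa_t pendbase, u32 intid)
  {
  	return pendbase + intid / 8;		/* bit (intid % 8) within this byte */
  }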
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 2 + virt/kvm/arm/its-emul.c | 124 ++++++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/its-emul.h | 3 ++ 3 files changed, 129 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 2a67a10..323c33a 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -167,6 +167,8 @@ struct vgic_its { int cwriter; struct list_head device_list; struct list_head collection_list; + /* memory used for buffering guest's memory */ + void *buffer_page; };
struct vgic_dist { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index b9c40d7..05245cb 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -50,6 +50,7 @@ struct its_itte { struct its_collection *collection; u32 lpi; u32 event_id; + u8 priority; bool enabled; unsigned long *pending; }; @@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi) return NULL; }
+#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED) +#define LPI_PROP_PRIORITY(p) ((p) & 0xfc) + +/* stores the priority and enable bit for a given LPI */ +static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop) +{ + itte->priority = LPI_PROP_PRIORITY(prop); + itte->enabled = LPI_PROP_ENABLE_BIT(prop); +} + +#define GIC_LPI_OFFSET 8192 + +/* We scan the table in chunks the size of the smallest page size */ +#define CHUNK_SIZE 4096U + #define BASER_BASE_ADDRESS(x) ((x) & 0xfffffffff000ULL)
+static int nr_idbits_propbase(u64 propbaser) +{ + int nr_idbits = (1U << (propbaser & 0x1f)) + 1; + + return max(nr_idbits, INTERRUPT_ID_BITS_ITS); +} + +/* + * Scan the whole LPI configuration table and put the LPI configuration + * data in our own data structures. This relies on the LPI being + * mapped before. + */ +static bool its_update_lpis_configuration(struct kvm *kvm) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + u8 *prop = dist->its.buffer_page; + u32 tsize; + gpa_t propbase; + int lpi = GIC_LPI_OFFSET; + struct its_itte *itte; + struct its_device *device; + int ret; + + propbase = BASER_BASE_ADDRESS(dist->propbaser); + tsize = nr_idbits_propbase(dist->propbaser); + + while (tsize > 0) { + int chunksize = min(tsize, CHUNK_SIZE); + + ret = kvm_read_guest(kvm, propbase, prop, chunksize); + if (ret) + return false; + + spin_lock(&dist->its.lock); + /* + * Updating the status for all allocated LPIs. We catch + * those LPIs that get disabled. We really don't care + * about unmapped LPIs, as they need to be updated + * later manually anyway once they get mapped. + */ + for_each_lpi(device, itte, kvm) { + if (itte->lpi < lpi || itte->lpi >= lpi + chunksize) + continue; + + update_lpi_config(kvm, itte, prop[itte->lpi - lpi]); + } + spin_unlock(&dist->its.lock); + tsize -= chunksize; + lpi += chunksize; + propbase += chunksize; + } + + return true; +} + +/* + * Scan the whole LPI pending table and sync the pending bit in there + * with our own data structures. This relies on the LPI being + * mapped before. + */ +static bool its_sync_lpi_pending_table(struct kvm_vcpu *vcpu) +{ + struct vgic_dist *dist = &vcpu->kvm->arch.vgic; + unsigned long *pendmask = dist->its.buffer_page; + u32 nr_lpis = VITS_NR_LPIS; + gpa_t pendbase; + int lpi = 0; + struct its_itte *itte; + struct its_device *device; + int ret; + int lpi_bit, nr_bits; + + pendbase = BASER_BASE_ADDRESS(dist->pendbaser[vcpu->vcpu_id]); + + while (nr_lpis > 0) { + nr_bits = min(nr_lpis, CHUNK_SIZE * 8); + + ret = kvm_read_guest(vcpu->kvm, pendbase, pendmask, + nr_bits / 8); + if (ret) + return false; + + spin_lock(&dist->its.lock); + for_each_lpi(device, itte, vcpu->kvm) { + lpi_bit = itte->lpi - lpi; + if (lpi_bit < 0 || lpi_bit >= nr_bits) + continue; + if (test_bit(lpi_bit, pendmask)) + __set_bit(vcpu->vcpu_id, itte->pending); + else + __clear_bit(vcpu->vcpu_id, itte->pending); + } + spin_unlock(&dist->its.lock); + nr_lpis -= nr_bits; + lpi += nr_bits; + pendbase += nr_bits / 8; + } + + return true; +} + /* The distributor lock is held by the VGIC MMIO handler. */ static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, @@ -389,6 +506,8 @@ static const struct vgic_io_range vgicv3_its_ranges[] = { /* This is called on setting the LPI enable bit in the redistributor. */ void vgic_enable_lpis(struct kvm_vcpu *vcpu) { + its_update_lpis_configuration(vcpu->kvm); + its_sync_lpi_pending_table(vcpu); }
int vits_init(struct kvm *kvm) @@ -400,6 +519,10 @@ int vits_init(struct kvm *kvm) if (!dist->pendbaser) return -ENOMEM;
+ its->buffer_page = kmalloc(CHUNK_SIZE, GFP_KERNEL); + if (!its->buffer_page) + return -ENOMEM; + spin_lock_init(&its->lock);
INIT_LIST_HEAD(&its->device_list); @@ -442,6 +565,7 @@ void vits_destroy(struct kvm *kvm) kfree(container_of(cur, struct its_collection, coll_list)); }
+ kfree(its->buffer_page); kfree(dist->pendbaser);
its->enabled = false; diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h index cc5d5ff..cbc3877 100644 --- a/virt/kvm/arm/its-emul.h +++ b/virt/kvm/arm/its-emul.h @@ -29,6 +29,9 @@
#include "vgic.h"
+#define INTERRUPT_ID_BITS_ITS 16 +#define VITS_NR_LPIS (1U << INTERRUPT_ID_BITS_ITS) + void vgic_enable_lpis(struct kvm_vcpu *vcpu); int vits_init(struct kvm *kvm); void vits_destroy(struct kvm *kvm);
From: Andre Przywara andre.przywara@arm.com
The connection between a device, an event ID, the LPI number and the allocated CPU is stored in in-memory tables in a GICv3, but their format is not specified by the spec. Instead software uses a command queue in a ring buffer to let the ITS implementation use its own format. Implement handlers for the various ITS commands and let them store the requested relation in our own data structures. To avoid kmallocs inside the ITS spinlock, we preallocate possibly needed memory outside of the lock and free it if it turns out not to be needed (mostly on error paths). Error handling is very basic at this point, as we don't have a good way of communicating errors to the guest (usually an SError). The INT command handler is missing at this point, as we gain the capability of actually injecting MSIs into the guest only later on.
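For illustration, this is how a guest (or a test harness) could encode a MAPTI command in the 32-byte format the handlers below decode; the helper is made up, but the field positions mirror the its_cmd_get_*() accessors.

  /* Illustration only: build a MAPTI command mapping (device_id, event_id) to lpi_nr in coll_id. */
  static void encode_mapti(u64 cmd[4], u32 device_id, u32 event_id,
  			 u32 lpi_nr, u16 coll_id)
  {
  	cmd[0] = cpu_to_le64(((u64)device_id << 32) | GITS_CMD_MAPTI);
  	cmd[1] = cpu_to_le64(((u64)lpi_nr << 32) | event_id);
  	cmd[2] = cpu_to_le64(coll_id);
  	cmd[3] = 0;
  }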
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/linux/irqchip/arm-gic-v3.h | 5 +- virt/kvm/arm/its-emul.c | 497 ++++++++++++++++++++++++++++++++++++- virt/kvm/arm/its-emul.h | 11 + 3 files changed, 511 insertions(+), 2 deletions(-)
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 9d173d7..5bbd47c 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -254,7 +254,10 @@ */ #define GITS_CMD_MAPD 0x08 #define GITS_CMD_MAPC 0x09 -#define GITS_CMD_MAPVI 0x0a +#define GITS_CMD_MAPTI 0x0a +/* older GIC documentation used MAPVI for this command */ +#define GITS_CMD_MAPVI GITS_CMD_MAPTI +#define GITS_CMD_MAPI 0x0b #define GITS_CMD_MOVI 0x01 #define GITS_CMD_DISCARD 0x0f #define GITS_CMD_INV 0x0c diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 05245cb..89534c6 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -22,6 +22,7 @@ #include <linux/kvm_host.h> #include <linux/interrupt.h> #include <linux/list.h> +#include <linux/slab.h>
#include <linux/irqchip/arm-gic-v3.h> #include <kvm/arm_vgic.h> @@ -55,6 +56,34 @@ struct its_itte { unsigned long *pending; };
+static struct its_device *find_its_device(struct kvm *kvm, u32 device_id) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + struct its_device *device; + + list_for_each_entry(device, &its->device_list, dev_list) + if (device_id == device->device_id) + return device; + + return NULL; +} + +static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 event_id) +{ + struct its_device *device; + struct its_itte *itte; + + device = find_its_device(kvm, device_id); + if (device == NULL) + return NULL; + + list_for_each_entry(itte, &device->itt, itte_list) + if (itte->event_id == event_id) + return itte; + + return NULL; +} + #define for_each_lpi(dev, itte, kvm) \ list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \ list_for_each_entry(itte, &(dev)->itt, itte_list) @@ -71,6 +100,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi) return NULL; }
+static struct its_collection *find_collection(struct kvm *kvm, int coll_id) +{ + struct its_collection *collection; + + list_for_each_entry(collection, &kvm->arch.vgic.its.collection_list, + coll_list) { + if (coll_id == collection->collection_id) + return collection; + } + + return NULL; +} + #define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED) #define LPI_PROP_PRIORITY(p) ((p) & 0xfc)
@@ -333,9 +375,461 @@ void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi) spin_unlock(&its->lock); }
+static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size) +{ + return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1); +} + +#define its_cmd_get_command(cmd) its_cmd_mask_field(cmd, 0, 0, 8) +#define its_cmd_get_deviceid(cmd) its_cmd_mask_field(cmd, 0, 32, 32) +#define its_cmd_get_id(cmd) its_cmd_mask_field(cmd, 1, 0, 32) +#define its_cmd_get_physical_id(cmd) its_cmd_mask_field(cmd, 1, 32, 32) +#define its_cmd_get_collection(cmd) its_cmd_mask_field(cmd, 2, 0, 16) +#define its_cmd_get_target_addr(cmd) its_cmd_mask_field(cmd, 2, 16, 32) +#define its_cmd_get_validbit(cmd) its_cmd_mask_field(cmd, 2, 63, 1) + +/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */ +static int vits_cmd_handle_discard(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + u32 device_id; + u32 event_id; + struct its_itte *itte; + int ret = 0; + + device_id = its_cmd_get_deviceid(its_cmd); + event_id = its_cmd_get_id(its_cmd); + + spin_lock(&its->lock); + itte = find_itte(kvm, device_id, event_id); + if (!itte || !itte->collection) { + ret = E_ITS_DISCARD_UNMAPPED_INTERRUPT; + goto out_unlock; + } + + __clear_bit(itte->collection->target_addr, itte->pending); + + list_del(&itte->itte_list); + kfree(itte); +out_unlock: + spin_unlock(&its->lock); + return ret; +} + +/* The MOVI command moves an ITTE to a different collection. */ +static int vits_cmd_handle_movi(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + u32 device_id = its_cmd_get_deviceid(its_cmd); + u32 event_id = its_cmd_get_id(its_cmd); + u32 coll_id = its_cmd_get_collection(its_cmd); + struct its_itte *itte; + struct its_collection *collection; + int ret; + + spin_lock(&its->lock); + itte = find_itte(kvm, device_id, event_id); + if (!itte) { + ret = E_ITS_MOVI_UNMAPPED_INTERRUPT; + goto out_unlock; + } + if (!itte->collection) { + ret = E_ITS_MOVI_UNMAPPED_COLLECTION; + goto out_unlock; + } + + collection = find_collection(kvm, coll_id); + if (!collection) { + ret = E_ITS_MOVI_UNMAPPED_COLLECTION; + goto out_unlock; + } + + if (test_and_clear_bit(itte->collection->target_addr, itte->pending)) + __set_bit(collection->target_addr, itte->pending); + + itte->collection = collection; +out_unlock: + spin_unlock(&its->lock); + return ret; +} + +static void vits_init_collection(struct kvm *kvm, + struct its_collection *collection, + u32 coll_id) +{ + collection->collection_id = coll_id; + + list_add_tail(&collection->coll_list, + &kvm->arch.vgic.its.collection_list); +} + +/* The MAPTI and MAPI commands map LPIs to ITTEs. 
*/ +static int vits_cmd_handle_mapi(struct kvm *kvm, u64 *its_cmd, u8 cmd) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + u32 device_id = its_cmd_get_deviceid(its_cmd); + u32 event_id = its_cmd_get_id(its_cmd); + u32 coll_id = its_cmd_get_collection(its_cmd); + struct its_itte *itte, *new_itte; + struct its_device *device; + struct its_collection *collection, *new_coll; + int lpi_nr; + int ret = 0; + + /* Preallocate possibly needed memory here outside of the lock */ + new_coll = kmalloc(sizeof(struct its_collection), GFP_KERNEL); + new_itte = kzalloc(sizeof(struct its_itte), GFP_KERNEL); + if (new_itte) + new_itte->pending = kcalloc(BITS_TO_LONGS(dist->nr_cpus), + sizeof(long), GFP_KERNEL); + + spin_lock(&dist->its.lock); + + device = find_its_device(kvm, device_id); + if (!device) { + ret = E_ITS_MAPTI_UNMAPPED_DEVICE; + goto out_unlock; + } + + collection = find_collection(kvm, coll_id); + if (!collection && !new_coll) { + ret = -ENOMEM; + goto out_unlock; + } + + if (cmd == GITS_CMD_MAPTI) + lpi_nr = its_cmd_get_physical_id(its_cmd); + else + lpi_nr = event_id; + if (lpi_nr < GIC_LPI_OFFSET || + lpi_nr >= nr_idbits_propbase(dist->propbaser)) { + ret = E_ITS_MAPTI_PHYSICALID_OOR; + goto out_unlock; + } + + itte = find_itte(kvm, device_id, event_id); + if (!itte) { + if (!new_itte || !new_itte->pending) { + ret = -ENOMEM; + goto out_unlock; + } + itte = new_itte; + + itte->event_id = event_id; + list_add_tail(&itte->itte_list, &device->itt); + } else { + if (new_itte) + kfree(new_itte->pending); + kfree(new_itte); + } + + if (!collection) { + collection = new_coll; + vits_init_collection(kvm, collection, coll_id); + } else { + kfree(new_coll); + } + + itte->collection = collection; + itte->lpi = lpi_nr; + +out_unlock: + spin_unlock(&dist->its.lock); + if (ret) { + kfree(new_coll); + if (new_itte) + kfree(new_itte->pending); + kfree(new_itte); + } + return ret; +} + +static void vits_unmap_device(struct kvm *kvm, struct its_device *device) +{ + struct its_itte *itte, *temp; + + /* + * The spec says that unmapping a device with still valid + * ITTEs associated is UNPREDICTABLE. We remove all ITTEs, + * since we cannot leave the memory unreferenced. + */ + list_for_each_entry_safe(itte, temp, &device->itt, itte_list) { + list_del(&itte->itte_list); + kfree(itte); + } + + list_del(&device->dev_list); + kfree(device); +} + +/* The MAPD command maps device IDs to Interrupt Translation Tables (ITTs). */ +static int vits_cmd_handle_mapd(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + bool valid = its_cmd_get_validbit(its_cmd); + u32 device_id = its_cmd_get_deviceid(its_cmd); + struct its_device *device, *new_device = NULL; + + /* We preallocate memory outside of the lock here */ + if (valid) { + new_device = kzalloc(sizeof(struct its_device), GFP_KERNEL); + if (!new_device) + return -ENOMEM; + } + + spin_lock(&its->lock); + + device = find_its_device(kvm, device_id); + if (device) + vits_unmap_device(kvm, device); + + /* + * The spec does not say whether unmapping a not-mapped device + * is an error, so we are done in any case. + */ + if (!valid) + goto out_unlock; + + device = new_device; + + device->device_id = device_id; + INIT_LIST_HEAD(&device->itt); + + list_add_tail(&device->dev_list, + &kvm->arch.vgic.its.device_list); + +out_unlock: + spin_unlock(&its->lock); + return 0; +} + +/* The MAPC command maps collection IDs to redistributors. 
*/ +static int vits_cmd_handle_mapc(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + u16 coll_id; + u32 target_addr; + struct its_collection *collection, *new_coll = NULL; + bool valid; + + valid = its_cmd_get_validbit(its_cmd); + coll_id = its_cmd_get_collection(its_cmd); + target_addr = its_cmd_get_target_addr(its_cmd); + + if (target_addr >= atomic_read(&kvm->online_vcpus)) + return E_ITS_MAPC_PROCNUM_OOR; + + /* We preallocate memory outside of the lock here */ + if (valid) { + new_coll = kmalloc(sizeof(struct its_collection), GFP_KERNEL); + if (!new_coll) + return -ENOMEM; + } + + spin_lock(&its->lock); + collection = find_collection(kvm, coll_id); + + if (!valid) { + struct its_device *device; + struct its_itte *itte; + /* + * Clearing the mapping for that collection ID removes the + * entry from the list. If there wasn't any before, we can + * go home early. + */ + if (!collection) + goto out_unlock; + + for_each_lpi(device, itte, kvm) + if (itte->collection && + itte->collection->collection_id == coll_id) + itte->collection = NULL; + + list_del(&collection->coll_list); + kfree(collection); + } else { + if (!collection) + collection = new_coll; + else + kfree(new_coll); + + vits_init_collection(kvm, collection, coll_id); + collection->target_addr = target_addr; + } + +out_unlock: + spin_unlock(&its->lock); + return 0; +} + +/* The CLEAR command removes the pending state for a particular LPI. */ +static int vits_cmd_handle_clear(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + u32 device_id; + u32 event_id; + struct its_itte *itte; + int ret = 0; + + device_id = its_cmd_get_deviceid(its_cmd); + event_id = its_cmd_get_id(its_cmd); + + spin_lock(&its->lock); + + itte = find_itte(kvm, device_id, event_id); + if (!itte) { + ret = E_ITS_CLEAR_UNMAPPED_INTERRUPT; + goto out_unlock; + } + + if (itte->collection) + __clear_bit(itte->collection->target_addr, itte->pending); + +out_unlock: + spin_unlock(&its->lock); + return ret; +} + +/* The INV command syncs the pending bit from the memory tables. */ +static int vits_cmd_handle_inv(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + u32 device_id; + u32 event_id; + struct its_itte *itte, *new_itte; + gpa_t propbase; + int ret; + u8 prop; + + device_id = its_cmd_get_deviceid(its_cmd); + event_id = its_cmd_get_id(its_cmd); + + spin_lock(&dist->its.lock); + itte = find_itte(kvm, device_id, event_id); + spin_unlock(&dist->its.lock); + if (!itte) + return E_ITS_INV_UNMAPPED_INTERRUPT; + + /* + * We cannot read from guest memory inside the spinlock, so we + * need to re-read our tables to learn whether the LPI number we are + * using is still valid. + */ + do { + propbase = BASER_BASE_ADDRESS(dist->propbaser); + ret = kvm_read_guest(kvm, propbase + itte->lpi - GIC_LPI_OFFSET, + &prop, 1); + if (ret) + return ret; + + spin_lock(&dist->its.lock); + new_itte = find_itte(kvm, device_id, event_id); + if (new_itte->lpi != itte->lpi) { + itte = new_itte; + spin_unlock(&dist->its.lock); + continue; + } + update_lpi_config(kvm, itte, prop); + spin_unlock(&dist->its.lock); + } while (0); + return 0; +} + +/* The INVALL command requests flushing of all IRQ data in this collection. 
*/ +static int vits_cmd_handle_invall(struct kvm *kvm, u64 *its_cmd) +{ + u32 coll_id = its_cmd_get_collection(its_cmd); + struct its_collection *collection; + struct kvm_vcpu *vcpu; + + collection = find_collection(kvm, coll_id); + if (!collection) + return E_ITS_INVALL_UNMAPPED_COLLECTION; + + vcpu = kvm_get_vcpu(kvm, collection->target_addr); + + its_update_lpis_configuration(kvm); + its_sync_lpi_pending_table(vcpu); + + return 0; +} + +/* The MOVALL command moves all IRQs from one redistributor to another. */ +static int vits_cmd_handle_movall(struct kvm *kvm, u64 *its_cmd) +{ + struct vgic_its *its = &kvm->arch.vgic.its; + u32 target1_addr = its_cmd_get_target_addr(its_cmd); + u32 target2_addr = its_cmd_mask_field(its_cmd, 3, 16, 32); + struct its_collection *collection; + struct its_device *device; + struct its_itte *itte; + + if (target1_addr >= atomic_read(&kvm->online_vcpus) || + target2_addr >= atomic_read(&kvm->online_vcpus)) + return E_ITS_MOVALL_PROCNUM_OOR; + + if (target1_addr == target2_addr) + return 0; + + spin_lock(&its->lock); + for_each_lpi(device, itte, kvm) { + /* remap all collections mapped to target address 1 */ + collection = itte->collection; + if (collection && collection->target_addr == target1_addr) + collection->target_addr = target2_addr; + + /* move pending state if LPI is affected */ + if (test_and_clear_bit(target1_addr, itte->pending)) + __set_bit(target2_addr, itte->pending); + } + + spin_unlock(&its->lock); + return 0; +} + static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) { - return -ENODEV; + u8 cmd = its_cmd_get_command(its_cmd); + int ret = -ENODEV; + + switch (cmd) { + case GITS_CMD_MAPD: + ret = vits_cmd_handle_mapd(vcpu->kvm, its_cmd); + break; + case GITS_CMD_MAPC: + ret = vits_cmd_handle_mapc(vcpu->kvm, its_cmd); + break; + case GITS_CMD_MAPI: + ret = vits_cmd_handle_mapi(vcpu->kvm, its_cmd, cmd); + break; + case GITS_CMD_MAPTI: + ret = vits_cmd_handle_mapi(vcpu->kvm, its_cmd, cmd); + break; + case GITS_CMD_MOVI: + ret = vits_cmd_handle_movi(vcpu->kvm, its_cmd); + break; + case GITS_CMD_DISCARD: + ret = vits_cmd_handle_discard(vcpu->kvm, its_cmd); + break; + case GITS_CMD_CLEAR: + ret = vits_cmd_handle_clear(vcpu->kvm, its_cmd); + break; + case GITS_CMD_MOVALL: + ret = vits_cmd_handle_movall(vcpu->kvm, its_cmd); + break; + case GITS_CMD_INV: + ret = vits_cmd_handle_inv(vcpu->kvm, its_cmd); + break; + case GITS_CMD_INVALL: + ret = vits_cmd_handle_invall(vcpu->kvm, its_cmd); + break; + case GITS_CMD_SYNC: + /* we ignore this command: we are in sync all of the time */ + ret = 0; + break; + } + + return ret; }
static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu, @@ -554,6 +1048,7 @@ void vits_destroy(struct kvm *kvm) list_for_each_safe(cur, temp, &dev->itt) { itte = (container_of(cur, struct its_itte, itte_list)); list_del(cur); + kfree(itte->pending); kfree(itte); } list_del(dev_cur); diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h index cbc3877..830524a 100644 --- a/virt/kvm/arm/its-emul.h +++ b/virt/kvm/arm/its-emul.h @@ -39,4 +39,15 @@ void vits_destroy(struct kvm *kvm); bool vits_queue_lpis(struct kvm_vcpu *vcpu); void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
+#define E_ITS_MOVI_UNMAPPED_INTERRUPT 0x010107 +#define E_ITS_MOVI_UNMAPPED_COLLECTION 0x010109 +#define E_ITS_CLEAR_UNMAPPED_INTERRUPT 0x010507 +#define E_ITS_MAPC_PROCNUM_OOR 0x010902 +#define E_ITS_MAPTI_UNMAPPED_DEVICE 0x010a04 +#define E_ITS_MAPTI_PHYSICALID_OOR 0x010a06 +#define E_ITS_INV_UNMAPPED_INTERRUPT 0x010c07 +#define E_ITS_INVALL_UNMAPPED_COLLECTION 0x010d09 +#define E_ITS_MOVALL_PROCNUM_OOR 0x010e01 +#define E_ITS_DISCARD_UNMAPPED_INTERRUPT 0x010f07 + #endif
From: Andre Przywara andre.przywara@arm.com
When userland wants to inject an MSI into the guest, we have to use our data structures to find the LPI number and the VCPU to receive the interrupt. Use the wrapper functions to iterate the linked lists and find the proper Interrupt Translation Table Entry. Then set the pending bit in this ITTE, to be picked up later by the LR handling code, and kick the VCPU which is meant to handle this interrupt. We provide a VGIC emulation model specific routine for the actual MSI injection; the wrapper functions return an error for models that do not (yet) implement MSIs (like the GICv2 emulation). We also provide the handler for the ITS "INT" command, which allows a guest to trigger an MSI via the ITS command queue.
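For context, once the whole series is applied user space can deliver such an MSI through the plain KVM_SIGNAL_MSI ioctl by filling in the device ID and setting the matching flag. The rough user-space sketch below assumes uapi headers that already carry the kvm_msi devid field and KVM_MSI_VALID_DEVID from the earlier patches in this series; the doorbell address and the IDs are made up for the example.

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/* Sketch: ask KVM to inject the MSI that device 0x1234 would send for
 * event 42.  With the ITS it is the devid/data pair that selects the LPI. */
static int inject_example_msi(int vm_fd)
{
        struct kvm_msi msi;

        memset(&msi, 0, sizeof(msi));
        msi.address_lo = 0x08020040;    /* made-up GITS_TRANSLATER address */
        msi.data       = 42;            /* event ID */
        msi.devid      = 0x1234;        /* device ID (e.g. PCI requester ID) */
        msi.flags      = KVM_MSI_VALID_DEVID;

        return ioctl(vm_fd, KVM_SIGNAL_MSI, &msi);
}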
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/kvm/arm_vgic.h | 1 + virt/kvm/arm/its-emul.c | 65 +++++++++++++++++++++++++++++++++++++++++++++ virt/kvm/arm/its-emul.h | 2 ++ virt/kvm/arm/vgic-v3-emul.c | 1 + 4 files changed, 69 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 323c33a..9e1abf9 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -149,6 +149,7 @@ struct vgic_vm_ops { int (*map_resources)(struct kvm *, const struct vgic_params *); bool (*queue_lpis)(struct kvm_vcpu *); void (*unqueue_lpi)(struct kvm_vcpu *, int irq); + int (*inject_msi)(struct kvm *, struct kvm_msi *); };
struct vgic_io_device { diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index 89534c6..a1c12bb 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -323,6 +323,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu, }
/* + * Translates an incoming MSI request into the redistributor (=VCPU) and + * the associated LPI number. Sets the LPI pending bit and also marks the + * VCPU as having a pending interrupt. + */ +int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi) +{ + struct vgic_dist *dist = &kvm->arch.vgic; + struct vgic_its *its = &dist->its; + struct its_itte *itte; + int cpuid; + bool inject = false; + int ret = 0; + + if (!vgic_has_its(kvm)) + return -ENODEV; + + if (!(msi->flags & KVM_MSI_VALID_DEVID)) + return -EINVAL; + + spin_lock(&its->lock); + + if (!its->enabled || !dist->lpis_enabled) { + ret = -EAGAIN; + goto out_unlock; + } + + itte = find_itte(kvm, msi->devid, msi->data); + /* Triggering an unmapped IRQ gets silently dropped. */ + if (!itte || !itte->collection) + goto out_unlock; + + cpuid = itte->collection->target_addr; + __set_bit(cpuid, itte->pending); + inject = itte->enabled; + +out_unlock: + spin_unlock(&its->lock); + + if (inject) { + spin_lock(&dist->lock); + __set_bit(cpuid, dist->irq_pending_on_cpu); + spin_unlock(&dist->lock); + kvm_vcpu_kick(kvm_get_vcpu(kvm, cpuid)); + } + + return ret; +} + +/* * Find all enabled and pending LPIs and queue them into the list * registers. * The dist lock is held by the caller. @@ -787,6 +836,19 @@ static int vits_cmd_handle_movall(struct kvm *kvm, u64 *its_cmd) return 0; }
+/* The INT command injects the LPI associated with that DevID/EvID pair. */ +static int vits_cmd_handle_int(struct kvm *kvm, u64 *its_cmd) +{ + struct kvm_msi msi = { + .data = its_cmd_get_id(its_cmd), + .devid = its_cmd_get_deviceid(its_cmd), + .flags = KVM_MSI_VALID_DEVID, + }; + + vits_inject_msi(kvm, &msi); + return 0; +} + static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) { u8 cmd = its_cmd_get_command(its_cmd); @@ -817,6 +879,9 @@ static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd) case GITS_CMD_MOVALL: ret = vits_cmd_handle_movall(vcpu->kvm, its_cmd); break; + case GITS_CMD_INT: + ret = vits_cmd_handle_int(vcpu->kvm, its_cmd); + break; case GITS_CMD_INV: ret = vits_cmd_handle_inv(vcpu->kvm, its_cmd); break; diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h index 830524a..95e56a7 100644 --- a/virt/kvm/arm/its-emul.h +++ b/virt/kvm/arm/its-emul.h @@ -36,6 +36,8 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu); int vits_init(struct kvm *kvm); void vits_destroy(struct kvm *kvm);
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi); + bool vits_queue_lpis(struct kvm_vcpu *vcpu); void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index 4132c26..30bf703 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -948,6 +948,7 @@ void vgic_v3_init_emulation(struct kvm *kvm) dist->vm_ops.init_model = vgic_v3_init_model; dist->vm_ops.destroy_model = vgic_v3_destroy_model; dist->vm_ops.map_resources = vgic_v3_map_resources; + dist->vm_ops.inject_msi = vits_inject_msi; dist->vm_ops.queue_lpis = vits_queue_lpis; dist->vm_ops.unqueue_lpi = vits_unqueue_lpi;
From: Andre Przywara andre.przywara@arm.com
If userspace has provided a base address for the ITS register frame, we enable the bits that advertise LPIs in the GICv3. When the guest has enabled LPIs and the ITS, we enable the emulation part by initializing the ITS data structures and trapping guest accesses to the ITS register frame. We also enable the KVM_SIGNAL_MSI feature to allow userland to inject MSIs into the guest. If the ITS emulation has not been enabled, trying to inject an MSI returns -ENODEV.
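As a small illustration, user space could probe this per-VM capability before deciding whether to populate the devid field. The sketch assumes KVM_CAP_MSI_DEVID is defined by the uapi headers of this series and that KVM_CHECK_EXTENSION is issued on the VM file descriptor.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Returns > 0 when the in-kernel ITS is in use and kvm_msi.devid must be
 * supplied together with KVM_MSI_VALID_DEVID. */
static int msi_needs_devid(int vm_fd)
{
        return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MSI_DEVID);
}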
Signed-off-by: Andre Przywara andre.przywara@arm.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/api.txt | 2 +- arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/reset.c | 6 ++++++ include/kvm/arm_vgic.h | 6 ++++++ virt/kvm/arm/its-emul.c | 10 +++++++++- virt/kvm/arm/vgic-v3-emul.c | 20 ++++++++++++++------ virt/kvm/arm/vgic.c | 8 ++++++++ 7 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index cb04095..1b53155 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2134,7 +2134,7 @@ after pausing the vcpu, but before it is resumed. 4.71 KVM_SIGNAL_MSI
Capability: KVM_CAP_SIGNAL_MSI -Architectures: x86 +Architectures: x86 arm64 Type: vm ioctl Parameters: struct kvm_msi (in) Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index bfffe8f..ff9722f 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_MSI ---help--- Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 866502b..aff209e 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -64,6 +64,12 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_EL1_32BIT: r = cpu_has_32bit_el1(); break; + case KVM_CAP_MSI_DEVID: + if (!kvm) + r = -EINVAL; + else + r = kvm->arch.vgic.msis_require_devid; + break; default: r = 0; } diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 9e1abf9..f50081c 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -162,6 +162,7 @@ struct vgic_io_device {
struct vgic_its { bool enabled; + struct vgic_io_device iodev; spinlock_t lock; u64 cbaser; int creadr; @@ -180,6 +181,9 @@ struct vgic_dist { /* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */ u32 vgic_model;
+ /* Do injected MSIs require an additional device ID? */ + bool msis_require_devid; + int nr_cpus; int nr_irqs;
@@ -371,4 +375,6 @@ static inline int vgic_v3_probe(struct device_node *vgic_node, } #endif
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi); + #endif diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c index a1c12bb..b6caefd 100644 --- a/virt/kvm/arm/its-emul.c +++ b/virt/kvm/arm/its-emul.c @@ -1073,6 +1073,7 @@ int vits_init(struct kvm *kvm) { struct vgic_dist *dist = &kvm->arch.vgic; struct vgic_its *its = &dist->its; + int ret;
dist->pendbaser = kmalloc(sizeof(u64) * dist->nr_cpus, GFP_KERNEL); if (!dist->pendbaser) @@ -1087,9 +1088,16 @@ int vits_init(struct kvm *kvm) INIT_LIST_HEAD(&its->device_list); INIT_LIST_HEAD(&its->collection_list);
+ ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base, + KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges, + -1, &its->iodev); + if (ret) + return ret; + its->enabled = false; + dist->msis_require_devid = true;
- return -ENXIO; + return 0; }
void vits_destroy(struct kvm *kvm) diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c index 30bf703..9fd1238 100644 --- a/virt/kvm/arm/vgic-v3-emul.c +++ b/virt/kvm/arm/vgic-v3-emul.c @@ -8,7 +8,6 @@ * * Limitations of the emulation: * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore) - * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI. * - We do not support the message based interrupts (MBIs) triggered by * writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0. * - We do not support the (optional) backwards compatibility feature. @@ -87,10 +86,10 @@ static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu, /* * As this implementation does not provide compatibility * with GICv2 (ARE==1), we report zero CPUs in bits [5..7]. - * Also LPIs and MBIs are not supported, so we set the respective bits to 0. - * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs). + * Also we report at most 2**10=1024 interrupt IDs (to match 1024 SPIs) + * and provide 16 bits worth of LPI number space (to give 8192 LPIs). */ -#define INTERRUPT_ID_BITS 10 +#define INTERRUPT_ID_BITS_SPIS 10 static bool handle_mmio_typer(struct kvm_vcpu *vcpu, struct kvm_exit_mmio *mmio, phys_addr_t offset) { @@ -98,7 +97,12 @@ static bool handle_mmio_typer(struct kvm_vcpu *vcpu,
reg = (min(vcpu->kvm->arch.vgic.nr_irqs, 1024) >> 5) - 1;
- reg |= (INTERRUPT_ID_BITS - 1) << 19; + if (vgic_has_its(vcpu->kvm)) { + reg |= GICD_TYPER_LPIS; + reg |= (INTERRUPT_ID_BITS_ITS - 1) << 19; + } else { + reg |= (INTERRUPT_ID_BITS_SPIS - 1) << 19; + }
vgic_reg_access(mmio, ®, offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); @@ -543,7 +547,9 @@ static bool handle_mmio_ctlr_redist(struct kvm_vcpu *vcpu, vgic_reg_access(mmio, ®, offset, ACCESS_READ_VALUE | ACCESS_WRITE_VALUE); if (!dist->lpis_enabled && (reg & GICR_CTLR_ENABLE_LPIS)) { - /* Eventually do something */ + vgic_enable_lpis(vcpu); + dist->lpis_enabled = true; + return true; } return false; } @@ -570,6 +576,8 @@ static bool handle_mmio_typer_redist(struct kvm_vcpu *vcpu, reg = redist_vcpu->vcpu_id << 8; if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1) reg |= GICR_TYPER_LAST; + if (vgic_has_its(vcpu->kvm)) + reg |= GICR_TYPER_PLPIS; vgic_reg_access(mmio, ®, offset, ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED); return false; diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 9dfd094..081a1ef 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -2246,3 +2246,11 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, { return 0; } + +int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi) +{ + if (kvm->arch.vgic.vm_ops.inject_msi) + return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi); + else + return -ENODEV; +}
From: Eric Auger eric.auger@linaro.org
On ARM, the MSI message (address and data) comes along with out-of-band device ID information. The device ID identifies the device that writes the MSI message. Let's convey the device ID in kvm_irq_routing_msi and use a new routing entry type to indicate that the devid field is populated.
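To illustrate the uapi change, a routing entry carrying a device ID could be filled in by user space roughly as follows. The GSI, doorbell address and IDs are invented for the example, and the devid member relies on the anonymous union added by this patch.

#include <linux/kvm.h>
#include <string.h>

/* Fill one routing entry that carries a device ID. */
static void fill_extended_msi_route(struct kvm_irq_routing_entry *e)
{
        memset(e, 0, sizeof(*e));
        e->gsi  = 100;                          /* arbitrary GSI */
        e->type = KVM_IRQ_ROUTING_EXTENDED_MSI;
        e->u.msi.address_lo = 0x08020040;       /* made-up doorbell address */
        e->u.msi.address_hi = 0;
        e->u.msi.data  = 42;                    /* event ID */
        e->u.msi.devid = 0x1234;                /* device that "writes" the MSI */
}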
Signed-off-by: Eric Auger eric.auger@linaro.org
---
v1 -> v2: - devid id passed in kvm_irq_routing_msi instead of in kvm_irq_routing_entry
RFC -> PATCH - remove kvm_irq_routing_extended_msi and use union instead
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/api.txt | 10 +++++++++- include/uapi/linux/kvm.h | 6 +++++- 2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 1b53155..3094139 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1453,6 +1453,7 @@ struct kvm_irq_routing_entry { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
No flags are specified so far, the corresponding field must be set to zero.
@@ -1465,9 +1466,16 @@ struct kvm_irq_routing_msi { __u32 address_lo; __u32 address_hi; __u32 data; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; };
+for KVM_IRQ_ROUTING_EXTENDED_MSI routing entry type, the kvm_irq_routing_msi +routing entry is used and devid is populated with the device ID that wrote +the MSI message. For PCI, this is usually a BFD identifier in the lower 16 bits. + struct kvm_irq_routing_s390_adapter { __u64 ind_addr; __u64 summary_addr; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 1c48def..817586f 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -830,7 +830,10 @@ struct kvm_irq_routing_msi { __u32 address_lo; __u32 address_hi; __u32 data; - __u32 pad; + union { + __u32 pad; + __u32 devid; + }; };
struct kvm_irq_routing_s390_adapter { @@ -845,6 +848,7 @@ struct kvm_irq_routing_s390_adapter { #define KVM_IRQ_ROUTING_IRQCHIP 1 #define KVM_IRQ_ROUTING_MSI 2 #define KVM_IRQ_ROUTING_S390_ADAPTER 3 +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
struct kvm_irq_routing_entry { __u32 gsi;
From: Eric Auger eric.auger@linaro.org
Extend kvm_kernel_irq_routing_entry to transport a devid. This is needed for ARM. Its validity depends on the routing entry type.
Signed-off-by: Eric Auger eric.auger@linaro.org
---
v1 -> v2: replace msi_msg field by a struct composed of msi_msg and devid
RFC -> PATCH: - reword the commit message after change in first patch (uapi)
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- include/linux/kvm_host.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 05e99b8..d2f7c86 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -321,7 +321,10 @@ struct kvm_kernel_irq_routing_entry { unsigned irqchip; unsigned pin; } irqchip; - struct msi_msg msi; + struct { + struct msi_msg msi; + u32 devid; + }; struct kvm_s390_adapter_int adapter; }; struct hlist_node link;
From: Eric Auger eric.auger@linaro.org
On ARM, the devid field of the kvm_msi struct is populated when the KVM_MSI_VALID_DEVID flag is set. Let's populate the corresponding kvm_kernel_irq_routing_entry devid field and set the MSI type to KVM_IRQ_ROUTING_EXTENDED_MSI.
Signed-off-by: Eric Auger eric.auger@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- virt/kvm/irqchip.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c index 21c1424..e678f8a 100644 --- a/virt/kvm/irqchip.c +++ b/virt/kvm/irqchip.c @@ -72,9 +72,17 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi) { struct kvm_kernel_irq_routing_entry route;
- if (!irqchip_in_kernel(kvm) || msi->flags != 0) + if (!irqchip_in_kernel(kvm)) return -EINVAL;
+ if (msi->flags & KVM_MSI_VALID_DEVID) { + route.devid = msi->devid; + route.type = KVM_IRQ_ROUTING_EXTENDED_MSI; + } else if (!msi->flags) + return -EINVAL; + + /* historically the route.type was not set */ + route.msi.address_lo = msi->address_lo; route.msi.address_hi = msi->address_hi; route.msi.data = msi->data;
From: Eric Auger eric.auger@linaro.org
This patch adds compilation and link against irqchip.
On ARM, irqchip routing is not really useful since there is a single irqchip. However, the main motivation behind using the irqchip code is to enable the MSI routing code. With the support of in-kernel GICv3 ITS emulation, it now becomes a must-have requirement.
Functions previously implemented in vgic.c that are superseded by the more complex irqchip implementation are removed:
- kvm_send_userspace_msi
- kvm_irq_map_chip_pin
- kvm_set_irq
- kvm_irq_map_gsi
They implemented a default in-kernel identity GSI routing. This is now replaced by routing provided from user space.
The standard routing hooks are now implemented in vgic:
- kvm_set_routing_entry
- kvm_set_irq
- kvm_set_msi
Both HAVE_KVM_IRQCHIP and HAVE_KVM_IRQ_ROUTING are defined. KVM_CAP_IRQ_ROUTING is advertised and KVM_SET_GSI_ROUTING is allowed.
MSI routing is not yet allowed.
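As a usage sketch, a VMM could install a flat SPI routing table itself through the ioctl, mirroring what the in-kernel default (added later in this series) does. Error handling and the exact SPI count are left out; only long-standing uapi types are used.

#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Route GSI i to irqchip pin i, i.e. to SPI (i + 32), for the first nr_spis SPIs. */
static int set_flat_spi_routing(int vm_fd, unsigned int nr_spis)
{
        struct kvm_irq_routing *table;
        unsigned int i;
        int ret;

        table = calloc(1, sizeof(*table) +
                          nr_spis * sizeof(struct kvm_irq_routing_entry));
        if (!table)
                return -1;

        table->nr = nr_spis;
        for (i = 0; i < nr_spis; i++) {
                table->entries[i].gsi = i;
                table->entries[i].type = KVM_IRQ_ROUTING_IRQCHIP;
                table->entries[i].u.irqchip.irqchip = 0;
                table->entries[i].u.irqchip.pin = i;
        }

        ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
        free(table);
        return ret;
}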
Signed-off-by: Eric Auger eric.auger@linaro.org
--- v1 -> v2: - fix bug reported by Andre related to msi.flags and msi.devid setting in kvm_send_userspace_msi - avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq
RFC -> PATCH - reword api.txt: x move MSI routing comments in a subsequent patch, x clearly state GSI routing does not apply to KVM_IRQ_LINE
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/api.txt | 12 ++++--- arch/arm/include/asm/kvm_host.h | 2 ++ arch/arm/kvm/Kconfig | 2 ++ arch/arm/kvm/Makefile | 2 +- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/Kconfig | 2 ++ arch/arm64/kvm/Makefile | 2 +- include/kvm/arm_vgic.h | 2 -- virt/kvm/arm/vgic.c | 69 ++++++++++++++++++++++++++------------- virt/kvm/irqchip.c | 2 ++ 10 files changed, 65 insertions(+), 31 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 3094139..459da31 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1421,13 +1421,16 @@ KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed. 4.52 KVM_SET_GSI_ROUTING
Capability: KVM_CAP_IRQ_ROUTING -Architectures: x86 s390 +Architectures: x86 s390 arm arm64 Type: vm ioctl Parameters: struct kvm_irq_routing (in) Returns: 0 on success, -1 on error
Sets the GSI routing table entries, overwriting any previously set entries.
+On arm/arm64, GSI routing has the following limitation: +- GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. + struct kvm_irq_routing { __u32 nr; __u32 flags; @@ -2339,9 +2342,10 @@ Note that closing the resamplefd is not sufficient to disable the irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.
-On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared -Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is -given by gsi + 32. +On arm/arm64, gsi routing being supported, the following can happen: +- in case no routing entry is associated to this gsi, injection fails +- in case the gsi is associated to an irqchip routing entry, + irqchip.pin + 32 corresponds to the injected SPI ID.
4.76 KVM_PPC_ALLOCATE_HTAB
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 56cac05..9ced147 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -42,6 +42,8 @@
#define KVM_VCPU_MAX_FEATURES 2
+#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */ + #include <kvm/arm_vgic.h>
u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index bfb915d..151e710 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,8 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQ_ROUTING depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER ---help--- Support hosting virtualized guest machines. diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile index c5eef02c..1a8f48a 100644 --- a/arch/arm/kvm/Makefile +++ b/arch/arm/kvm/Makefile @@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt) AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
KVM := ../../../virt/kvm -kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o +kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o $(KVM)/irqchip.o
obj-y += kvm-arm.o init.o interrupts.o obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 8d78a72..ff007cf 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -44,6 +44,7 @@ #include <kvm/arm_arch_timer.h>
#define KVM_VCPU_MAX_FEATURES 3 +#define KVM_IRQCHIP_NUM_PINS 988 /* 1020 -32 is the number of SPI */
int __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index ff9722f..1a9900d 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -32,6 +32,8 @@ config KVM select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD select HAVE_KVM_MSI + select HAVE_KVM_IRQCHIP + select HAVE_KVM_IRQ_ROUTING ---help--- Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 9803307..90a08457 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o $(KVM)/irqchip.o kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index f50081c..1b370a0 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -375,6 +375,4 @@ static inline int vgic_v3_probe(struct device_node *vgic_node, } #endif
-int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi); - #endif diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 081a1ef..e38a0de 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -2215,42 +2215,65 @@ out_free_irq: return ret; }
-int kvm_irq_map_gsi(struct kvm *kvm, - struct kvm_kernel_irq_routing_entry *entries, - int gsi) +int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e, + struct kvm *kvm, int irq_source_id, + int level, bool line_status) { - return 0; -} - -int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin) -{ - return pin; -} - -int kvm_set_irq(struct kvm *kvm, int irq_source_id, - u32 irq, int level, bool line_status) -{ - unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS; + unsigned int spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
- trace_kvm_set_irq(irq, level, irq_source_id); + trace_kvm_set_irq(spi_id, level, irq_source_id);
BUG_ON(!vgic_initialized(kvm));
- return kvm_vgic_inject_irq(kvm, 0, spi, level); + if (spi_id > min(kvm->arch.vgic.nr_irqs, 1020)) + return -EINVAL; + return kvm_vgic_inject_irq(kvm, 0, spi_id, level); +} + +/** + * Populates a kvm routing entry from a user routing entry + * @e: kvm internal formatted entry + * @ue: user api formatted entry + * return 0 on success, -EINVAL on errors. + */ +int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e, + const struct kvm_irq_routing_entry *ue) +{ + int r = -EINVAL; + + switch (ue->type) { + case KVM_IRQ_ROUTING_IRQCHIP: + e->set = vgic_irqfd_set_irq; + e->irqchip.irqchip = ue->u.irqchip.irqchip; + e->irqchip.pin = ue->u.irqchip.pin; + if ((e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS) || + (e->irqchip.irqchip >= KVM_NR_IRQCHIPS)) + goto out; + break; + default: + goto out; + } + r = 0; +out: + return r; }
-/* MSI not implemented yet */ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level, bool line_status) { - return 0; -} + struct kvm_msi msi; + + msi.address_lo = e->msi.address_lo; + msi.address_hi = e->msi.address_hi; + msi.data = e->msi.data; + if (e->type == KVM_IRQ_ROUTING_EXTENDED_MSI) { + msi.devid = e->devid; + msi.flags = KVM_MSI_VALID_DEVID; + }
-int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi) -{ if (kvm->arch.vgic.vm_ops.inject_msi) - return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi); + return kvm->arch.vgic.vm_ops.inject_msi(kvm, &msi); else return -ENODEV; } diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c index e678f8a..f26cadd 100644 --- a/virt/kvm/irqchip.c +++ b/virt/kvm/irqchip.c @@ -29,7 +29,9 @@ #include <linux/srcu.h> #include <linux/export.h> #include <trace/events/kvm.h> +#if !defined(CONFIG_ARM) && !defined(CONFIG_ARM64) #include "irq.h" +#endif
struct kvm_irq_routing_table { int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
From: Eric Auger eric.auger@linaro.org
Implement a default routing table made of flat irqchip routing entries (gsi = irqchip.pin) covering the VGIC SPI indexes. This routing table is overwritten by the first user-space call to the KVM_SET_GSI_ROUTING ioctl.
Signed-off-by: Eric Auger eric.auger@linaro.org
---
PATCH: creation Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- virt/kvm/arm/vgic.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index e38a0de..12391a9 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -1785,6 +1785,8 @@ int vgic_init(struct kvm *kvm) ret |= vgic_init_bitmap(&dist->irq_cfg, nr_cpus, nr_irqs); ret |= vgic_init_bytemap(&dist->irq_priority, nr_cpus, nr_irqs);
+ ret |= kvm_setup_default_irq_routing(kvm); + if (ret) goto out;
@@ -2258,6 +2260,25 @@ out: return r; }
+int kvm_setup_default_irq_routing(struct kvm *kvm) +{ + struct kvm_irq_routing_entry *entries; + u32 nr = kvm->arch.vgic.nr_irqs - VGIC_NR_PRIVATE_IRQS; + int i, ret; + + entries = kcalloc(nr, sizeof(struct kvm_kernel_irq_routing_entry), + GFP_KERNEL); + for (i = 0; i < nr; i++) { + entries[i].gsi = i; + entries[i].type = KVM_IRQ_ROUTING_IRQCHIP; + entries[i].u.irqchip.irqchip = 0; + entries[i].u.irqchip.pin = i; + } + ret = kvm_set_irq_routing(kvm, entries, nr, 0); + kfree(entries); + return ret; +} + int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm, int irq_source_id, int level, bool line_status)
From: Eric Auger eric.auger@linaro.org
Up to now, only irqchip routing entries could be set. This patch adds the capability to insert MSI routing entries, with or without a device ID. Although standard MSI entries can now be set, their injection is still not supported. For arm64, let's also increase KVM_MAX_IRQ_ROUTES to 4096, covering the flat SPI irqchip routes plus MSI routes. In the future this might be extended.
The new MSI routing entry type must also be handled like the legacy KVM_IRQ_ROUTING_MSI type in the eventfd irqfd_wakeup and irqfd_update paths.
Signed-off-by: Eric Auger eric.auger@linaro.org
---
v1 -> v2: - adapt to new routing entry types
RFC -> PATCH: - move api MSI routing updates into that patch file - use new devid field of user api struct
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- Documentation/virtual/kvm/api.txt | 10 ++++++++++ include/linux/kvm_host.h | 2 ++ virt/kvm/arm/vgic.c | 13 +++++++++++++ virt/kvm/eventfd.c | 6 ++++-- 4 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 459da31..8a772e0 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1431,6 +1431,11 @@ Sets the GSI routing table entries, overwriting any previously set entries. On arm/arm64, GSI routing has the following limitation: - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
+On arm/arm64, MSI routing through in-kernel GICv3 ITS must use +KVM_IRQ_ROUTING_EXTENDED_MSI routing type and device ID must be set +in msi struct. Otherwise, KVM_IRQ_ROUTING_MSI must be used without +populating the msi devid field. + struct kvm_irq_routing { __u32 nr; __u32 flags; @@ -2346,6 +2351,11 @@ On arm/arm64, gsi routing being supported, the following can happen: - in case no routing entry is associated to this gsi, injection fails - in case the gsi is associated to an irqchip routing entry, irqchip.pin + 32 corresponds to the injected SPI ID. +- in case the gsi is associated to an MSI routing entry, + * without GICv3 ITS in-kernel emulation, MSI data matches the SPI ID + of the injected SPI + * with GICv3 ITS in-kernel emulation, the MSI message and device ID + are translated into an LPI.
4.76 KVM_PPC_ALLOCATE_HTAB
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index d2f7c86..f580f4b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -994,6 +994,8 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
#ifdef CONFIG_S390 #define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that... +#elif defined(CONFIG_ARM64) +#define KVM_MAX_IRQ_ROUTES 4096 #else #define KVM_MAX_IRQ_ROUTES 1024 #endif diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c index 12391a9..ebf5073 100644 --- a/virt/kvm/arm/vgic.c +++ b/virt/kvm/arm/vgic.c @@ -2252,6 +2252,19 @@ int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e, (e->irqchip.irqchip >= KVM_NR_IRQCHIPS)) goto out; break; + case KVM_IRQ_ROUTING_MSI: + e->set = kvm_set_msi; + e->msi.address_lo = ue->u.msi.address_lo; + e->msi.address_hi = ue->u.msi.address_hi; + e->msi.data = ue->u.msi.data; + break; + case KVM_IRQ_ROUTING_EXTENDED_MSI: + e->set = kvm_set_msi; + e->msi.address_lo = ue->u.msi.address_lo; + e->msi.address_hi = ue->u.msi.address_hi; + e->msi.data = ue->u.msi.data; + e->devid = ue->u.msi.devid; + break; default: goto out; } diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 9ff4193..d76d05d 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -238,7 +238,8 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) irq = irqfd->irq_entry; } while (read_seqcount_retry(&irqfd->irq_entry_sc, seq)); /* An event has been signaled, inject an interrupt */ - if (irq.type == KVM_IRQ_ROUTING_MSI) + if (irq.type == KVM_IRQ_ROUTING_MSI || + irq.type == KVM_IRQ_ROUTING_EXTENDED_MSI) kvm_set_msi(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false); else @@ -294,7 +295,8 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd) e = entries; for (i = 0; i < n_entries; ++i, ++e) { /* Only fast-path MSI. */ - if (e->type == KVM_IRQ_ROUTING_MSI) + if (e->type == KVM_IRQ_ROUTING_MSI || + e->type == KVM_IRQ_ROUTING_EXTENDED_MSI) irqfd->irq_entry = *e; }
From: Eric Auger eric.auger@linaro.org
If the ITS modality is not available, simply support MSI injection by transforming the MSI data into an SPI ID.
This makes it possible to use the KVM_SIGNAL_MSI ioctl on arm too.
Signed-off-by: Eric Auger eric.auger@linaro.org
---
v1 -> v2: - introduce vgic_v2m_inject_msi in vgic-v2-emul.c following Andre's advice
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm/kvm/Kconfig | 1 + virt/kvm/arm/vgic-v2-emul.c | 12 ++++++++++++ 2 files changed, 13 insertions(+)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 151e710..0f58baf 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -31,6 +31,7 @@ config KVM select KVM_VFIO select HAVE_KVM_EVENTFD select HAVE_KVM_IRQFD + select HAVE_KVM_MSI select HAVE_KVM_IRQCHIP select HAVE_KVM_IRQ_ROUTING depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c index 8faa28c..be0fe49 100644 --- a/virt/kvm/arm/vgic-v2-emul.c +++ b/virt/kvm/arm/vgic-v2-emul.c @@ -478,6 +478,17 @@ static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq) }
/** + * Emulates GICv2M MSI injection by injecting the SPI ID matching + * the msi data + * @kvm: pointer to the kvm struct + * @msi: the msi struct handle + */ +static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi) +{ + return kvm_vgic_inject_irq(kvm, 0, msi->data, 1); +} + +/** * kvm_vgic_map_resources - Configure global VGIC state before running any VCPUs * @kvm: pointer to the kvm struct * @@ -566,6 +577,7 @@ void vgic_v2_init_emulation(struct kvm *kvm) dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source; dist->vm_ops.init_model = vgic_v2_init_model; dist->vm_ops.map_resources = vgic_v2_map_resources; + dist->vm_ops.inject_msi = vgic_v2m_inject_msi;
dist->vgic_cpu_base = VGIC_ADDR_UNDEF; dist->vgic_dist_base = VGIC_ADDR_UNDEF;
From: Andrew Pinski apinski@cavium.com
On some cores, udiv with a large value is slow. Instead, expand the division out into the multiply-high sequence GCC would have generated for a divide by 1000.
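To see why this is equivalent, here is a small user-space model of the new sequence using the same magic constant and shifts as the patch: ns is first shifted right by 3 (dividing by 8), then the remaining division by 125 is done with a multiply-high by ceil(2^68/125) and a final shift. The assert loop only spot-checks the identity; it is an illustration, not part of the patch.

#include <assert.h>
#include <stdint.h>

/* ns / 1000 == (ns / 8) / 125, done with a multiply-high instead of udiv. */
static uint64_t ns_to_us(uint64_t ns)
{
        return (uint64_t)(((unsigned __int128)(ns >> 3) *
                           0x20C49BA5E353F7CFULL) >> 68);
}

int main(void)
{
        uint64_t ns;

        for (ns = 0; ns < 2000000000ULL; ns += 333333)
                assert(ns_to_us(ns) == ns / 1000);
        return 0;
}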
Signed-off-by: Andrew Pinski apinski@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/vdso/gettimeofday.S | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index efa79e8..e5caef9 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -64,10 +64,22 @@ ENTRY(__kernel_gettimeofday) bl __do_get_tspec seqcnt_check w9, 1b
- /* Convert ns to us. */ - mov x13, #1000 - lsl x13, x13, x12 - udiv x11, x11, x13 + /* Undo the shift. */ + lsr x11, x11, x12 + + /* Convert ns to us (division by 1000 by using multiply high). + * This is how GCC converts the division by 1000 into. + * This is faster than divide on most cores. + */ + mov x13, 63439 + movk x13, 0xe353, lsl 16 + lsr x11, x11, 3 + movk x13, 0x9ba5, lsl 32 + movk x13, 0x20c4, lsl 48 + /* x13 = 0x20c49ba5e353f7cf */ + umulh x11, x11, x13 + lsr x11, x11, 4 + stp x10, x11, [x0, #TVAL_TV_SEC] 2: /* If tz is NULL, return 0. */
From: Andrew Pinski apinski@cavium.com
On most other targets (x86 and tile, for example), the division in __do_get_tspec is converted into a simple loop. The main reason is that the result of this division is going to be either 0 or 1. Change the division into that same simple loop, thus speeding up gettimeofday.
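In C terms the change amounts to replacing a division whose quotient is expected to be 0 or 1 with a subtract loop, roughly like the sketch below (illustrative only, not the vdso code itself).

#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Normalize a (sec, ns) pair where ns may only exceed one second by a small
 * amount, so the loop body runs 0 or 1 times in practice. */
static void normalize_timespec(uint64_t *sec, uint64_t *ns)
{
        while (*ns >= NSEC_PER_SEC) {
                *ns -= NSEC_PER_SEC;
                *sec += 1;
        }
}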
Signed-off-by: Andrew Pinski apinski@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/vdso/gettimeofday.S | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index e5caef9..cd297a7 100644 --- a/arch/arm64/kernel/vdso/gettimeofday.S +++ b/arch/arm64/kernel/vdso/gettimeofday.S @@ -246,14 +246,25 @@ ENTRY(__do_get_tspec) mul x10, x10, x11
/* Use the kernel time to calculate the new timespec. */ - mov x11, #NSEC_PER_SEC_LO16 - movk x11, #NSEC_PER_SEC_HI16, lsl #16 - lsl x11, x11, x12 - add x15, x10, x14 - udiv x14, x15, x11 - add x10, x13, x14 - mul x13, x14, x11 - sub x11, x15, x13 + mov x15, #NSEC_PER_SEC_LO16 + movk x15, #NSEC_PER_SEC_HI16, lsl #16 + lsl x15, x15, x12 + add x11, x10, x14 + mov x10, x13 + + /* + * Use a loop instead of a division as this is most + * likely going to be only giving a 1 or 0 and that is faster + * than a division. + */ + cmp x11, x15 + b.lt 1f +2: + sub x11, x11, x15 + add x10, x10, 1 + cmp x11, x15 + b.ge 2b +1:
ret .cfi_endproc
From: Andrew Pinski apinski@cavium.com
For high core counts, we want to add a delay when the currently served ticket is "far" away from our ticket.
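The idea is easier to see in C than in inline assembly. Below is a rough, illustrative model of a ticket lock with this kind of proportional back-off; it is not the arch_spin_lock() code and leaves out the WFE part, but the 64-iteration delay per waiting ticket matches the estimate used in the patch.

#include <stdatomic.h>
#include <stdint.h>

struct ticket_lock {
        _Atomic uint16_t owner;         /* ticket currently being served */
        _Atomic uint16_t next;          /* next ticket to hand out       */
};

static void ticket_lock(struct ticket_lock *lock)
{
        uint16_t ticket = atomic_fetch_add_explicit(&lock->next, 1,
                                                    memory_order_relaxed);

        for (;;) {
                uint16_t owner = atomic_load_explicit(&lock->owner,
                                                      memory_order_acquire);
                uint16_t ahead = (uint16_t)(ticket - owner); /* wraps safely */
                unsigned int delay;

                if (!ahead)
                        return;         /* our turn, lock acquired */

                /* back off roughly 64 "cycles" per waiter still ahead of us */
                for (delay = 64u * (ahead - 1); delay; delay--)
                        __asm__ volatile("" ::: "memory");
        }
}

static void ticket_unlock(struct ticket_lock *lock)
{
        uint16_t owner = atomic_load_explicit(&lock->owner,
                                              memory_order_relaxed);

        atomic_store_explicit(&lock->owner, (uint16_t)(owner + 1),
                              memory_order_release);
}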
Signed-off-by: Andrew Pinski apinski@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/spinlock.h | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index cee1287..d867547 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -51,10 +51,31 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) * unlock before the exclusive load. */ " sevl\n" -"2: wfe\n" + /* Delay if our ticket is not the next ticket. */ +" uxth %w2, %w0\n" +" lsr %w0, %w0, 16\n" + /* %w2 is the difference between our ticket and the current ticket. */ +"2: sub %w2, %w0, %w2\n" + /* If the tickets have wrapped, then we need to add USHORT_MAX. */ +" cmp %w2, wzr\n" +" b.lt 5f\n" +"6: sub %w2, %w2, 1\n" +" cbz %w2, 7f\n" + /* Multiply by 64, a good estimate of how long an lock/unlock will take. */ +" lsl %w2, %w2, 6\n" + /* Spin until we get 0. */ +"4: sub %w2, %w2, 1\n" +" cbnz %w2, 4b\n" + /* Wait for event, we might not be the current ticket. */ +"7: wfe\n" " ldaxrh %w2, %4\n" -" eor %w1, %w2, %w0, lsr #16\n" +" eor %w1, %w2, %w0\n" " cbnz %w1, 2b\n" +" b 3f\n" + /* Wrap case, add USHORT_MAX to wrap around again. */ +"5: mov %w1, 0xffff\n" +" add %w2, %w2, %w1\n" +" b 7b\n" /* We got the lock. Critical section starts here. */ "3:" : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock)
From: Andrew Pinski apinski@cavium.com
In the previous patch, I made the mistake of putting the WFE after the delay, which meant that if we enable the WFE we would get the same bad performance as before. Also use the flags register some more to allow instructions to be fused together.
Signed-off-by: Andrew Pinski apinski@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/spinlock.h | 41 +++++++++++++++++++++++---------------- 1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h index d867547..ad629ee 100644 --- a/arch/arm64/include/asm/spinlock.h +++ b/arch/arm64/include/asm/spinlock.h @@ -46,31 +46,38 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) /* Did we get the lock? */ " eor %w1, %w0, %w0, ror #16\n" " cbz %w1, 3f\n" + /* Put the current ticket into %w2 */ +" uxth %w2, %w0\n" + /* Put the our ticket into %w0 */ +" lsr %w0, %w0, 16\n" /* * No: spin on the owner. Send a local event to avoid missing an * unlock before the exclusive load. */ " sevl\n" + /* Wait for event, we might not be the current ticket. */ +"2: wfe\n" /* Delay if our ticket is not the next ticket. */ -" uxth %w2, %w0\n" -" lsr %w0, %w0, 16\n" /* %w2 is the difference between our ticket and the current ticket. */ -"2: sub %w2, %w0, %w2\n" +"2: subs %w2, %w0, %w2\n" /* If the tickets have wrapped, then we need to add USHORT_MAX. */ -" cmp %w2, wzr\n" -" b.lt 5f\n" -"6: sub %w2, %w2, 1\n" -" cbz %w2, 7f\n" - /* Multiply by 64, a good estimate of how long an lock/unlock will take. */ -" lsl %w2, %w2, 6\n" +" b.mi 5f\n" + /* Subtract one from the difference. */ +"6: subs %w2, %w2, 1\n" + /* Don't wait if we the next ticket. */ +" b.eq 7f\n" + /* Multiply by 80, a good estimate of how long an lock/unlock will take. */ +" lsl %w2, %w2, #4\n" +" add %w2, %w2, %w2, lsl #2\n" /* Spin until we get 0. */ -"4: sub %w2, %w2, 1\n" -" cbnz %w2, 4b\n" - /* Wait for event, we might not be the current ticket. */ -"7: wfe\n" -" ldaxrh %w2, %4\n" -" eor %w1, %w2, %w0\n" -" cbnz %w1, 2b\n" +"4: subs %w2, %w2, #1\n" +" b.ne 4b\n" + + /* Get the current ticket. */ +"7: ldaxrh %w2, %4\n" + /* See if we get the ticket, otherwise loop. */ +" cmp %w2, %w0\n" +" b.ne 2b\n" " b 3f\n" /* Wrap case, add USHORT_MAX to wrap around again. */ "5: mov %w1, 0xffff\n" @@ -80,7 +87,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) "3:" : "=&r" (lockval), "=&r" (newval), "=&r" (tmp), "+Q" (*lock) : "Q" (lock->owner), "I" (1 << TICKET_SHIFT) - : "memory"); + : "memory", "cc"); }
static inline int arch_spin_trylock(arch_spinlock_t *lock)
From: Andrew Pinski apinski@cavium.com
Adding a check for the cache line size is not much overhead. Special-case a 128-byte cache line size. This improves copy_page by 85% on ThunderX compared to the original implementation.
For LMBench, it improves between 4-10%.
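The check itself is just one system register read. The sketch below shows the same test in C with inline assembly (AArch64 only, illustrative): the DC ZVA block size from DCZID_EL0 is used as a proxy for the cache line size, exactly as the new assembly does.

#include <stdbool.h>

/* DCZID_EL0[3:0] is log2 of the DC ZVA block size in 4-byte words, so a
 * value >= 5 means blocks (and, on these parts, cache lines) of at least
 * 128 bytes. */
static bool has_128byte_lines(void)
{
        unsigned long dczid;

        __asm__("mrs %0, dczid_el0" : "=r" (dczid));
        return (dczid & 0xf) >= 5;
}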
Signed-off-by: Andrew Pinski apinski@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/lib/copy_page.S | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+)
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S index 512b9a7..24c72f2 100644 --- a/arch/arm64/lib/copy_page.S +++ b/arch/arm64/lib/copy_page.S @@ -27,6 +27,12 @@ * x1 - src */ ENTRY(copy_page) + /* Special case 128 byte or more cache lines */ + mrs x2, dczid_el0 + and w2, w2, #0xf + cmp w2, 5 + b.ge 2f + /* Assume cache line size is 64 bytes. */ prfm pldl1strm, [x1, #64] 1: ldp x2, x3, [x1] @@ -40,6 +46,32 @@ ENTRY(copy_page) stnp x6, x7, [x0, #32] stnp x8, x9, [x0, #48] add x0, x0, #64 + tst x1, #(PAGE_SIZE - 1) + b.ne 1b + ret +2: + /* The cache line size is at least 128 bytes. */ + prfm pldl1strm, [x1, #128] +1: prfm pldl1strm, [x1, #256] + ldp x2, x3, [x1] + ldp x4, x5, [x1, #16] + ldp x6, x7, [x1, #32] + ldp x8, x9, [x1, #48] + stnp x2, x3, [x0] + stnp x4, x5, [x0, #16] + stnp x6, x7, [x0, #32] + stnp x8, x9, [x0, #48] + + ldp x2, x3, [x1, #64] + ldp x4, x5, [x1, #80] + ldp x6, x7, [x1, #96] + ldp x8, x9, [x1, #112] + add x1, x1, #128 + stnp x2, x3, [x0, #64] + stnp x4, x5, [x0, #80] + stnp x6, x7, [x0, #96] + stnp x8, x9, [x0, #112] + add x0, x0, #128 tst x1, #(PAGE_SIZE - 1) b.ne 1b ret
From: Feng Kan fkan@apm.com
Use the glibc cortex-strings work authored by Linaro as the base to create new copy_to_user/copy_from_user kernel routines.
Iperf performance increase (-l size, single-core result):

                Optimized       Original
    64B         44-51 Mb/s      34-50.7 Mb/s
    1500B       4.9 Gb/s        4.7 Gb/s
    30000B      16.2 Gb/s       14.5 Gb/s
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1400349
Signed-off-by: Feng Kan fkan@apm.com Signed-off-by: Craig Magina craig.magina@canonical.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/lib/copy_from_user.S | 87 +++++++++++------ arch/arm64/lib/copy_template.S | 212 ++++++++++++++++++++++++++++++++++++++++ arch/arm64/lib/copy_to_user.S | 57 ++++++----- 3 files changed, 297 insertions(+), 59 deletions(-) create mode 100644 arch/arm64/lib/copy_template.S
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S index 5e27add..ca96971 100644 --- a/arch/arm64/lib/copy_from_user.S +++ b/arch/arm64/lib/copy_from_user.S @@ -15,7 +15,6 @@ */
#include <linux/linkage.h> -#include <asm/assembler.h>
/* * Copy from user space to a kernel buffer (alignment handled by the hardware) @@ -28,39 +27,63 @@ * x0 - bytes not copied */ ENTRY(__copy_from_user) - add x4, x1, x2 // upper user buffer boundary - subs x2, x2, #8 - b.mi 2f -1: -USER(9f, ldr x3, [x1], #8 ) - subs x2, x2, #8 - str x3, [x0], #8 - b.pl 1b -2: adds x2, x2, #4 - b.mi 3f -USER(9f, ldr w3, [x1], #4 ) - sub x2, x2, #4 - str w3, [x0], #4 -3: adds x2, x2, #2 - b.mi 4f -USER(9f, ldrh w3, [x1], #2 ) - sub x2, x2, #2 - strh w3, [x0], #2 -4: adds x2, x2, #1 - b.mi 5f -USER(9f, ldrb w3, [x1] ) - strb w3, [x0] -5: mov x0, #0 - ret +#include "copy_template.S" ENDPROC(__copy_from_user)
.section .fixup,"ax" - .align 2 -9: sub x2, x4, x1 - mov x3, x2 -10: strb wzr, [x0], #1 // zero remaining buffer space - subs x3, x3, #1 - b.ne 10b - mov x0, x2 // bytes not copied + .align 2 +8: + /* + * Count bytes remain + * dst points to (dst + tmp1) + */ + mov x0, count + sub dst, dst, tmp1 + b .Lfinalize +9: + /* + * 16 bytes remain + * dst is accurate + */ + mov x0, #16 + b .Lfinalize +10: + /* + * count is accurate + * dst is accurate + */ + mov x0, count + b .Lfinalize +11: + /* + *(count + tmp2) bytes remain + * dst points to the start of the remaining bytes + */ + add x0, count, tmp2 + b .Lfinalize +12: + /* + * (count + 128) bytes remain + * dst is accurate + */ + add x0, count, #128 + b .Lfinalize +13: + /* + * (count + 128) bytes remain + * dst is pre-biased to (dst + 16) + */ + add x0, count, #128 + add dst, dst, #16 +.Lfinalize: + /* + * Zeroize remaining destination-buffer + */ + mov count, x0 +20: + /* Zero remaining buffer space */ + strb wzr, [dst], #1 + subs count, count, #1 + b.ne 20b ret .previous diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S new file mode 100644 index 0000000..c07eea6 --- /dev/null +++ b/arch/arm64/lib/copy_template.S @@ -0,0 +1,212 @@ +/* + * Copyright (c) 2013, Applied Micro Circuits Corporation + * Copyright (c) 2012-2013, Linaro Limited + * + * Author: Feng Kan fkan@apm.com + * Author: Philipp Tomsich philipp.tomsich@theobroma-systems.com + * + * The code is adopted from the memcpy routine by Linaro Limited. + * + * This file is free software: you may copy, redistribute and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation, either version 2 of the License, or (at your + * option) any later version. + * + * This file is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + * + * This file incorporates work covered by the following copyright and + * permission notice: + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * 1 Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2 Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3 Neither the name of the Linaro nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#include <asm/assembler.h> + +dstin .req x0 +src .req x1 +count .req x2 +tmp1 .req x3 +tmp1w .req w3 +tmp2 .req x4 +tmp2w .req w4 +tmp3 .req x5 +tmp3w .req w5 +dst .req x6 + +A_l .req x7 +A_h .req x8 +B_l .req x9 +B_h .req x10 +C_l .req x11 +C_h .req x12 +D_l .req x13 +D_h .req x14 + + mov dst, dstin + cmp count, #64 + b.ge .Lcpy_not_short + cmp count, #15 + b.le .Ltail15tiny + + /* + * Deal with small copies quickly by dropping straight into the + * exit block. + */ +.Ltail63: + /* + * Copy up to 48 bytes of data. At this point we only need the + * bottom 6 bits of count to be accurate. + */ + ands tmp1, count, #0x30 + b.eq .Ltail15 + add dst, dst, tmp1 + add src, src, tmp1 + cmp tmp1w, #0x20 + b.eq 1f + b.lt 2f + USER(8f, ldp A_l, A_h, [src, #-48]) + USER(8f, stp A_l, A_h, [dst, #-48]) +1: + USER(8f, ldp A_l, A_h, [src, #-32]) + USER(8f, stp A_l, A_h, [dst, #-32]) +2: + USER(8f, ldp A_l, A_h, [src, #-16]) + USER(8f, stp A_l, A_h, [dst, #-16]) + +.Ltail15: + ands count, count, #15 + beq 1f + add src, src, count + USER(9f, ldp A_l, A_h, [src, #-16]) + add dst, dst, count + USER(9f, stp A_l, A_h, [dst, #-16]) +1: + b .Lsuccess + +.Ltail15tiny: + /* + * Copy up to 15 bytes of data. Does not assume additional data + * being copied. + */ + tbz count, #3, 1f + USER(10f, ldr tmp1, [src], #8) + USER(10f, str tmp1, [dst], #8) +1: + tbz count, #2, 1f + USER(10f, ldr tmp1w, [src], #4) + USER(10f, str tmp1w, [dst], #4) +1: + tbz count, #1, 1f + USER(10f, ldrh tmp1w, [src], #2) + USER(10f, strh tmp1w, [dst], #2) +1: + tbz count, #0, 1f + USER(10f, ldrb tmp1w, [src]) + USER(10f, strb tmp1w, [dst]) +1: + b .Lsuccess + +.Lcpy_not_short: + /* + * We don't much care about the alignment of DST, but we want SRC + * to be 128-bit (16 byte) aligned so that we don't cross cache line + * boundaries on both loads and stores. + */ + neg tmp2, src + ands tmp2, tmp2, #15 /* Bytes to reach alignment. */ + b.eq 2f + sub count, count, tmp2 + /* + * Copy more data than needed; it's faster than jumping + * around copying sub-Quadword quantities. We know that + * it can't overrun. + */ + USER(11f, ldp A_l, A_h, [src]) + add src, src, tmp2 + USER(11f, stp A_l, A_h, [dst]) + add dst, dst, tmp2 + /* There may be less than 63 bytes to go now. */ + cmp count, #63 + b.le .Ltail63 +2: + subs count, count, #128 + b.ge .Lcpy_body_large + /* + * Less than 128 bytes to copy, so handle 64 here and then jump + * to the tail. + */ + USER(12f, ldp A_l, A_h, [src]) + USER(12f, ldp B_l, B_h, [src, #16]) + USER(12f, ldp C_l, C_h, [src, #32]) + USER(12f, ldp D_l, D_h, [src, #48]) + USER(12f, stp A_l, A_h, [dst]) + USER(12f, stp B_l, B_h, [dst, #16]) + USER(12f, stp C_l, C_h, [dst, #32]) + USER(12f, stp D_l, D_h, [dst, #48]) + tst count, #0x3f + add src, src, #64 + add dst, dst, #64 + b.ne .Ltail63 + b .Lsuccess + + /* + * Critical loop. Start at a new cache line boundary. Assuming + * 64 bytes per line this ensures the entire loop is in one line. 
+ */ + .p2align 6 +.Lcpy_body_large: + /* There are at least 128 bytes to copy. */ + USER(12f, ldp A_l, A_h, [src, #0]) + sub dst, dst, #16 /* Pre-bias. */ + USER(13f, ldp B_l, B_h, [src, #16]) + USER(13f, ldp C_l, C_h, [src, #32]) + USER(13f, ldp D_l, D_h, [src, #48]!) /* src += 64 - Pre-bias. */ +1: + USER(13f, stp A_l, A_h, [dst, #16]) + USER(13f, ldp A_l, A_h, [src, #16]) + USER(13f, stp B_l, B_h, [dst, #32]) + USER(13f, ldp B_l, B_h, [src, #32]) + USER(13f, stp C_l, C_h, [dst, #48]) + USER(13f, ldp C_l, C_h, [src, #48]) + USER(13f, stp D_l, D_h, [dst, #64]!) + USER(13f, ldp D_l, D_h, [src, #64]!) + subs count, count, #64 + b.ge 1b + USER(13f, stp A_l, A_h, [dst, #16]) + USER(13f, stp B_l, B_h, [dst, #32]) + USER(13f, stp C_l, C_h, [dst, #48]) + USER(13f, stp D_l, D_h, [dst, #64]) + add src, src, #16 + add dst, dst, #64 + 16 + tst count, #0x3f + b.ne .Ltail63 +.Lsuccess: + /* Nothing left to copy */ + mov x0, #0 + ret diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S index a0aeeb9..af24bfa 100644 --- a/arch/arm64/lib/copy_to_user.S +++ b/arch/arm64/lib/copy_to_user.S @@ -15,7 +15,6 @@ */
#include <linux/linkage.h> -#include <asm/assembler.h>
/* * Copy to user space from a kernel buffer (alignment handled by the hardware) @@ -28,34 +27,38 @@ * x0 - bytes not copied */ ENTRY(__copy_to_user) - add x4, x0, x2 // upper user buffer boundary - subs x2, x2, #8 - b.mi 2f -1: - ldr x3, [x1], #8 - subs x2, x2, #8 -USER(9f, str x3, [x0], #8 ) - b.pl 1b -2: adds x2, x2, #4 - b.mi 3f - ldr w3, [x1], #4 - sub x2, x2, #4 -USER(9f, str w3, [x0], #4 ) -3: adds x2, x2, #2 - b.mi 4f - ldrh w3, [x1], #2 - sub x2, x2, #2 -USER(9f, strh w3, [x0], #2 ) -4: adds x2, x2, #1 - b.mi 5f - ldrb w3, [x1] -USER(9f, strb w3, [x0] ) -5: mov x0, #0 - ret +#include "copy_template.S" ENDPROC(__copy_to_user)
.section .fixup,"ax" - .align 2 -9: sub x0, x4, x0 // bytes not copied + .align 2 +8: +10: + /* + * count is accurate + */ + mov x0, count + b .Lfinalize +9: + /* + * 16 bytes remain + */ + mov x0, #16 + b .Lfinalize +11: + /* + *(count + tmp2) bytes remain + * dst points to the start of the remaining bytes + */ + add x0, count, tmp2 + b .Lfinalize +12: +13: + /* + * (count + 128) bytes remain + */ + add x0, count, #128 + b .Lfinalize +.Lfinalize: ret .previous
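For reference, a minimal sketch (not part of the patch set) of how callers consume the "bytes not copied" return value that the fixup paths above place in x0; the function and buffer names are invented for the example:

#include <linux/fs.h>
#include <linux/uaccess.h>

static ssize_t demo_write(struct file *file, const char __user *ubuf,
			  size_t len, loff_t *ppos)
{
	char kbuf[128];

	if (len > sizeof(kbuf))
		len = sizeof(kbuf);
	/* copy_from_user() returns the number of bytes NOT copied */
	if (copy_from_user(kbuf, ubuf, len))
		return -EFAULT;
	/* ... consume kbuf ... */
	return len;
}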
From: Craig Magina craig.magina@canonical.com
Using the glibc cortex-strings work authored by Linaro as a base to create new copy to/from user kernel routines.
Iperf performance increase (1 core result, by -l size):

  -l (size)   Optimized     Original
  64B         44-51Mb/s     34-50.7Mb/s
  1500B       4.9Gb/s       4.7Gb/s
  30000B      16.2Gb/s      14.5Gb/s
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1400349
Note: one change was made to move the tst instruction right next to the branch, for better optimization on ThunderX.
Signed-off-by: Craig Magina craig.magina@canonical.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/lib/copy_template.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/lib/copy_template.S b/arch/arm64/lib/copy_template.S index c07eea6..bdce432 100644 --- a/arch/arm64/lib/copy_template.S +++ b/arch/arm64/lib/copy_template.S @@ -169,9 +169,9 @@ D_h .req x14 USER(12f, stp B_l, B_h, [dst, #16]) USER(12f, stp C_l, C_h, [dst, #32]) USER(12f, stp D_l, D_h, [dst, #48]) - tst count, #0x3f add src, src, #64 add dst, dst, #64 + tst count, #0x3f b.ne .Ltail63 b .Lsuccess
From: TIRUMALESH CHALAMARLA tchalamarla@cavium.com
Signed-off-by: TIRUMALESH CHALAMARLA tchalamarla@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/cache.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h index bde4499..5082b30 100644 --- a/arch/arm64/include/asm/cache.h +++ b/arch/arm64/include/asm/cache.h @@ -18,7 +18,7 @@
#include <asm/cachetype.h>
-#define L1_CACHE_SHIFT 6 +#define L1_CACHE_SHIFT 7 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
/*
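For context, a minimal sketch (not part of the patch) of what the larger line size buys: ____cacheline_aligned aligns objects to SMP_CACHE_BYTES, which is derived from L1_CACHE_BYTES, so bumping L1_CACHE_SHIFT from 6 to 7 makes such structures 128-byte aligned and keeps two of them from sharing a ThunderX cache line. The structure below is hypothetical:

#include <linux/cache.h>

/* Hypothetical per-cpu counters; with L1_CACHE_SHIFT = 7 this object is
 * 128-byte aligned, so two instances never share a cache line on ThunderX. */
struct demo_percpu_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
} ____cacheline_aligned;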
From: Graeme Gregory graeme.gregory@linaro.org
This is a standard platform device, so resources are converted in the ACPI core in the same fashion as DT resources. For the other DT-provided information there is _DSD for ACPI.
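As an aside, the same information could in principle be picked up from _DSD through the unified device-property API rather than hard-coded defaults; the sketch below is only an illustration (it is not what this patch does), and the property names are assumed to mirror the existing DT binding:

#include <linux/device.h>
#include <linux/property.h>
#include <linux/smsc911x.h>

static void demo_read_dsd(struct device *dev,
			  struct smsc911x_platform_config *config)
{
	/* Assumed property names, mirroring the DT binding */
	if (device_property_present(dev, "smsc,irq-active-high"))
		config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
	if (device_property_present(dev, "smsc,irq-push-pull"))
		config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;
}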
Signed-off-by: Graeme Gregory graeme.gregory@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/smsc/smsc911x.c | 38 ++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+)
diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c index 959aeea..9161f0a 100644 --- a/drivers/net/ethernet/smsc/smsc911x.c +++ b/drivers/net/ethernet/smsc/smsc911x.c @@ -31,6 +31,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/acpi.h> #include <linux/crc32.h> #include <linux/clk.h> #include <linux/delay.h> @@ -2412,9 +2413,36 @@ static inline int smsc911x_probe_config_dt( } #endif /* CONFIG_OF */
+#ifdef CONFIG_ACPI +/* Configure some sensible defaults for ACPI mode */ +static int smsc911x_probe_config_acpi(struct smsc911x_platform_config *config, + acpi_handle *ahandle) +{ + if (!ahandle) + return -ENOSYS; + + config->phy_interface = PHY_INTERFACE_MODE_MII; + + config->flags |= SMSC911X_USE_32BIT; + + config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH; + + config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL; + + return 0; +} +#else +static int smsc911x_probe_config_acpi(struct smsc911x_platform_config *config, + acpi_handle *ahandle) +{ + return -ENOSYS; +} +#endif /* CONFIG_ACPI */ + static int smsc911x_drv_probe(struct platform_device *pdev) { struct device_node *np = pdev->dev.of_node; + acpi_handle *ahandle = ACPI_HANDLE(&pdev->dev); struct net_device *dev; struct smsc911x_data *pdata; struct smsc911x_platform_config *config = dev_get_platdata(&pdev->dev); @@ -2479,6 +2507,9 @@ static int smsc911x_drv_probe(struct platform_device *pdev) }
retval = smsc911x_probe_config_dt(&pdata->config, np); + if (retval) + retval = smsc911x_probe_config_acpi(&pdata->config, ahandle); + if (retval && config) { /* copy config parameters across to pdata */ memcpy(&pdata->config, config, sizeof(pdata->config)); @@ -2654,6 +2685,12 @@ static const struct of_device_id smsc911x_dt_ids[] = { MODULE_DEVICE_TABLE(of, smsc911x_dt_ids); #endif
+static const struct acpi_device_id smsc911x_acpi_ids[] = { + { "LNRO001B", }, + { "ARMH9118", }, + { } +}; + static struct platform_driver smsc911x_driver = { .probe = smsc911x_drv_probe, .remove = smsc911x_drv_remove, @@ -2661,6 +2698,7 @@ static struct platform_driver smsc911x_driver = { .name = SMSC_CHIPNAME, .pm = SMSC911X_PM_OPS, .of_match_table = of_match_ptr(smsc911x_dt_ids), + .acpi_match_table = ACPI_PTR(smsc911x_acpi_ids), }, };
From: Graeme Gregory graeme.gregory@linaro.org
Add device ID LNRO0003 for this device and add the match table.
As it is a platform device it needs no other code and will be probed by acpi_platform once the device ID is added.
Signed-off-by: Graeme Gregory graeme.gregory@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/smsc/smc91x.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c index 630f0b7..10fcc67 100644 --- a/drivers/net/ethernet/smsc/smc91x.c +++ b/drivers/net/ethernet/smsc/smc91x.c @@ -65,6 +65,7 @@ static const char version[] = #endif
+#include <linux/acpi.h> #include <linux/module.h> #include <linux/kernel.h> #include <linux/sched.h> @@ -82,7 +83,6 @@ static const char version[] = #include <linux/of.h> #include <linux/of_device.h> #include <linux/of_gpio.h> - #include <linux/netdevice.h> #include <linux/etherdevice.h> #include <linux/skbuff.h> @@ -2463,6 +2463,14 @@ static struct dev_pm_ops smc_drv_pm_ops = { .resume = smc_drv_resume, };
+#ifdef CONFIG_ACPI +static const struct acpi_device_id smc91x_acpi_match[] = { + { "LNRO0003", }, + { } +}; +MODULE_DEVICE_TABLE(acpi, smc91x_acpi_match); +#endif + static struct platform_driver smc_driver = { .probe = smc_drv_probe, .remove = smc_drv_remove, @@ -2470,6 +2478,7 @@ static struct platform_driver smc_driver = { .name = CARDNAME, .pm = &smc_drv_pm_ops, .of_match_table = of_match_ptr(smc91x_match), + .acpi_match_table = ACPI_PTR(smc91x_acpi_match), }, };
From: Graeme Gregory graeme.gregory@linaro.org
Added the match table and pointers for ACPI probing to the driver.
Signed-off-by: Graeme Gregory graeme.gregory@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/virtio/virtio_mmio.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 10189b5..9f9f4c3 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -70,8 +70,7 @@ #include <linux/virtio_config.h> #include <linux/virtio_mmio.h> #include <linux/virtio_ring.h> - - +#include <linux/acpi.h>
/* The alignment to use between consumer and producer parts of vring. * Currently hardcoded to the page size. */ @@ -732,12 +731,21 @@ static struct of_device_id virtio_mmio_match[] = { }; MODULE_DEVICE_TABLE(of, virtio_mmio_match);
+#ifdef CONFIG_ACPI +static const struct acpi_device_id virtio_mmio_acpi_match[] = { + { "LNRO0005", }, + { } +}; +MODULE_DEVICE_TABLE(acpi, virtio_mmio_acpi_match); +#endif + static struct platform_driver virtio_mmio_driver = { .probe = virtio_mmio_probe, .remove = virtio_mmio_remove, .driver = { .name = "virtio-mmio", .of_match_table = virtio_mmio_match, + .acpi_match_table = ACPI_PTR(virtio_mmio_acpi_match), }, };
From: Robert Richter rrichter@cavium.com
This reverts commit f0bd7ccc413f6de0947d6b8e998ef1fb787513ff.
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/mfd/vexpress-sysreg.c | 71 ++++++++++++++++++++++++++++++++++--------- 1 file changed, 56 insertions(+), 15 deletions(-)
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c index 3e628df..8f43ab8 100644 --- a/drivers/mfd/vexpress-sysreg.c +++ b/drivers/mfd/vexpress-sysreg.c @@ -47,26 +47,71 @@ #define SYS_HBI_MASK 0xfff #define SYS_PROCIDx_HBI_SHIFT 0
+#define SYS_MCI_CARDIN (1 << 0) +#define SYS_MCI_WPROT (1 << 1) + #define SYS_MISC_MASTERSITE (1 << 14)
-void vexpress_flags_set(u32 data) -{ - static void __iomem *base;
- if (!base) { +static void __iomem *__vexpress_sysreg_base; + +static void __iomem *vexpress_sysreg_base(void) +{ + if (!__vexpress_sysreg_base) { struct device_node *node = of_find_compatible_node(NULL, NULL, "arm,vexpress-sysreg");
- base = of_iomap(node, 0); + __vexpress_sysreg_base = of_iomap(node, 0); }
- if (WARN_ON(!base)) - return; + WARN_ON(!__vexpress_sysreg_base); + + return __vexpress_sysreg_base; +} + + +static int vexpress_sysreg_get_master(void) +{ + if (readl(vexpress_sysreg_base() + SYS_MISC) & SYS_MISC_MASTERSITE) + return VEXPRESS_SITE_DB2; + + return VEXPRESS_SITE_DB1; +} + +void vexpress_flags_set(u32 data) +{ + writel(~0, vexpress_sysreg_base() + SYS_FLAGSCLR); + writel(data, vexpress_sysreg_base() + SYS_FLAGSSET); +} + +unsigned int vexpress_get_mci_cardin(struct device *dev) +{ + return readl(vexpress_sysreg_base() + SYS_MCI) & SYS_MCI_CARDIN; +} + +u32 vexpress_get_procid(int site) +{ + if (site == VEXPRESS_SITE_MASTER) + site = vexpress_sysreg_get_master();
- writel(~0, base + SYS_FLAGSCLR); - writel(data, base + SYS_FLAGSSET); + return readl(vexpress_sysreg_base() + (site == VEXPRESS_SITE_DB1 ? + SYS_PROCID0 : SYS_PROCID1)); }
+void __iomem *vexpress_get_24mhz_clock_base(void) +{ + return vexpress_sysreg_base() + SYS_24MHZ; +} + + +void __init vexpress_sysreg_early_init(void __iomem *base) +{ + __vexpress_sysreg_base = base; + + vexpress_config_set_master(vexpress_sysreg_get_master()); +} + + /* The sysreg block is just a random collection of various functions... */
static struct syscon_platform_data vexpress_sysreg_sys_id_pdata = { @@ -165,7 +210,6 @@ static int vexpress_sysreg_probe(struct platform_device *pdev) struct resource *mem; void __iomem *base; struct bgpio_chip *mmc_gpio_chip; - int master; u32 dt_hbi;
mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -176,14 +220,11 @@ static int vexpress_sysreg_probe(struct platform_device *pdev) if (!base) return -ENOMEM;
- master = readl(base + SYS_MISC) & SYS_MISC_MASTERSITE ? - VEXPRESS_SITE_DB2 : VEXPRESS_SITE_DB1; - vexpress_config_set_master(master); + vexpress_config_set_master(vexpress_sysreg_get_master());
/* Confirm board type against DT property, if available */ if (of_property_read_u32(of_root, "arm,hbi", &dt_hbi) == 0) { - u32 id = readl(base + (master == VEXPRESS_SITE_DB1 ? - SYS_PROCID0 : SYS_PROCID1)); + u32 id = vexpress_get_procid(VEXPRESS_SITE_MASTER); u32 hbi = (id >> SYS_PROCIDx_HBI_SHIFT) & SYS_HBI_MASK;
if (WARN_ON(dt_hbi != hbi))
From: Naresh Bhat naresh.bhat@linaro.org
Add a match table and pointers for ACPI probing to the vexpress-sysreg driver.
vexpress-sysreg is self-contained, so it gets its resources automatically by being a platform driver. However, even before it is probed it should still provide the base address so that other related drivers can take advantage of it. Make that possible by finding the resources based on the device HID.
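An illustrative consumer (not part of the patch, and assuming the accessor is declared in linux/vexpress.h): code that runs before the vexpress-sysreg platform device is probed can still use the exported helpers, because vexpress_sysreg_base() locates the registers on demand via DT or, with this patch, via the LNRO0009 ACPI HID:

#include <linux/types.h>
#include <linux/vexpress.h>

static void demo_set_boot_flag(u32 secondary_entry)
{
	/* Works even before the vexpress-sysreg platform driver has probed,
	 * because vexpress_sysreg_base() finds the registers on demand. */
	vexpress_flags_set(secondary_entry);
}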
Signed-off-by: Naresh Bhat naresh.bhat@linaro.org Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/mfd/vexpress-sysreg.c | 66 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 63 insertions(+), 3 deletions(-)
diff --git a/drivers/mfd/vexpress-sysreg.c b/drivers/mfd/vexpress-sysreg.c index 8f43ab8..0db1746 100644 --- a/drivers/mfd/vexpress-sysreg.c +++ b/drivers/mfd/vexpress-sysreg.c @@ -22,6 +22,7 @@ #include <linux/slab.h> #include <linux/stat.h> #include <linux/vexpress.h> +#include <linux/acpi.h>
#define SYS_ID 0x000 #define SYS_SW 0x004 @@ -55,15 +56,66 @@
static void __iomem *__vexpress_sysreg_base;
+#ifdef CONFIG_ACPI +static acpi_status check_vexpress_resource(struct acpi_resource *res, + void *data) +{ + struct resource *vexpress_res = data; + + if (!acpi_dev_resource_memory(res, vexpress_res)) + pr_err("Failed to map vexpress memory resource\n"); + + __vexpress_sysreg_base = ioremap(vexpress_res->start, + resource_size(vexpress_res)); + if (__vexpress_sysreg_base) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static acpi_status find_vexpress_resource(acpi_handle handle, u32 lvl, + void *context, void **rv) +{ + struct resource *vexpress_res = context; + + acpi_walk_resources(handle, METHOD_NAME__CRS, + check_vexpress_resource, context); + + if (vexpress_res->flags) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static void acpi_vexpress_sysreg_base(void) +{ + struct resource vexpress_res; + + acpi_get_devices("LNRO0009", find_vexpress_resource, &vexpress_res, + NULL); +} +#else +static inline void acpi_vexpress_sysreg_base(void) +{ +} +#endif + static void __iomem *vexpress_sysreg_base(void) { - if (!__vexpress_sysreg_base) { - struct device_node *node = of_find_compatible_node(NULL, NULL, - "arm,vexpress-sysreg"); + struct device_node *node; + + if (__vexpress_sysreg_base) + goto ret;
+ node = of_find_compatible_node(NULL, NULL, "arm,vexpress-sysreg"); + if (node) { __vexpress_sysreg_base = of_iomap(node, 0); + goto ret; }
+ acpi_vexpress_sysreg_base(); + +ret: WARN_ON(!__vexpress_sysreg_base);
return __vexpress_sysreg_base; @@ -255,10 +307,18 @@ static const struct of_device_id vexpress_sysreg_match[] = { {}, };
+#ifdef CONFIG_ACPI +static const struct acpi_device_id vexpress_sysreg_acpi_match[] = { + { "LNRO0009", }, + { } +}; +#endif + static struct platform_driver vexpress_sysreg_driver = { .driver = { .name = "vexpress-sysreg", .of_match_table = vexpress_sysreg_match, + .acpi_match_table = ACPI_PTR(vexpress_sysreg_acpi_match), }, .probe = vexpress_sysreg_probe, };
From: Al Stone ahs3@redhat.com
Signed-off-by: Mark Salter msalter@redhat.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/pnp/resource.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pnp/resource.c b/drivers/pnp/resource.c index f980ff7..01f55a4 100644 --- a/drivers/pnp/resource.c +++ b/drivers/pnp/resource.c @@ -315,6 +315,7 @@ static int pci_dev_uses_irq(struct pnp_dev *pnp, struct pci_dev *pci, progif = class & 0xff; class >>= 8;
+#ifdef HAVE_ARCH_PCI_GET_LEGACY_IDE_IRQ if (class == PCI_CLASS_STORAGE_IDE) { /* * Unless both channels are native-PCI mode only, @@ -328,6 +329,7 @@ static int pci_dev_uses_irq(struct pnp_dev *pnp, struct pci_dev *pci, return 1; } } +#endif /* HAVE_ARCH_PCI_GET_LEGACY_IDE_IRQ */
return 0; }
From: Al Stone ahs3@redhat.com
The ARM architecture allows for two possible architectural clock sources: one memory-mapped and the other coprocessor-based. If both timers exist, the driver waits for both to be probed before registering a clocksource.
Commit c387f07e6205 ("clocksource: arm_arch_timer: Discard unavailable timers correctly") attempted to fix a hang occurring when one of the two possible timers had a device node, but was disabled. In that case, the second probe would never occur and the system would hang without a clocksource being registered.
Unfortunately, incorrect logic in that commit made things worse such that a hang would occur unless both timers had a device node and were enabled. This patch fixes the logic so that we don't wait to probe a second timer unless it exists and is enabled.
Signed-off-by: Mark Salter msalter@redhat.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/clocksource/arm_arch_timer.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index 0aa135d..17ad6f4 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -672,10 +672,11 @@ arch_timer_needs_probing(int type, const struct of_device_id *matches) bool needs_probing = false;
dn = of_find_matching_node(NULL, matches); - if (dn && of_device_is_available(dn) && !(arch_timers_present & type)) - needs_probing = true; - of_node_put(dn); - + if (dn) { + if (dn && of_device_is_available(dn) && !(arch_timers_present & type)) + needs_probing = true; + of_node_put(dn); + } return needs_probing; }
From: Tomasz Nowicki tomasz.nowicki@linaro.org
There is no need to probe GICv2 and GICv3 sequentially; from now on, we know the GIC version in advance. Note this patch does not break backward compatibility for machines that are compliant with the ACPI 5.1 spec.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/acpi.h | 2 ++ arch/arm64/kernel/acpi.c | 31 +++++++++++++++++++++++++++++-- include/acpi/actbl1.h | 12 +++++++++++- 3 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h index 406485e..a80f3af 100644 --- a/arch/arm64/include/asm/acpi.h +++ b/arch/arm64/include/asm/acpi.h @@ -47,6 +47,7 @@ typedef u64 phys_cpuid_t; extern int acpi_disabled; extern int acpi_noirq; extern int acpi_pci_disabled; +extern int acpi_gic_ver;
static inline void disable_acpi(void) { @@ -85,6 +86,7 @@ static inline void arch_fix_phys_package_id(int num, u32 slot) { } void __init acpi_init_cpus(void);
#else +#define acpi_gic_ver 0 static inline void acpi_init_cpus(void) { } #endif /* CONFIG_ACPI */
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index 19de753..26928c4 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -36,6 +36,8 @@ EXPORT_SYMBOL(acpi_disabled); int acpi_pci_disabled = 1; /* skip ACPI PCI scan and IRQ initialization */ EXPORT_SYMBOL(acpi_pci_disabled);
+int acpi_gic_ver; + static bool param_acpi_off __initdata; static bool param_acpi_force __initdata;
@@ -206,12 +208,27 @@ void __init acpi_boot_table_init(void) } }
+static int __init +gic_acpi_find_ver(struct acpi_subtable_header *header, + const unsigned long end) +{ + struct acpi_madt_generic_distributor *dist; + + dist = (struct acpi_madt_generic_distributor *)header; + + if (BAD_MADT_ENTRY(dist, end)) + return -EINVAL; + + acpi_gic_ver = dist->gic_version; + return 0; +} + void __init acpi_gic_init(void) { struct acpi_table_header *table; acpi_status status; acpi_size tbl_size; - int err; + int err, count;;
if (acpi_disabled) return; @@ -224,7 +241,17 @@ void __init acpi_gic_init(void) return; }
- err = gic_v2_acpi_init(table); + count = acpi_parse_entries(ACPI_SIG_MADT, + sizeof(struct acpi_table_madt), + gic_acpi_find_ver, table, + ACPI_MADT_TYPE_GENERIC_DISTRIBUTOR, 0); + if (count <= 0) { + pr_info("Error during GICD entries parsing, assuming GICv2\n"); + acpi_gic_ver = ACPI_MADT_GIC_VER_V2; + } + + err = acpi_gic_ver < ACPI_MADT_GIC_VER_V3 ? + gic_v2_acpi_init(table) : -ENXIO; if (err) pr_err("Failed to initialize GIC IRQ controller");
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h index fcd5709..606f657 100644 --- a/include/acpi/actbl1.h +++ b/include/acpi/actbl1.h @@ -823,6 +823,16 @@ struct acpi_madt_generic_interrupt { #define ACPI_MADT_PERFORMANCE_IRQ_MODE (1<<1) /* 01: Performance Interrupt Mode */ #define ACPI_MADT_VGIC_IRQ_MODE (1<<2) /* 02: VGIC Maintenance Interrupt mode */
+enum acpi_madt_gic_ver_type +{ + ACPI_MADT_GIC_VER_UNKNOWN = 0, + ACPI_MADT_GIC_VER_V2 = 1, + ACPI_MADT_GIC_VER_V2m = 2, + ACPI_MADT_GIC_VER_V3 = 3, + ACPI_MADT_GIC_VER_V4 = 4, + ACPI_MADT_GIC_VER_RESERVED = 5 /* 15 and greater are reserved */ +}; + /* 12: Generic Distributor (ACPI 5.0 + ACPI 6.0 changes) */
struct acpi_madt_generic_distributor { @@ -831,7 +841,7 @@ struct acpi_madt_generic_distributor { u32 gic_id; u64 base_address; u32 global_irq_base; - u8 version; + u8 gic_version; u8 reserved2[3]; /* reserved - must be zero */ };
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 9c67bda..89dde54 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -790,6 +790,7 @@ static void gic_irq_domain_free(struct irq_domain *domain, unsigned int virq, }
static const struct irq_domain_ops gic_irq_domain_ops = { + .map = gic_irq_domain_map, .xlate = gic_irq_domain_xlate, .alloc = gic_irq_domain_alloc, .free = gic_irq_domain_free,
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Isolate the hardware-abstraction (FDT) code in gic_of_init(). The rest of the logic goes to gic_init_bases(), which expects well-defined data to initialize the GIC properly. The same approach is used for the GICv2 driver.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3.c | 100 +++++++++++++++++++++++++------------------ 1 file changed, 58 insertions(+), 42 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 89dde54..622a2f7 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -819,60 +819,24 @@ static void gicv3_check_capabilities(void) gic_check_capabilities(iidr, gicv3_errata, NULL); }
-static int __init gic_of_init(struct device_node *node, struct device_node *parent) +static int __init gic_init_bases(void __iomem *dist_base, + struct redist_region *rdist_regs, + u32 nr_redist_regions, + u64 redist_stride, + struct device_node *node) { - void __iomem *dist_base; - struct redist_region *rdist_regs; - u64 redist_stride; - u32 nr_redist_regions; u32 typer; u32 reg; int gic_irqs; int err; - int i; - - dist_base = of_iomap(node, 0); - if (!dist_base) { - pr_err("%s: unable to map gic dist registers\n", - node->full_name); - return -ENXIO; - }
reg = readl_relaxed(dist_base + GICD_PIDR2) & GIC_PIDR2_ARCH_MASK; if (reg != GIC_PIDR2_ARCH_GICv3 && reg != GIC_PIDR2_ARCH_GICv4) { pr_err("%s: no distributor detected, giving up\n", node->full_name); - err = -ENODEV; - goto out_unmap_dist; - } - - if (of_property_read_u32(node, "#redistributor-regions", &nr_redist_regions)) - nr_redist_regions = 1; - - rdist_regs = kzalloc(sizeof(*rdist_regs) * nr_redist_regions, GFP_KERNEL); - if (!rdist_regs) { - err = -ENOMEM; - goto out_unmap_dist; - } - - for (i = 0; i < nr_redist_regions; i++) { - struct resource res; - int ret; - - ret = of_address_to_resource(node, 1 + i, &res); - rdist_regs[i].redist_base = of_iomap(node, 1 + i); - if (ret || !rdist_regs[i].redist_base) { - pr_err("%s: couldn't map region %d\n", - node->full_name, i); - err = -ENODEV; - goto out_unmap_rdist; - } - rdist_regs[i].phys_base = res.start; + return -ENODEV; }
- if (of_property_read_u64(node, "redistributor-stride", &redist_stride)) - redist_stride = 0; - gic_data.dist_base = dist_base; gic_data.redist_regions = rdist_regs; gic_data.nr_redist_regions = nr_redist_regions; @@ -916,6 +880,57 @@ out_free: if (gic_data.domain) irq_domain_remove(gic_data.domain); free_percpu(gic_data.rdists.rdist); + return err; +} + +#ifdef CONFIG_OF +static int __init gic_of_init(struct device_node *node, struct device_node *parent) +{ + void __iomem *dist_base; + struct redist_region *rdist_regs; + u64 redist_stride; + u32 nr_redist_regions; + int err, i; + + dist_base = of_iomap(node, 0); + if (!dist_base) { + pr_err("%s: unable to map gic dist registers\n", + node->full_name); + return -ENXIO; + } + + if (of_property_read_u32(node, "#redistributor-regions", &nr_redist_regions)) + nr_redist_regions = 1; + + rdist_regs = kzalloc(sizeof(*rdist_regs) * nr_redist_regions, GFP_KERNEL); + if (!rdist_regs) { + err = -ENOMEM; + goto out_unmap_dist; + } + + for (i = 0; i < nr_redist_regions; i++) { + struct resource res; + int ret; + + ret = of_address_to_resource(node, 1 + i, &res); + rdist_regs[i].redist_base = of_iomap(node, 1 + i); + if (ret || !rdist_regs[i].redist_base) { + pr_err("%s: couldn't map region %d\n", + node->full_name, i); + err = -ENODEV; + goto out_unmap_rdist; + } + rdist_regs[i].phys_base = res.start; + } + + if (of_property_read_u64(node, "redistributor-stride", &redist_stride)) + redist_stride = 0; + + err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions, + redist_stride, node); + if (!err) + return 0; + out_unmap_rdist: for (i = 0; i < nr_redist_regions; i++) if (rdist_regs[i].redist_base) @@ -927,3 +942,4 @@ out_unmap_dist: }
IRQCHIP_DECLARE(gic_v3, "arm,gic-v3", gic_of_init); +#endif
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Obtain GICv3+ redistributor base addresses from the MADT subtables, check data integrity and call the GICv3 init function. GIC driver probe order: if the MADT provides redistributors, try the GICv3 driver, otherwise try GICv2.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/acpi.c | 4 +- drivers/irqchip/irq-gic-v3.c | 210 +++++++++++++++++++++++++++++++++++ include/linux/irqchip/arm-gic-acpi.h | 2 + 3 files changed, 215 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c index 26928c4..2f8ca43 100644 --- a/arch/arm64/kernel/acpi.c +++ b/arch/arm64/kernel/acpi.c @@ -251,9 +251,11 @@ void __init acpi_gic_init(void) }
err = acpi_gic_ver < ACPI_MADT_GIC_VER_V3 ? - gic_v2_acpi_init(table) : -ENXIO; + gic_v2_acpi_init(table) : + gic_v3_acpi_init(table); if (err) pr_err("Failed to initialize GIC IRQ controller");
+ early_acpi_os_unmap_memory((char *)table, tbl_size); } diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index 622a2f7..db82724 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -22,10 +22,12 @@ #include <linux/of.h> #include <linux/of_address.h> #include <linux/of_irq.h> +#include <linux/acpi.h> #include <linux/percpu.h> #include <linux/slab.h>
#include <linux/irqchip/arm-gic-v3.h> +#include <linux/irqchip/arm-gic-acpi.h>
#include <asm/cputype.h> #include <asm/exception.h> @@ -943,3 +945,211 @@ out_unmap_dist:
IRQCHIP_DECLARE(gic_v3, "arm,gic-v3", gic_of_init); #endif + +#ifdef CONFIG_ACPI +static struct redist_region *redist_regs; +static u32 redist_regions; +static void __iomem *dist_base; + +static void __iomem * __init +gic_acpi_map_one_redist(u64 redist_base_address) +{ + void __iomem *redist_base; + u64 typer; + u32 reg; + + /* Map RD + SGI pages */ + redist_base = ioremap(redist_base_address, 2 * SZ_64K); + if (!redist_base) + return NULL; + + /* + * Map another two pages VLPI + reserved, if GIC support + * virtual LPI. + */ + reg = readl_relaxed(redist_base + GICR_PIDR2) & GIC_PIDR2_ARCH_MASK; + if (reg != 0x30 && reg != 0x40) { /* We're in trouble... */ + pr_warn("No redistributor present @%p\n", redist_base); + iounmap(redist_base); + return NULL; + } + + typer = readq_relaxed(redist_base + GICR_TYPER); + if (typer & GICR_TYPER_VLPIS) { + iounmap(redist_base); + redist_base = ioremap(redist_base_address, 4 * SZ_64K); + } + + return redist_base; +} + +static int __init +gic_acpi_register_redist(u64 redist_base_address, u64 size, int region) +{ + struct redist_region *redist_regs_new; + void __iomem *redist_base; + + redist_regs_new = krealloc(redist_regs, + sizeof(*redist_regs) * (redist_regions + 1), + GFP_KERNEL); + if (!redist_regs_new) { + pr_err("Couldn't allocate resource for GICR region\n"); + return -ENOMEM; + } + + redist_regs = redist_regs_new; + + /* + * Region contains a distinct set of GIC redistributors. Region size + * gives us all info we need to map redistributors properly. + * + * If it is not region, we assume to deal with one redistributor. + * Redistributor size is probeable and depends on GIC version: + * GICv3: RD + SGI pages + * GICv4: RD + SGI + VLPI + reserved pages + */ + if (region) + redist_base = ioremap(redist_base_address, size); + else + redist_base = gic_acpi_map_one_redist(redist_base_address); + + if (!redist_base) { + pr_err("Couldn't map GICR region @%lx\n", + (long int)redist_base_address); + return -ENOMEM; + } + + redist_regs[redist_regions].phys_base = redist_base_address; + redist_regs[redist_regions++].redist_base = redist_base; + return 0; +} + +static int __init +gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header, + const unsigned long end) +{ + struct acpi_madt_generic_interrupt *processor; + + if (BAD_MADT_ENTRY(header, end)) + return -EINVAL; + + processor = (struct acpi_madt_generic_interrupt *)header; + if (!processor->gicr_base_address) + return -EINVAL; + + return gic_acpi_register_redist(processor->gicr_base_address, 0, 0); +} + +static int __init +gic_acpi_parse_madt_redist(struct acpi_subtable_header *header, + const unsigned long end) +{ + struct acpi_madt_generic_redistributor *redist; + + if (BAD_MADT_ENTRY(header, end)) + return -EINVAL; + + redist = (struct acpi_madt_generic_redistributor *)header; + if (!redist->base_address) + return -EINVAL; + + return gic_acpi_register_redist(redist->base_address, + redist->length, 1); +} + +static int __init +gic_acpi_parse_madt_distributor(struct acpi_subtable_header *header, + const unsigned long end) +{ + struct acpi_madt_generic_distributor *dist; + + dist = (struct acpi_madt_generic_distributor *)header; + + if (BAD_MADT_ENTRY(dist, end)) + return -EINVAL; + + dist_base = ioremap(dist->base_address, ACPI_GICV3_DIST_MEM_SIZE); + if (!dist_base) { + pr_err("Unable to map GICD registers\n"); + return -ENOMEM; + } + + return 0; +} + +int __init +gic_v3_acpi_init(struct acpi_table_header *table) +{ + int count, i, err = 0; + + /* Collect redistributor base addresses */ + count = 
acpi_parse_entries(ACPI_SIG_MADT, + sizeof(struct acpi_table_madt), + gic_acpi_parse_madt_redist, table, + ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR, 0); + if (!count) + pr_info("No valid GICR entries exist\n"); + else if (count < 0) { + pr_err("Error during GICR entries parsing\n"); + err = -EINVAL; + goto out_redist_unmap; + } else + goto madt_dist; + + /* + * There might be no GICR structure but we can still obtain + * redistributor collection from GICC subtables. + */ + count = acpi_parse_entries(ACPI_SIG_MADT, + sizeof(struct acpi_table_madt), + gic_acpi_parse_madt_cpu, table, + ACPI_MADT_TYPE_GENERIC_INTERRUPT, 0); + if (!count) { + pr_info("No valid GICC entries exist\n"); + return -EINVAL; + } else if (count < 0) { + pr_err("Error during GICC entries parsing\n"); + err = -EINVAL; + goto out_redist_unmap; + } + +madt_dist: + /* + * We assume to parse one distributor entry since ACPI 5.0 spec + * neither support multi-GIC instances nor cascade. + */ + count = acpi_parse_entries(ACPI_SIG_MADT, + sizeof(struct acpi_table_madt), + gic_acpi_parse_madt_distributor, table, + ACPI_MADT_TYPE_GENERIC_DISTRIBUTOR, 0); + if (count < 0) { + pr_err("Error during GICD entries parsing\n"); + err = -EINVAL; + goto out_redist_unmap; + } else if (!count) { + pr_err("No valid GICD entries exist\n"); + err = -EINVAL; + goto out_redist_unmap; + } else if (count > 1) { + pr_err("More than one GICD entry detected\n"); + err = -EINVAL; + goto out_redist_unmap; + } + + err = gic_init_bases(dist_base, redist_regs, redist_regions, 0, NULL); + if (err) + goto out_dist_unmap; + + irq_set_default_host(gic_data.domain); + return 0; + +out_dist_unmap: + iounmap(dist_base); +out_redist_unmap: + for (i = 0; i < redist_regions; i++) + if (redist_regs[i].redist_base) + iounmap(redist_regs[i].redist_base); + kfree(redist_regs); + return err; +} +#endif diff --git a/include/linux/irqchip/arm-gic-acpi.h b/include/linux/irqchip/arm-gic-acpi.h index de3419e..27e77d5 100644 --- a/include/linux/irqchip/arm-gic-acpi.h +++ b/include/linux/irqchip/arm-gic-acpi.h @@ -18,11 +18,13 @@ * from GIC spec. */ #define ACPI_GICV2_DIST_MEM_SIZE (SZ_4K) +#define ACPI_GICV3_DIST_MEM_SIZE (SZ_64K) #define ACPI_GIC_CPU_IF_MEM_SIZE (SZ_8K)
struct acpi_table_header;
int gic_v2_acpi_init(struct acpi_table_header *table); +int gic_v3_acpi_init(struct acpi_table_header *table); void acpi_gic_init(void); #else static inline void acpi_gic_init(void) { }
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3-its.c | 131 +++++++++++++++++++------------------ drivers/irqchip/irq-gic-v3.c | 4 +- include/linux/irqchip/arm-gic-v3.h | 4 +- 3 files changed, 71 insertions(+), 68 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index cb7f33d..3b7f352 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -99,7 +99,6 @@ struct its_device {
static LIST_HEAD(its_nodes); static DEFINE_SPINLOCK(its_lock); -static struct device_node *gic_root_node; static struct rdists *gic_rdists;
#define gic_data_rdist() (raw_cpu_ptr(gic_rdists->rdist)) @@ -888,8 +887,8 @@ static int its_alloc_tables(struct its_node *its) order); if (order >= MAX_ORDER) { order = MAX_ORDER - 1; - pr_warn("%s: Device Table too large, reduce its page order to %u\n", - its->msi_chip.of_node->full_name, order); + pr_warn("ITS: Device Table too large, reduce its page order to %u\n", + order); } }
@@ -898,8 +897,8 @@ static int its_alloc_tables(struct its_node *its) if (alloc_pages > GITS_BASER_PAGES_MAX) { alloc_pages = GITS_BASER_PAGES_MAX; order = get_order(GITS_BASER_PAGES_MAX * psz); - pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n", - its->msi_chip.of_node->full_name, order, alloc_pages); + pr_warn("ITS: Device Table too large, reduce its page order to %u (%u pages)\n", + order, alloc_pages); }
base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order); @@ -966,9 +965,8 @@ retry_baser: }
if (val != tmp) { - pr_err("ITS: %s: GITS_BASER%d doesn't stick: %lx %lx\n", - its->msi_chip.of_node->full_name, i, - (unsigned long) val, (unsigned long) tmp); + pr_err("ITS: GITS_BASER%d doesn't stick: %lx %lx\n", + i, (unsigned long)val, (unsigned long)tmp); err = -ENXIO; goto out_free; } @@ -1481,43 +1479,33 @@ static void its_check_capabilities(struct its_node *its) gic_check_capabilities(iidr, its_errata, its); }
-static int its_probe(struct device_node *node, struct irq_domain *parent) +static struct its_node *its_probe(unsigned long phys_base, unsigned long size) { - struct resource res; struct its_node *its; void __iomem *its_base; u32 val; u64 baser, tmp; int err;
- err = of_address_to_resource(node, 0, &res); - if (err) { - pr_warn("%s: no regs?\n", node->full_name); - return -ENXIO; - } - - its_base = ioremap(res.start, resource_size(&res)); + its_base = ioremap(phys_base, size); if (!its_base) { - pr_warn("%s: unable to map registers\n", node->full_name); - return -ENOMEM; + pr_warn("Unable to map registers\n"); + return NULL; }
val = readl_relaxed(its_base + GITS_PIDR2) & GIC_PIDR2_ARCH_MASK; if (val != 0x30 && val != 0x40) { - pr_warn("%s: no ITS detected, giving up\n", node->full_name); + pr_warn("No ITS detected, giving up\n"); err = -ENODEV; goto out_unmap; }
err = its_force_quiescent(its_base); if (err) { - pr_warn("%s: failed to quiesce, giving up\n", - node->full_name); + pr_warn("ITS: Failed to quiesce, giving up: %d\n", err); goto out_unmap; }
- pr_info("ITS: %s\n", node->full_name); - its = kzalloc(sizeof(*its), GFP_KERNEL); if (!its) { err = -ENOMEM; @@ -1528,8 +1516,7 @@ static int its_probe(struct device_node *node, struct irq_domain *parent) INIT_LIST_HEAD(&its->entry); INIT_LIST_HEAD(&its->its_device_list); its->base = its_base; - its->phys_base = res.start; - its->msi_chip.of_node = node; + its->phys_base = phys_base; its->ite_size = ((readl_relaxed(its_base + GITS_TYPER) >> 4) & 0xf) + 1;
its->cmd_base = kzalloc(ITS_CMD_QUEUE_SZ, GFP_KERNEL); @@ -1577,39 +1564,12 @@ static int its_probe(struct device_node *node, struct irq_domain *parent) writeq_relaxed(0, its->base + GITS_CWRITER); writel_relaxed(GITS_CTLR_ENABLE, its->base + GITS_CTLR);
- if (of_property_read_bool(its->msi_chip.of_node, "msi-controller")) { - its->domain = irq_domain_add_tree(NULL, &its_domain_ops, its); - if (!its->domain) { - err = -ENOMEM; - goto out_free_tables; - } - - its->domain->parent = parent; - - its->msi_chip.domain = pci_msi_create_irq_domain(node, - &its_pci_msi_domain_info, - its->domain); - if (!its->msi_chip.domain) { - err = -ENOMEM; - goto out_free_domains; - } - - err = of_pci_msi_chip_add(&its->msi_chip); - if (err) - goto out_free_domains; - } - spin_lock(&its_lock); list_add(&its->entry, &its_nodes); spin_unlock(&its_lock);
- return 0; + return its;
-out_free_domains: - if (its->msi_chip.domain) - irq_domain_remove(its->msi_chip.domain); - if (its->domain) - irq_domain_remove(its->domain); out_free_tables: its_free_tables(its); out_free_cmd: @@ -1618,8 +1578,8 @@ out_free_its: kfree(its); out_unmap: iounmap(its_base); - pr_err("ITS: failed probing %s (%d)\n", node->full_name, err); - return err; + pr_err("ITS: failed probing (%d)\n", err); + return NULL; }
static bool gic_rdists_supports_plpis(void) @@ -1641,31 +1601,72 @@ int its_cpu_init(void) return 0; }
+static int its_init_domain(struct device_node *node, struct its_node *its) +{ + its->domain = irq_domain_add_tree(NULL, &its_domain_ops, its); + if (!its->domain) + return -ENOMEM; + + its->msi_chip.domain = pci_msi_create_irq_domain(node, + &its_pci_msi_domain_info, + its->domain); + if (!its->msi_chip.domain) { + irq_domain_remove(its->domain); + return -ENOMEM; + } + + return 0; +} + static struct of_device_id its_device_id[] = { { .compatible = "arm,gic-v3-its", }, {}, };
-int its_init(struct device_node *node, struct rdists *rdists, - struct irq_domain *parent_domain) +void its_of_probe(struct device_node *node) { struct device_node *np; + struct its_node *its; + struct resource res;
for (np = of_find_matching_node(node, its_device_id); np; np = of_find_matching_node(np, its_device_id)) { - its_probe(np, parent_domain); + if (of_address_to_resource(np, 0, &res)) { + pr_warn("%s: no regs?\n", node->full_name); + continue; + } + + pr_info("ITS: %s\n", np->full_name); + its = its_probe(res.start, resource_size(&res)); + if (!its) + continue; + + its->msi_chip.of_node = np; + if (of_property_read_bool(its->msi_chip.of_node, "msi-controller")) { + if (its_init_domain(np, its)) + continue; + + of_pci_msi_chip_add(&its->msi_chip); + } } +} + +void its_init(struct rdists *rdists, struct irq_domain *domain) +{ + struct its_node *its;
- if (list_empty(&its_nodes)) { - pr_warn("ITS: No ITS available, not enabling LPIs\n"); - return -ENXIO; + if (list_empty(&its_nodes)) + pr_info("ITS: No ITS available, not enabling LPIs\n"); + + spin_lock(&its_lock); + list_for_each_entry(its, &its_nodes, entry) { + if (its->domain) + its->domain->parent = domain; } + spin_unlock(&its_lock);
gic_rdists = rdists; - gic_root_node = node;
its_alloc_lpi_tables(); its_lpi_init(rdists->id_bits); - - return 0; } diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index db82724..d78589c 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -869,7 +869,7 @@ static int __init gic_init_bases(void __iomem *dist_base, set_handle_irq(gic_handle_irq);
if (IS_ENABLED(CONFIG_ARM_GIC_V3_ITS) && gic_dist_supports_lpis()) - its_init(node, &gic_data.rdists, gic_data.domain); + its_init(&gic_data.rdists, gic_data.domain);
gic_smp_init(); gic_dist_init(); @@ -928,6 +928,8 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare if (of_property_read_u64(node, "redistributor-stride", &redist_stride)) redist_stride = 0;
+ its_of_probe(node); + err = gic_init_bases(dist_base, rdist_regs, nr_redist_regions, redist_stride, node); if (!err) diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 5bbd47c..2c8f0f5 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -399,8 +399,8 @@ static inline void gic_write_eoir(u64 irq)
struct irq_domain; int its_cpu_init(void); -int its_init(struct device_node *node, struct rdists *rdists, - struct irq_domain *domain); +void its_init(struct rdists *rdists, struct irq_domain *domain); +void its_of_probe(struct device_node *node);
typedef u32 (*its_pci_requester_id_t)(struct pci_dev *, u16); void set_its_pci_requester_id(its_pci_requester_id_t fn);
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/irqchip/irq-gic-v3-its.c | 34 ++++++++++++++++++++++++++++++++++ drivers/irqchip/irq-gic-v3.c | 2 ++ include/linux/irqchip/arm-gic-acpi.h | 1 + 3 files changed, 37 insertions(+)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 3b7f352..4814954 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -27,10 +27,12 @@ #include <linux/of_irq.h> #include <linux/of_pci.h> #include <linux/of_platform.h> +#include <linux/acpi.h> #include <linux/percpu.h> #include <linux/slab.h>
#include <linux/irqchip/arm-gic-v3.h> +#include <linux/irqchip/arm-gic-acpi.h>
#include <asm/cacheflush.h> #include <asm/cputype.h> @@ -1651,6 +1653,38 @@ void its_of_probe(struct device_node *node) } }
+#ifdef CONFIG_ACPI +static int __init +gic_acpi_parse_madt_its(struct acpi_subtable_header *header, + const unsigned long end) +{ + struct acpi_madt_generic_translator *its; + + if (BAD_MADT_ENTRY(header, end)) + return -EINVAL; + + its = (struct acpi_madt_generic_translator *)header; + + pr_info("ITS: ID: 0x%x\n", its->translation_id); + its_probe(its->base_address, 2 * SZ_64K); + return 0; +} + +void __init its_acpi_probe(struct acpi_table_header *table) +{ + int count; + + count = acpi_parse_entries(ACPI_SIG_MADT, + sizeof(struct acpi_table_madt), + gic_acpi_parse_madt_its, table, + ACPI_MADT_TYPE_GENERIC_TRANSLATOR, 0); + if (!count) + pr_info("No valid GIC ITS entries exist\n"); + else if (count < 0) + pr_err("Error during GIC ITS entries parsing\n"); +} +#endif + void its_init(struct rdists *rdists, struct irq_domain *domain) { struct its_node *its; diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index d78589c..d02ca93 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -1138,6 +1138,8 @@ madt_dist: goto out_redist_unmap; }
+ its_acpi_probe(table); + err = gic_init_bases(dist_base, redist_regs, redist_regions, 0, NULL); if (err) goto out_dist_unmap; diff --git a/include/linux/irqchip/arm-gic-acpi.h b/include/linux/irqchip/arm-gic-acpi.h index 27e77d5..a8a6ff5 100644 --- a/include/linux/irqchip/arm-gic-acpi.h +++ b/include/linux/irqchip/arm-gic-acpi.h @@ -26,6 +26,7 @@ struct acpi_table_header; int gic_v2_acpi_init(struct acpi_table_header *table); int gic_v3_acpi_init(struct acpi_table_header *table); void acpi_gic_init(void); +void its_acpi_probe(struct acpi_table_header *table); #else static inline void acpi_gic_init(void) { } #endif
From: Tomasz Nowicki tomasz.nowicki@linaro.org
This patch is the first step of the MMCONFIG refactoring process.
Code that uses pci_mmcfg_lock will be moved to a common file and become accessible to all architectures. pci_mmconfig_insert() cannot be moved so easily, since it mixes generic mmcfg code with x86-specific logic inside a mutually exclusive block guarded by pci_mmcfg_lock.
To get rid of that constraint we reorder the actions as follows (a condensed sketch of the resulting flow follows the list):
1. mmconfig entry allocation can be done first; it does not need the lock
2. insertion into iomem_resource has its own lock, so there is no need to wrap it in the mutex
3. insertion into the mmconfig list can be done as the final stage in a separate function (a candidate for further factoring)
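Condensed sketch of the reordered pci_mmconfig_insert() flow (names follow the diff below; error handling is trimmed and this is not the literal patch code, only the shape of it inside mmconfig-shared.c):

static int demo_mmconfig_insert(struct device *dev, u16 seg, u8 start,
				u8 end, phys_addr_t addr)
{
	struct pci_mmcfg_region *cfg;
	struct resource *conflict = NULL;

	/* 1. allocation first, no lock needed */
	cfg = pci_mmconfig_alloc(seg, start, end, addr);
	if (!cfg)
		return -ENOMEM;

	/* 2. resource insertion relies on the resource tree's own locking */
	if (pci_mmcfg_running_state)
		conflict = insert_resource_conflict(&iomem_resource, &cfg->res);
	if (conflict)
		goto error;

	/* 3. list insertion and mapping, serialized by pci_mmcfg_lock */
	return pci_mmconfig_inject(cfg);

error:
	kfree(cfg);
	return -EBUSY;
}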
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Tested-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/x86/pci/mmconfig-shared.c | 100 ++++++++++++++++++++++------------------- 1 file changed, 54 insertions(+), 46 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index dd30b7e..5707040 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -720,6 +720,39 @@ static int __init pci_mmcfg_late_insert_resources(void) */ late_initcall(pci_mmcfg_late_insert_resources);
+static int __init pci_mmconfig_inject(struct pci_mmcfg_region *cfg) +{ + struct pci_mmcfg_region *cfg_conflict; + int err = 0; + + mutex_lock(&pci_mmcfg_lock); + cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus); + if (cfg_conflict) { + if (cfg_conflict->end_bus < cfg->end_bus) + pr_info(FW_INFO "MMCONFIG for " + "domain %04x [bus %02x-%02x] " + "only partially covers this bridge\n", + cfg_conflict->segment, cfg_conflict->start_bus, + cfg_conflict->end_bus); + err = -EEXIST; + goto out; + } + + if (pci_mmcfg_arch_map(cfg)) { + pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res); + err = -ENOMEM; + goto out; + } else { + list_add_sorted(cfg); + pr_info("MMCONFIG at %pR (base %#lx)\n", + &cfg->res, (unsigned long)cfg->address); + + } +out: + mutex_unlock(&pci_mmcfg_lock); + return err; +} + /* Add MMCFG information for host bridges */ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, phys_addr_t addr) @@ -731,66 +764,41 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) return -ENODEV;
- if (start > end) + if (start > end || !addr) return -EINVAL;
- mutex_lock(&pci_mmcfg_lock); - cfg = pci_mmconfig_lookup(seg, start); - if (cfg) { - if (cfg->end_bus < end) - dev_info(dev, FW_INFO - "MMCONFIG for " - "domain %04x [bus %02x-%02x] " - "only partially covers this bridge\n", - cfg->segment, cfg->start_bus, cfg->end_bus); - mutex_unlock(&pci_mmcfg_lock); - return -EEXIST; - } - - if (!addr) { - mutex_unlock(&pci_mmcfg_lock); - return -EINVAL; - } - rc = -EBUSY; cfg = pci_mmconfig_alloc(seg, start, end, addr); if (cfg == NULL) { dev_warn(dev, "fail to add MMCONFIG (out of memory)\n"); - rc = -ENOMEM; + return -ENOMEM; } else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) { dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n", &cfg->res); - } else { - /* Insert resource if it's not in boot stage */ - if (pci_mmcfg_running_state) - tmp = insert_resource_conflict(&iomem_resource, - &cfg->res); - - if (tmp) { - dev_warn(dev, - "MMCONFIG %pR conflicts with " - "%s %pR\n", - &cfg->res, tmp->name, tmp); - } else if (pci_mmcfg_arch_map(cfg)) { - dev_warn(dev, "fail to map MMCONFIG %pR.\n", - &cfg->res); - } else { - list_add_sorted(cfg); - dev_info(dev, "MMCONFIG at %pR (base %#lx)\n", - &cfg->res, (unsigned long)addr); - cfg = NULL; - rc = 0; - } + goto error; }
- if (cfg) { - if (cfg->res.parent) - release_resource(&cfg->res); - kfree(cfg); + /* Insert resource if it's not in boot stage */ + if (pci_mmcfg_running_state) + tmp = insert_resource_conflict(&iomem_resource, &cfg->res); + + if (tmp) { + dev_warn(dev, + "MMCONFIG %pR conflicts with %s %pR\n", + &cfg->res, tmp->name, tmp); + goto error; }
- mutex_unlock(&pci_mmcfg_lock); + rc = pci_mmconfig_inject(cfg); + if (rc) + goto error; + + return 0;
+error: + if (cfg->res.parent) + release_resource(&cfg->res); + kfree(cfg); return rc; }
From: Tomasz Nowicki tomasz.nowicki@linaro.org
The MMCFG table handling seems to be architecture independent, and it makes sense to share the common code across all architectures. The functions that may need architecture-specific actions have a default prototype (__weak).
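For illustration, a minimal sketch of the __weak default pattern the description refers to (which exact functions receive the __weak treatment is determined by the diff below; this only shows the shape of the mechanism):

#include <linux/mmconfig.h>

/* The shared code provides a no-op default ... */
int __weak pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg)
{
	return 0;	/* nothing architecture-specific to do by default */
}

/*
 * ... and an architecture that needs special handling (x86 maps the region
 * and stores the mapping in cfg->virt) supplies a strong definition with
 * the same signature in its own file, overriding the weak default at link
 * time.
 */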
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Tested-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/x86/include/asm/pci_x86.h | 29 ----- arch/x86/pci/acpi.c | 1 + arch/x86/pci/init.c | 1 + arch/x86/pci/mmconfig-shared.c | 200 +--------------------------------- arch/x86/pci/mmconfig_32.c | 1 + arch/x86/pci/mmconfig_64.c | 1 + drivers/acpi/Makefile | 1 + drivers/acpi/bus.c | 1 + drivers/acpi/mmconfig.c | 242 +++++++++++++++++++++++++++++++++++++++++ include/linux/mmconfig.h | 58 ++++++++++ include/linux/pci.h | 8 -- 11 files changed, 308 insertions(+), 235 deletions(-) create mode 100644 drivers/acpi/mmconfig.c create mode 100644 include/linux/mmconfig.h
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 164e3f8..763f19a 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -123,35 +123,6 @@ extern int __init pcibios_init(void); extern int pci_legacy_init(void); extern void pcibios_fixup_irqs(void);
-/* pci-mmconfig.c */ - -/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ -#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2) - -struct pci_mmcfg_region { - struct list_head list; - struct resource res; - u64 address; - char __iomem *virt; - u16 segment; - u8 start_bus; - u8 end_bus; - char name[PCI_MMCFG_RESOURCE_NAME_LEN]; -}; - -extern int __init pci_mmcfg_arch_init(void); -extern void __init pci_mmcfg_arch_free(void); -extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); -extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); -extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, - phys_addr_t addr); -extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end); -extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); - -extern struct list_head pci_mmcfg_list; - -#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) - /* * AMD Fam10h CPUs are buggy, and cannot access MMIO config space * on their northbrige except through the * %eax register. As such, you MUST diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index ff99117..1d3801f 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -4,6 +4,7 @@ #include <linux/irq.h> #include <linux/dmi.h> #include <linux/slab.h> +#include <linux/mmconfig.h> #include <asm/numa.h> #include <asm/pci_x86.h>
diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c index adb62aa..b4a55df 100644 --- a/arch/x86/pci/init.c +++ b/arch/x86/pci/init.c @@ -1,5 +1,6 @@ #include <linux/pci.h> #include <linux/init.h> +#include <linux/mmconfig.h> #include <asm/pci_x86.h> #include <asm/x86_init.h>
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 5707040..87e8b6c 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -18,6 +18,7 @@ #include <linux/slab.h> #include <linux/mutex.h> #include <linux/rculist.h> +#include <linux/mmconfig.h> #include <asm/e820.h> #include <asm/pci_x86.h> #include <asm/acpi.h> @@ -27,103 +28,6 @@ /* Indicate if the mmcfg resources have been placed into the resource table. */ static bool pci_mmcfg_running_state; static bool pci_mmcfg_arch_init_failed; -static DEFINE_MUTEX(pci_mmcfg_lock); - -LIST_HEAD(pci_mmcfg_list); - -static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) -{ - if (cfg->res.parent) - release_resource(&cfg->res); - list_del(&cfg->list); - kfree(cfg); -} - -static void __init free_all_mmcfg(void) -{ - struct pci_mmcfg_region *cfg, *tmp; - - pci_mmcfg_arch_free(); - list_for_each_entry_safe(cfg, tmp, &pci_mmcfg_list, list) - pci_mmconfig_remove(cfg); -} - -static void list_add_sorted(struct pci_mmcfg_region *new) -{ - struct pci_mmcfg_region *cfg; - - /* keep list sorted by segment and starting bus number */ - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) { - if (cfg->segment > new->segment || - (cfg->segment == new->segment && - cfg->start_bus >= new->start_bus)) { - list_add_tail_rcu(&new->list, &cfg->list); - return; - } - } - list_add_tail_rcu(&new->list, &pci_mmcfg_list); -} - -static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start, - int end, u64 addr) -{ - struct pci_mmcfg_region *new; - struct resource *res; - - if (addr == 0) - return NULL; - - new = kzalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - new->address = addr; - new->segment = segment; - new->start_bus = start; - new->end_bus = end; - - res = &new->res; - res->start = addr + PCI_MMCFG_BUS_OFFSET(start); - res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1; - res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; - snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN, - "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end); - res->name = new->name; - - return new; -} - -static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start, - int end, u64 addr) -{ - struct pci_mmcfg_region *new; - - new = pci_mmconfig_alloc(segment, start, end, addr); - if (new) { - mutex_lock(&pci_mmcfg_lock); - list_add_sorted(new); - mutex_unlock(&pci_mmcfg_lock); - - pr_info(PREFIX - "MMCONFIG for domain %04x [bus %02x-%02x] at %pR " - "(base %#lx)\n", - segment, start, end, &new->res, (unsigned long)addr); - } - - return new; -} - -struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) - if (cfg->segment == segment && - cfg->start_bus <= bus && bus <= cfg->end_bus) - return cfg; - - return NULL; -}
static const char *__init pci_mmcfg_e7520(void) { @@ -543,7 +447,7 @@ static void __init pci_mmcfg_reject_broken(int early) } }
-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, +int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, struct acpi_mcfg_allocation *cfg) { int year; @@ -566,50 +470,6 @@ static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, return -EINVAL; }
-static int __init pci_parse_mcfg(struct acpi_table_header *header) -{ - struct acpi_table_mcfg *mcfg; - struct acpi_mcfg_allocation *cfg_table, *cfg; - unsigned long i; - int entries; - - if (!header) - return -EINVAL; - - mcfg = (struct acpi_table_mcfg *)header; - - /* how many config structures do we have */ - free_all_mmcfg(); - entries = 0; - i = header->length - sizeof(struct acpi_table_mcfg); - while (i >= sizeof(struct acpi_mcfg_allocation)) { - entries++; - i -= sizeof(struct acpi_mcfg_allocation); - } - if (entries == 0) { - pr_err(PREFIX "MMCONFIG has no entries\n"); - return -ENODEV; - } - - cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1]; - for (i = 0; i < entries; i++) { - cfg = &cfg_table[i]; - if (acpi_mcfg_check_entry(mcfg, cfg)) { - free_all_mmcfg(); - return -ENODEV; - } - - if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, - cfg->end_bus_number, cfg->address) == NULL) { - pr_warn(PREFIX "no memory for MCFG entries\n"); - free_all_mmcfg(); - return -ENOMEM; - } - } - - return 0; -} - #ifdef CONFIG_ACPI_APEI extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size, void *data), void *data); @@ -720,39 +580,6 @@ static int __init pci_mmcfg_late_insert_resources(void) */ late_initcall(pci_mmcfg_late_insert_resources);
-static int __init pci_mmconfig_inject(struct pci_mmcfg_region *cfg) -{ - struct pci_mmcfg_region *cfg_conflict; - int err = 0; - - mutex_lock(&pci_mmcfg_lock); - cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus); - if (cfg_conflict) { - if (cfg_conflict->end_bus < cfg->end_bus) - pr_info(FW_INFO "MMCONFIG for " - "domain %04x [bus %02x-%02x] " - "only partially covers this bridge\n", - cfg_conflict->segment, cfg_conflict->start_bus, - cfg_conflict->end_bus); - err = -EEXIST; - goto out; - } - - if (pci_mmcfg_arch_map(cfg)) { - pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res); - err = -ENOMEM; - goto out; - } else { - list_add_sorted(cfg); - pr_info("MMCONFIG at %pR (base %#lx)\n", - &cfg->res, (unsigned long)cfg->address); - - } -out: - mutex_unlock(&pci_mmcfg_lock); - return err; -} - /* Add MMCFG information for host bridges */ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, phys_addr_t addr) @@ -801,26 +628,3 @@ error: kfree(cfg); return rc; } - -/* Delete MMCFG information for host bridges */ -int pci_mmconfig_delete(u16 seg, u8 start, u8 end) -{ - struct pci_mmcfg_region *cfg; - - mutex_lock(&pci_mmcfg_lock); - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) - if (cfg->segment == seg && cfg->start_bus == start && - cfg->end_bus == end) { - list_del_rcu(&cfg->list); - synchronize_rcu(); - pci_mmcfg_arch_unmap(cfg); - if (cfg->res.parent) - release_resource(&cfg->res); - mutex_unlock(&pci_mmcfg_lock); - kfree(cfg); - return 0; - } - mutex_unlock(&pci_mmcfg_lock); - - return -ENOENT; -} diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 43984bc..d774672 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -12,6 +12,7 @@ #include <linux/pci.h> #include <linux/init.h> #include <linux/rcupdate.h> +#include <linux/mmconfig.h> #include <asm/e820.h> #include <asm/pci_x86.h>
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index bea5249..1209596 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -10,6 +10,7 @@ #include <linux/acpi.h> #include <linux/bitmap.h> #include <linux/rcupdate.h> +#include <linux/mmconfig.h> #include <asm/e820.h> #include <asm/pci_x86.h>
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 8321430..e32b8cd 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -65,6 +65,7 @@ obj-$(CONFIG_ACPI_BUTTON) += button.o obj-$(CONFIG_ACPI_FAN) += fan.o obj-$(CONFIG_ACPI_VIDEO) += video.o obj-$(CONFIG_ACPI_PCI_SLOT) += pci_slot.o +obj-$(CONFIG_PCI_MMCONFIG) += mmconfig.o obj-$(CONFIG_ACPI_PROCESSOR) += processor.o obj-y += container.o obj-$(CONFIG_ACPI_THERMAL) += thermal.o diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 513e7230e..f9a8e5e 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -41,6 +41,7 @@ #include <acpi/apei.h> #include <linux/dmi.h> #include <linux/suspend.h> +#include <linux/mmconfig.h>
#include "internal.h"
diff --git a/drivers/acpi/mmconfig.c b/drivers/acpi/mmconfig.c new file mode 100644 index 0000000..d62dccda --- /dev/null +++ b/drivers/acpi/mmconfig.c @@ -0,0 +1,242 @@ +/* + * Arch agnostic low-level direct PCI config space access via MMCONFIG + * + * Per-architecture code takes care of the mappings, region validation and + * accesses themselves. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + */ + +#include <linux/mutex.h> +#include <linux/rculist.h> +#include <linux/mmconfig.h> + +#define PREFIX "PCI: " + +static DEFINE_MUTEX(pci_mmcfg_lock); + +LIST_HEAD(pci_mmcfg_list); + +static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) +{ + if (cfg->res.parent) + release_resource(&cfg->res); + list_del(&cfg->list); + kfree(cfg); +} + +void __init free_all_mmcfg(void) +{ + struct pci_mmcfg_region *cfg, *tmp; + + pci_mmcfg_arch_free(); + list_for_each_entry_safe(cfg, tmp, &pci_mmcfg_list, list) + pci_mmconfig_remove(cfg); +} + +void list_add_sorted(struct pci_mmcfg_region *new) +{ + struct pci_mmcfg_region *cfg; + + /* keep list sorted by segment and starting bus number */ + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) { + if (cfg->segment > new->segment || + (cfg->segment == new->segment && + cfg->start_bus >= new->start_bus)) { + list_add_tail_rcu(&new->list, &cfg->list); + return; + } + } + list_add_tail_rcu(&new->list, &pci_mmcfg_list); +} + +struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start, + int end, u64 addr) +{ + struct pci_mmcfg_region *new; + struct resource *res; + + if (addr == 0) + return NULL; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + new->address = addr; + new->segment = segment; + new->start_bus = start; + new->end_bus = end; + + res = &new->res; + res->start = addr + PCI_MMCFG_BUS_OFFSET(start); + res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1; + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; + snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN, + "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end); + res->name = new->name; + + return new; +} + +struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start, + int end, u64 addr) +{ + struct pci_mmcfg_region *new; + + new = pci_mmconfig_alloc(segment, start, end, addr); + if (new) { + mutex_lock(&pci_mmcfg_lock); + list_add_sorted(new); + mutex_unlock(&pci_mmcfg_lock); + + pr_info(PREFIX + "MMCONFIG for domain %04x [bus %02x-%02x] at %pR " + "(base %#lx)\n", + segment, start, end, &new->res, (unsigned long)addr); + } + + return new; +} + +int __init pci_mmconfig_inject(struct pci_mmcfg_region *cfg) +{ + struct pci_mmcfg_region *cfg_conflict; + int err = 0; + + mutex_lock(&pci_mmcfg_lock); + cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus); + if (cfg_conflict) { + if (cfg_conflict->end_bus < cfg->end_bus) + pr_info(FW_INFO "MMCONFIG for " + "domain %04x [bus %02x-%02x] " + "only partially covers this bridge\n", + cfg_conflict->segment, cfg_conflict->start_bus, + cfg_conflict->end_bus); + err = -EEXIST; + goto out; + } + + if (pci_mmcfg_arch_map(cfg)) { + pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res); + err = -ENOMEM; + goto out; + } else { + list_add_sorted(cfg); + pr_info("MMCONFIG at %pR (base %#lx)\n", + &cfg->res, (unsigned long)cfg->address); + + } +out: + mutex_unlock(&pci_mmcfg_lock); + return err; +} + +struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) +{ 
+ struct pci_mmcfg_region *cfg; + + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) + if (cfg->segment == segment && + cfg->start_bus <= bus && bus <= cfg->end_bus) + return cfg; + + return NULL; +} + +int __init __weak acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, + struct acpi_mcfg_allocation *cfg) +{ + return 0; +} + +int __init pci_parse_mcfg(struct acpi_table_header *header) +{ + struct acpi_table_mcfg *mcfg; + struct acpi_mcfg_allocation *cfg_table, *cfg; + unsigned long i; + int entries; + + if (!header) + return -EINVAL; + + mcfg = (struct acpi_table_mcfg *)header; + + /* how many config structures do we have */ + free_all_mmcfg(); + entries = 0; + i = header->length - sizeof(struct acpi_table_mcfg); + while (i >= sizeof(struct acpi_mcfg_allocation)) { + entries++; + i -= sizeof(struct acpi_mcfg_allocation); + } + if (entries == 0) { + pr_err(PREFIX "MMCONFIG has no entries\n"); + return -ENODEV; + } + + cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1]; + for (i = 0; i < entries; i++) { + cfg = &cfg_table[i]; + if (acpi_mcfg_check_entry(mcfg, cfg)) { + free_all_mmcfg(); + return -ENODEV; + } + + if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, + cfg->end_bus_number, cfg->address) == NULL) { + pr_warn(PREFIX "no memory for MCFG entries\n"); + free_all_mmcfg(); + return -ENOMEM; + } + } + + return 0; +} + +/* Delete MMCFG information for host bridges */ +int pci_mmconfig_delete(u16 seg, u8 start, u8 end) +{ + struct pci_mmcfg_region *cfg; + + mutex_lock(&pci_mmcfg_lock); + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) + if (cfg->segment == seg && cfg->start_bus == start && + cfg->end_bus == end) { + list_del_rcu(&cfg->list); + synchronize_rcu(); + pci_mmcfg_arch_unmap(cfg); + if (cfg->res.parent) + release_resource(&cfg->res); + mutex_unlock(&pci_mmcfg_lock); + kfree(cfg); + return 0; + } + mutex_unlock(&pci_mmcfg_lock); + + return -ENOENT; +} + +void __init __weak pci_mmcfg_early_init(void) +{ + +} + +void __init __weak pci_mmcfg_late_init(void) +{ + struct pci_mmcfg_region *cfg; + + acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg); + + if (list_empty(&pci_mmcfg_list)) + return; + + if (!pci_mmcfg_arch_init()) + free_all_mmcfg(); + + list_for_each_entry(cfg, &pci_mmcfg_list, list) + insert_resource(&iomem_resource, &cfg->res); +} diff --git a/include/linux/mmconfig.h b/include/linux/mmconfig.h new file mode 100644 index 0000000..6ccd1ee --- /dev/null +++ b/include/linux/mmconfig.h @@ -0,0 +1,58 @@ +#ifndef __MMCONFIG_H +#define __MMCONFIG_H +#ifdef __KERNEL__ + +#include <linux/types.h> +#include <linux/acpi.h> + +#ifdef CONFIG_PCI_MMCONFIG +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2) + +struct pci_mmcfg_region { + struct list_head list; + struct resource res; + u64 address; + char __iomem *virt; + u16 segment; + u8 start_bus; + u8 end_bus; + char name[PCI_MMCFG_RESOURCE_NAME_LEN]; +}; + +void pci_mmcfg_early_init(void); +void pci_mmcfg_late_init(void); +struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); + +int pci_parse_mcfg(struct acpi_table_header *header); +struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start, + int end, u64 addr); +int pci_mmconfig_inject(struct pci_mmcfg_region *cfg); +struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start, + int end, u64 addr); +void list_add_sorted(struct pci_mmcfg_region *new); +int acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, + struct acpi_mcfg_allocation *cfg); +void free_all_mmcfg(void); +int 
pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, + phys_addr_t addr); +int pci_mmconfig_delete(u16 seg, u8 start, u8 end); + +/* Arch specific calls */ +int pci_mmcfg_arch_init(void); +void pci_mmcfg_arch_free(void); +int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); +void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); + +extern struct list_head pci_mmcfg_list; + +#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) +#else /* CONFIG_PCI_MMCONFIG */ +static inline void pci_mmcfg_late_init(void) { } +static inline void pci_mmcfg_early_init(void) { } +static inline void *pci_mmconfig_lookup(int segment, int bus) +{ return NULL; } +#endif /* CONFIG_PCI_MMCONFIG */ + +#endif /* __KERNEL__ */ +#endif /* __MMCONFIG_H */ diff --git a/include/linux/pci.h b/include/linux/pci.h index 1f1ce73..7474225 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1651,14 +1651,6 @@ void pcibios_penalize_isa_irq(int irq, int active); extern struct dev_pm_ops pcibios_pm_ops; #endif
-#ifdef CONFIG_PCI_MMCONFIG -void __init pci_mmcfg_early_init(void); -void __init pci_mmcfg_late_init(void); -#else -static inline void pci_mmcfg_early_init(void) { } -static inline void pci_mmcfg_late_init(void) { } -#endif - int pci_ext_cfg_avail(void);
void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar);
From: Tomasz Nowicki tomasz.nowicki@linaro.org
We are going to use the mmio_config_{} naming convention across all architectures. Currently these accessors live in the asm/pci_x86.h header, which should be included only by x86-specific files. From now on they are in the asm/pci.h header, which non-architecture code can include much more easily.
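To illustrate the benefit (a sketch only, not part of this patch; ecam_read32() is a hypothetical helper), generic code can now reach the accessors simply through <linux/pci.h>, which pulls in <asm/pci.h>:

#include <linux/pci.h>		/* pulls in <asm/pci.h> and the mmio_config_* accessors */

/* hypothetical: read one 32-bit config register from an already-mapped ECAM window */
static u32 ecam_read32(char __iomem *ecam_base, unsigned int devfn, int reg)
{
	/* each PCI function owns a 4K slice of the ECAM window */
	return mmio_config_readl(ecam_base + (devfn << 12) + reg);
}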
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Tested-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/x86/include/asm/pci.h | 42 +++++++++++++++++++++++++++++++++++++++++ arch/x86/include/asm/pci_x86.h | 43 ------------------------------------------ 2 files changed, 42 insertions(+), 43 deletions(-)
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index 4625943..0916d4b 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -71,6 +71,48 @@ void pcibios_set_master(struct pci_dev *dev); struct irq_routing_table *pcibios_get_irq_routing_table(void); int pcibios_set_irq_routing(struct pci_dev *dev, int pin, int irq);
+/* + * AMD Fam10h CPUs are buggy, and cannot access MMIO config space + * on their northbrige except through the * %eax register. As such, you MUST + * NOT use normal IOMEM accesses, you need to only use the magic mmio-config + * accessor functions. + * In fact just use pci_config_*, nothing else please. + */ +static inline unsigned char mmio_config_readb(void __iomem *pos) +{ + u8 val; + asm volatile("movb (%1),%%al" : "=a" (val) : "r" (pos)); + return val; +} + +static inline unsigned short mmio_config_readw(void __iomem *pos) +{ + u16 val; + asm volatile("movw (%1),%%ax" : "=a" (val) : "r" (pos)); + return val; +} + +static inline unsigned int mmio_config_readl(void __iomem *pos) +{ + u32 val; + asm volatile("movl (%1),%%eax" : "=a" (val) : "r" (pos)); + return val; +} + +static inline void mmio_config_writeb(void __iomem *pos, u8 val) +{ + asm volatile("movb %%al,(%1)" : : "a" (val), "r" (pos) : "memory"); +} + +static inline void mmio_config_writew(void __iomem *pos, u16 val) +{ + asm volatile("movw %%ax,(%1)" : : "a" (val), "r" (pos) : "memory"); +} + +static inline void mmio_config_writel(void __iomem *pos, u32 val) +{ + asm volatile("movl %%eax,(%1)" : : "a" (val), "r" (pos) : "memory"); +}
#define HAVE_PCI_MMAP extern int pci_mmap_page_range(struct pci_dev *dev, struct vm_area_struct *vma, diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 763f19a..c66deeb 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -123,49 +123,6 @@ extern int __init pcibios_init(void); extern int pci_legacy_init(void); extern void pcibios_fixup_irqs(void);
-/* - * AMD Fam10h CPUs are buggy, and cannot access MMIO config space - * on their northbrige except through the * %eax register. As such, you MUST - * NOT use normal IOMEM accesses, you need to only use the magic mmio-config - * accessor functions. - * In fact just use pci_config_*, nothing else please. - */ -static inline unsigned char mmio_config_readb(void __iomem *pos) -{ - u8 val; - asm volatile("movb (%1),%%al" : "=a" (val) : "r" (pos)); - return val; -} - -static inline unsigned short mmio_config_readw(void __iomem *pos) -{ - u16 val; - asm volatile("movw (%1),%%ax" : "=a" (val) : "r" (pos)); - return val; -} - -static inline unsigned int mmio_config_readl(void __iomem *pos) -{ - u32 val; - asm volatile("movl (%1),%%eax" : "=a" (val) : "r" (pos)); - return val; -} - -static inline void mmio_config_writeb(void __iomem *pos, u8 val) -{ - asm volatile("movb %%al,(%1)" : : "a" (val), "r" (pos) : "memory"); -} - -static inline void mmio_config_writew(void __iomem *pos, u16 val) -{ - asm volatile("movw %%ax,(%1)" : : "a" (val), "r" (pos) : "memory"); -} - -static inline void mmio_config_writel(void __iomem *pos, u32 val) -{ - asm volatile("movl %%eax,(%1)" : : "a" (val), "r" (pos) : "memory"); -} - #ifdef CONFIG_PCI # ifdef CONFIG_ACPI # define x86_default_pci_init pci_acpi_init
From: Tomasz Nowicki tomasz.nowicki@linaro.org
The mmconfig_64.c version is going to be the default implementation of the arch-agnostic low-level direct PCI config space accessors via MMCONFIG. However, it currently initializes the raw_pci_ext_ops pointer, which is used only in x86-specific code. Moreover, mmconfig_32.c currently does the same thing.
Move that initialization to mmconfig-shared.c so it is common to both, and mmconfig_64.c becomes purely arch-agnostic.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Tested-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/x86/pci/mmconfig-shared.c | 10 ++++++++-- arch/x86/pci/mmconfig_32.c | 10 ++-------- arch/x86/pci/mmconfig_64.c | 6 ++---- include/linux/mmconfig.h | 4 ++++ 4 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index 87e8b6c..5acb006 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -29,6 +29,11 @@ static bool pci_mmcfg_running_state; static bool pci_mmcfg_arch_init_failed;
+const struct pci_raw_ops pci_mmcfg = { + .read = pci_mmcfg_read, + .write = pci_mmcfg_write, +}; + static const char *__init pci_mmcfg_e7520(void) { u32 win; @@ -512,9 +517,10 @@ static void __init __pci_mmcfg_init(int early) } }
- if (pci_mmcfg_arch_init()) + if (pci_mmcfg_arch_init()) { + raw_pci_ext_ops = &pci_mmcfg; pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF; - else { + } else { free_all_mmcfg(); pci_mmcfg_arch_init_failed = true; } diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index d774672..c0106a6 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -50,7 +50,7 @@ static void pci_exp_set_dev_base(unsigned int base, int bus, int devfn) } }
-static int pci_mmcfg_read(unsigned int seg, unsigned int bus, +int pci_mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 *value) { unsigned long flags; @@ -89,7 +89,7 @@ err: *value = -1; return 0; }
-static int pci_mmcfg_write(unsigned int seg, unsigned int bus, +int pci_mmcfg_write(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 value) { unsigned long flags; @@ -126,15 +126,9 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus, return 0; }
-const struct pci_raw_ops pci_mmcfg = { - .read = pci_mmcfg_read, - .write = pci_mmcfg_write, -}; - int __init pci_mmcfg_arch_init(void) { printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n"); - raw_pci_ext_ops = &pci_mmcfg; return 1; }
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index 1209596..ff2c50c 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -25,7 +25,7 @@ static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned i return NULL; }
-static int pci_mmcfg_read(unsigned int seg, unsigned int bus, +int pci_mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 *value) { char __iomem *addr; @@ -59,7 +59,7 @@ err: *value = -1; return 0; }
-static int pci_mmcfg_write(unsigned int seg, unsigned int bus, +int pci_mmcfg_write(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 value) { char __iomem *addr; @@ -121,8 +121,6 @@ int __init pci_mmcfg_arch_init(void) return 0; }
- raw_pci_ext_ops = &pci_mmcfg; - return 1; }
diff --git a/include/linux/mmconfig.h b/include/linux/mmconfig.h index 6ccd1ee..ae8ec83 100644 --- a/include/linux/mmconfig.h +++ b/include/linux/mmconfig.h @@ -43,6 +43,10 @@ int pci_mmcfg_arch_init(void); void pci_mmcfg_arch_free(void); int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); +int pci_mmcfg_read(unsigned int seg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 *value); +int pci_mmcfg_write(unsigned int seg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 value);
extern struct list_head pci_mmcfg_list;
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Note that 32-bit x86 machines still keep their own low-level direct PCI config space accessors.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/x86/pci/Makefile | 5 +- arch/x86/pci/mmconfig_64.c | 152 --------------------------------------------- drivers/acpi/mmconfig.c | 134 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 138 insertions(+), 153 deletions(-) delete mode 100644 arch/x86/pci/mmconfig_64.c
diff --git a/arch/x86/pci/Makefile b/arch/x86/pci/Makefile index 5c6fc35..35c765b 100644 --- a/arch/x86/pci/Makefile +++ b/arch/x86/pci/Makefile @@ -1,7 +1,10 @@ obj-y := i386.o init.o
obj-$(CONFIG_PCI_BIOS) += pcbios.o -obj-$(CONFIG_PCI_MMCONFIG) += mmconfig_$(BITS).o direct.o mmconfig-shared.o +obj-$(CONFIG_PCI_MMCONFIG) += direct.o mmconfig-shared.o +ifeq ($(BITS),32) +obj-$(CONFIG_PCI_MMCONFIG) += mmconfig_32.o +endif obj-$(CONFIG_PCI_DIRECT) += direct.o obj-$(CONFIG_PCI_OLPC) += olpc.o obj-$(CONFIG_PCI_XEN) += xen.o diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c deleted file mode 100644 index ff2c50c..0000000 --- a/arch/x86/pci/mmconfig_64.c +++ /dev/null @@ -1,152 +0,0 @@ -/* - * mmconfig.c - Low-level direct PCI config space access via MMCONFIG - * - * This is an 64bit optimized version that always keeps the full mmconfig - * space mapped. This allows lockless config space operation. - */ - -#include <linux/pci.h> -#include <linux/init.h> -#include <linux/acpi.h> -#include <linux/bitmap.h> -#include <linux/rcupdate.h> -#include <linux/mmconfig.h> -#include <asm/e820.h> -#include <asm/pci_x86.h> - -#define PREFIX "PCI: " - -static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn) -{ - struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus); - - if (cfg && cfg->virt) - return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12)); - return NULL; -} - -int pci_mmcfg_read(unsigned int seg, unsigned int bus, - unsigned int devfn, int reg, int len, u32 *value) -{ - char __iomem *addr; - - /* Why do we have this when nobody checks it. How about a BUG()!? -AK */ - if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) { -err: *value = -1; - return -EINVAL; - } - - rcu_read_lock(); - addr = pci_dev_base(seg, bus, devfn); - if (!addr) { - rcu_read_unlock(); - goto err; - } - - switch (len) { - case 1: - *value = mmio_config_readb(addr + reg); - break; - case 2: - *value = mmio_config_readw(addr + reg); - break; - case 4: - *value = mmio_config_readl(addr + reg); - break; - } - rcu_read_unlock(); - - return 0; -} - -int pci_mmcfg_write(unsigned int seg, unsigned int bus, - unsigned int devfn, int reg, int len, u32 value) -{ - char __iomem *addr; - - /* Why do we have this when nobody checks it. How about a BUG()!? 
-AK */ - if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) - return -EINVAL; - - rcu_read_lock(); - addr = pci_dev_base(seg, bus, devfn); - if (!addr) { - rcu_read_unlock(); - return -EINVAL; - } - - switch (len) { - case 1: - mmio_config_writeb(addr + reg, value); - break; - case 2: - mmio_config_writew(addr + reg, value); - break; - case 4: - mmio_config_writel(addr + reg, value); - break; - } - rcu_read_unlock(); - - return 0; -} - -const struct pci_raw_ops pci_mmcfg = { - .read = pci_mmcfg_read, - .write = pci_mmcfg_write, -}; - -static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg) -{ - void __iomem *addr; - u64 start, size; - int num_buses; - - start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus); - num_buses = cfg->end_bus - cfg->start_bus + 1; - size = PCI_MMCFG_BUS_OFFSET(num_buses); - addr = ioremap_nocache(start, size); - if (addr) - addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus); - return addr; -} - -int __init pci_mmcfg_arch_init(void) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry(cfg, &pci_mmcfg_list, list) - if (pci_mmcfg_arch_map(cfg)) { - pci_mmcfg_arch_free(); - return 0; - } - - return 1; -} - -void __init pci_mmcfg_arch_free(void) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry(cfg, &pci_mmcfg_list, list) - pci_mmcfg_arch_unmap(cfg); -} - -int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg) -{ - cfg->virt = mcfg_ioremap(cfg); - if (!cfg->virt) { - pr_err(PREFIX "can't map MMCONFIG at %pR\n", &cfg->res); - return -ENOMEM; - } - - return 0; -} - -void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg) -{ - if (cfg && cfg->virt) { - iounmap(cfg->virt + PCI_MMCFG_BUS_OFFSET(cfg->start_bus)); - cfg->virt = NULL; - } -} diff --git a/drivers/acpi/mmconfig.c b/drivers/acpi/mmconfig.c index d62dccda..c0ad05f 100644 --- a/drivers/acpi/mmconfig.c +++ b/drivers/acpi/mmconfig.c @@ -12,14 +12,148 @@
#include <linux/mutex.h> #include <linux/rculist.h> +#include <linux/pci.h> #include <linux/mmconfig.h>
+#include <asm/pci.h> + #define PREFIX "PCI: "
static DEFINE_MUTEX(pci_mmcfg_lock);
LIST_HEAD(pci_mmcfg_list);
+static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, + unsigned int devfn) +{ + struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus); + + if (cfg && cfg->virt) + return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12)); + return NULL; +} + +int __weak pci_mmcfg_read(unsigned int seg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 *value) +{ + char __iomem *addr; + + /* Why do we have this when nobody checks it. How about a BUG()!? -AK */ + if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) { +err: *value = -1; + return -EINVAL; + } + + rcu_read_lock(); + addr = pci_dev_base(seg, bus, devfn); + if (!addr) { + rcu_read_unlock(); + goto err; + } + + switch (len) { + case 1: + *value = mmio_config_readb(addr + reg); + break; + case 2: + *value = mmio_config_readw(addr + reg); + break; + case 4: + *value = mmio_config_readl(addr + reg); + break; + } + rcu_read_unlock(); + + return 0; +} + +int __weak pci_mmcfg_write(unsigned int seg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 value) +{ + char __iomem *addr; + + /* Why do we have this when nobody checks it. How about a BUG()!? -AK */ + if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) + return -EINVAL; + + rcu_read_lock(); + addr = pci_dev_base(seg, bus, devfn); + if (!addr) { + rcu_read_unlock(); + return -EINVAL; + } + + switch (len) { + case 1: + mmio_config_writeb(addr + reg, value); + break; + case 2: + mmio_config_writew(addr + reg, value); + break; + case 4: + mmio_config_writel(addr + reg, value); + break; + } + rcu_read_unlock(); + + return 0; +} + +static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg) +{ + void __iomem *addr; + u64 start, size; + int num_buses; + + start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus); + num_buses = cfg->end_bus - cfg->start_bus + 1; + size = PCI_MMCFG_BUS_OFFSET(num_buses); + addr = ioremap_nocache(start, size); + if (addr) + addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus); + return addr; +} + +int __init __weak pci_mmcfg_arch_init(void) +{ + struct pci_mmcfg_region *cfg; + + list_for_each_entry(cfg, &pci_mmcfg_list, list) + if (pci_mmcfg_arch_map(cfg)) { + pci_mmcfg_arch_free(); + return 0; + } + + return 1; +} + +void __init __weak pci_mmcfg_arch_free(void) +{ + struct pci_mmcfg_region *cfg; + + list_for_each_entry(cfg, &pci_mmcfg_list, list) + pci_mmcfg_arch_unmap(cfg); +} + +int __weak pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg) +{ + cfg->virt = mcfg_ioremap(cfg); + if (!cfg->virt) { + pr_err(PREFIX "can't map MMCONFIG at %pR\n", &cfg->res); + return -ENOMEM; + } + + return 0; +} + +void __weak pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg) +{ + if (cfg && cfg->virt) { + iounmap(cfg->virt + PCI_MMCFG_BUS_OFFSET(cfg->start_bus)); + cfg->virt = NULL; + } +} + static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) { if (cfg->res.parent)
From: Tomasz Nowicki tomasz.nowicki@linaro.org
MMCFG can be used by all architectures that support ACPI. ACPI mandates MMCFG for describing PCI config space ranges, which means we should use the MMCONFIG accessors by default.
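A platform that cannot rely on plain ECAM replaces these __weak defaults simply by providing strong definitions. A minimal sketch (QUIRKY_SEG and quirky_read() are hypothetical; only the fallback to pci_mmcfg_read() comes from this patch):

int raw_pci_read(unsigned int domain, unsigned int bus,
		 unsigned int devfn, int reg, int len, u32 *val)
{
	/* hypothetical quirk: one segment needs a non-ECAM access method */
	if (domain == QUIRKY_SEG)
		return quirky_read(bus, devfn, reg, len, val);

	/* everything else uses the generic MMCONFIG accessor */
	return pci_mmcfg_read(domain, bus, devfn, reg, len, val);
}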
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Tested-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/acpi/mmconfig.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/acpi/mmconfig.c b/drivers/acpi/mmconfig.c index c0ad05f..c9c6e05 100644 --- a/drivers/acpi/mmconfig.c +++ b/drivers/acpi/mmconfig.c @@ -23,6 +23,26 @@ static DEFINE_MUTEX(pci_mmcfg_lock);
LIST_HEAD(pci_mmcfg_list);
+/* + * raw_pci_read/write - ACPI PCI config space accessors. + * + * The ACPI spec defines MMCFG as the way to access PCI config space, + * so let MMCFG be the default (__weak). + * + * If a platform needs something fancier, it should provide its own implementation. + */ +int __weak raw_pci_read(unsigned int domain, unsigned int bus, + unsigned int devfn, int reg, int len, u32 *val) +{ + return pci_mmcfg_read(domain, bus, devfn, reg, len, val); +} + +int __weak raw_pci_write(unsigned int domain, unsigned int bus, + unsigned int devfn, int reg, int len, u32 val) +{ + return pci_mmcfg_write(domain, bus, devfn, reg, len, val); +} + static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned int devfn) {
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/Kconfig | 3 +++ arch/arm64/include/asm/pci.h | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index e32e427..e50c588 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -283,6 +283,9 @@ config PCI_DOMAINS_GENERIC config PCI_SYSCALL def_bool PCI
+config PCI_MMCONFIG + def_bool PCI && ACPI + source "drivers/pci/Kconfig" source "drivers/pci/pcie/Kconfig" source "drivers/pci/hotplug/Kconfig" diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h index ad3fb18..4e47457 100644 --- a/arch/arm64/include/asm/pci.h +++ b/arch/arm64/include/asm/pci.h @@ -27,6 +27,42 @@ extern int isa_dma_bridge_buggy;
#ifdef CONFIG_PCI + +#ifdef CONFIG_ACPI +/* + * ARM64 PCI config space access primitives. + */ +static inline unsigned char mmio_config_readb(void __iomem *pos) +{ + return readb(pos); +} + +static inline unsigned short mmio_config_readw(void __iomem *pos) +{ + return readw(pos); +} + +static inline unsigned int mmio_config_readl(void __iomem *pos) +{ + return readl(pos); +} + +static inline void mmio_config_writeb(void __iomem *pos, u8 val) +{ + writeb(val, pos); +} + +static inline void mmio_config_writew(void __iomem *pos, u16 val) +{ + writew(val, pos); +} + +static inline void mmio_config_writel(void __iomem *pos, u32 val) +{ + writel(val, pos); +} +#endif /* CONFIG_ACPI */ + static inline int pci_get_legacy_ide_irq(struct pci_dev *dev, int channel) { /* no legacy IRQ on arm64 */
From: Tomasz Nowicki tomasz.nowicki@linaro.org
The host driver is written in the old fashion; it should be revisited once the new way of creating root buses is upstream.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/include/asm/pci.h | 8 + arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/pci-acpi.c | 348 +++++++++++++++++++++++++++++++++++++++++++ arch/arm64/kernel/pci.c | 24 --- drivers/pci/pci.c | 90 ++++++----- 5 files changed, 406 insertions(+), 65 deletions(-) create mode 100644 arch/arm64/kernel/pci-acpi.c
diff --git a/arch/arm64/include/asm/pci.h b/arch/arm64/include/asm/pci.h index 4e47457..76275e1 100644 --- a/arch/arm64/include/asm/pci.h +++ b/arch/arm64/include/asm/pci.h @@ -29,6 +29,14 @@ extern int isa_dma_bridge_buggy; #ifdef CONFIG_PCI
#ifdef CONFIG_ACPI +struct pci_controller { + struct acpi_device *companion; + int segment; + int node; /* nearest node with memory or NUMA_NO_NODE for global allocation */ +}; + +#define PCI_CONTROLLER(busdev) ((struct pci_controller *) busdev->sysdata) + /* * ARM64 PCI config space access primitives. */ diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 426d076..9add37b 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -35,6 +35,7 @@ arm64-obj-$(CONFIG_KGDB) += kgdb.o arm64-obj-$(CONFIG_EFI) += efi.o efi-stub.o efi-entry.o arm64-obj-$(CONFIG_PCI) += pci.o arm64-obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o +arm64-obj-$(CONFIG_ACPI) += pci-acpi.o arm64-obj-$(CONFIG_ACPI) += acpi.o
obj-y += $(arm64-obj-y) vdso/ diff --git a/arch/arm64/kernel/pci-acpi.c b/arch/arm64/kernel/pci-acpi.c new file mode 100644 index 0000000..1826b10 --- /dev/null +++ b/arch/arm64/kernel/pci-acpi.c @@ -0,0 +1,348 @@ +/* + * Code borrowed from powerpc/kernel/pci-common.c and arch/ia64/pci/pci.c + * + * Copyright (c) 2002, 2005 Hewlett-Packard Development Company, L.P. + * David Mosberger-Tang davidm@hpl.hp.com + * Bjorn Helgaas bjorn.helgaas@hp.com + * Copyright (C) 2004 Silicon Graphics, Inc. + * Copyright (C) 2003 Anton Blanchard anton@au.ibm.com, IBM + * Copyright (C) 2014 ARM Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 as published by the Free Software Foundation. + * + */ + +#include <linux/acpi.h> +#include <linux/init.h> +#include <linux/io.h> +#include <linux/kernel.h> +#include <linux/mm.h> +#include <linux/mmconfig.h> +#include <linux/of_address.h> +#include <linux/of_pci.h> +#include <linux/of_platform.h> +#include <linux/pci.h> +#include <linux/pci-acpi.h> +#include <linux/slab.h> + +#include <asm/pci-bridge.h> + +int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge) +{ + ACPI_COMPANION_SET(&bridge->dev, + PCI_CONTROLLER(bridge->bus)->companion); + + return 0; +} + +void pcibios_add_bus(struct pci_bus *bus) +{ + acpi_pci_add_bus(bus); +} + +void pcibios_remove_bus(struct pci_bus *bus) +{ + acpi_pci_remove_bus(bus); +} + +int +pcibios_enable_device (struct pci_dev *dev, int mask) +{ + int ret; + + ret = pci_enable_resources(dev, mask); + if (ret < 0) + return ret; + + if (!dev->msi_enabled) + return acpi_pci_irq_enable(dev); + return 0; +} + +void +pcibios_disable_device (struct pci_dev *dev) +{ + BUG_ON(atomic_read(&dev->enable_cnt)); + if (!dev->msi_enabled) + acpi_pci_irq_disable(dev); +} + +static int pci_read(struct pci_bus *bus, unsigned int devfn, int where, + int size, u32 *value) +{ + return raw_pci_read(pci_domain_nr(bus), bus->number, + devfn, where, size, value); +} + +static int pci_write(struct pci_bus *bus, unsigned int devfn, int where, + int size, u32 value) +{ + return raw_pci_write(pci_domain_nr(bus), bus->number, + devfn, where, size, value); +} + +struct pci_ops pci_root_ops = { + .read = pci_read, + .write = pci_write, +}; + +static struct pci_controller *alloc_pci_controller(int seg) +{ + struct pci_controller *controller; + + controller = kzalloc(sizeof(*controller), GFP_KERNEL); + if (!controller) + return NULL; + + controller->segment = seg; + return controller; +} + +struct pci_root_info { + struct acpi_device *bridge; + struct pci_controller *controller; + struct list_head resources; + struct resource *res; + unsigned int res_num; + char *name; +}; + +static acpi_status resource_to_window(struct acpi_resource *resource, + struct acpi_resource_address64 *addr) +{ + acpi_status status; + + /* + * We're only interested in _CRS descriptors that are + * - address space descriptors for memory + * - non-zero size + * - producers, i.e., the address space is routed downstream, + * not consumed by the bridge itself + */ + status = acpi_resource_to_address64(resource, addr); + if (ACPI_SUCCESS(status) && + (addr->resource_type == ACPI_MEMORY_RANGE || + addr->resource_type == ACPI_IO_RANGE) && + addr->address.address_length && + addr->producer_consumer == ACPI_PRODUCER) + return AE_OK; + + return AE_ERROR; +} + +static acpi_status count_window(struct acpi_resource *resource, void *data) +{ + unsigned int *windows = (unsigned int *) data; + struct 
acpi_resource_address64 addr; + acpi_status status; + + status = resource_to_window(resource, &addr); + if (ACPI_SUCCESS(status)) + (*windows)++; + + return AE_OK; +} + +static acpi_status add_window(struct acpi_resource *res, void *data) +{ + struct pci_root_info *info = data; + struct resource *resource; + struct acpi_resource_address64 addr; + resource_size_t offset; + acpi_status status; + unsigned long flags; + struct resource *root; + u64 start; + + /* Return AE_OK for non-window resources to keep scanning for more */ + status = resource_to_window(res, &addr); + if (!ACPI_SUCCESS(status)) + return AE_OK; + + if (addr.resource_type == ACPI_MEMORY_RANGE) { + flags = IORESOURCE_MEM; + root = &iomem_resource; + } else if (addr.resource_type == ACPI_IO_RANGE) { + flags = IORESOURCE_IO; + root = &ioport_resource; + } else + return AE_OK; + + start = addr.address.minimum + addr.address.translation_offset; + + resource = &info->res[info->res_num]; + resource->name = info->name; + resource->flags = flags; + resource->start = start; + resource->end = resource->start + addr.address.address_length - 1; + + if (flags & IORESOURCE_IO) { + unsigned long port; + int err; + + err = pci_register_io_range(start, addr.address.address_length); + if (err) + return AE_OK; + + port = pci_address_to_pio(start); + if (port == (unsigned long)-1) + return AE_OK; + + resource->start = port; + resource->end = port + addr.address.address_length - 1; + + if (pci_remap_iospace(resource, start) < 0) + return AE_OK; + + offset = 0; + } else + offset = addr.address.translation_offset; + + if (insert_resource(root, resource)) { + dev_err(&info->bridge->dev, + "can't allocate host bridge window %pR\n", + resource); + } else { + if (addr.address.translation_offset) + dev_info(&info->bridge->dev, "host bridge window %pR " + "(PCI address [%#llx-%#llx])\n", + resource, + resource->start - addr.address.translation_offset, + resource->end - addr.address.translation_offset); + else + dev_info(&info->bridge->dev, + "host bridge window %pR\n", resource); + } + + pci_add_resource_offset(&info->resources, resource, offset); + info->res_num++; + return AE_OK; +} + +static void free_pci_root_info_res(struct pci_root_info *info) +{ + kfree(info->name); + kfree(info->res); + info->res = NULL; + info->res_num = 0; + kfree(info->controller); + info->controller = NULL; +} + +static void __release_pci_root_info(struct pci_root_info *info) +{ + int i; + struct resource *res; + + for (i = 0; i < info->res_num; i++) { + res = &info->res[i]; + + if (!res->parent) + continue; + + if (!(res->flags & (IORESOURCE_MEM | IORESOURCE_IO))) + continue; + + release_resource(res); + } + + free_pci_root_info_res(info); + kfree(info); +} + +static void release_pci_root_info(struct pci_host_bridge *bridge) +{ + struct pci_root_info *info = bridge->release_data; + + __release_pci_root_info(info); +} + +static int +probe_pci_root_info(struct pci_root_info *info, struct acpi_device *device, + int busnum, int domain) +{ + char *name; + + name = kmalloc(16, GFP_KERNEL); + if (!name) + return -ENOMEM; + + sprintf(name, "PCI Bus %04x:%02x", domain, busnum); + info->bridge = device; + info->name = name; + + acpi_walk_resources(device->handle, METHOD_NAME__CRS, count_window, + &info->res_num); + if (info->res_num) { + info->res = + kzalloc_node(sizeof(*info->res) * info->res_num, + GFP_KERNEL, info->controller->node); + if (!info->res) { + kfree(name); + return -ENOMEM; + } + + info->res_num = 0; + acpi_walk_resources(device->handle, METHOD_NAME__CRS, + add_window, 
info); + } else + kfree(name); + + return 0; +} + +/* Root bridge scanning */ +struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) +{ + struct acpi_device *device = root->device; + int domain = root->segment; + int bus = root->secondary.start; + struct pci_controller *controller; + struct pci_root_info *info = NULL; + int busnum = root->secondary.start; + struct pci_bus *pbus; + int ret; + + controller = alloc_pci_controller(domain); + if (!controller) + return NULL; + + controller->companion = device; + controller->node = acpi_get_node(device->handle); + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) { + dev_err(&device->dev, + "pci_bus %04x:%02x: ignored (out of memory)\n", + domain, busnum); + kfree(controller); + return NULL; + } + + info->controller = controller; + INIT_LIST_HEAD(&info->resources); + + ret = probe_pci_root_info(info, device, busnum, domain); + if (ret) { + kfree(info->controller); + kfree(info); + return NULL; + } + /* insert busn resource at first */ + pci_add_resource(&info->resources, &root->secondary); + + pbus = pci_create_root_bus(NULL, bus, &pci_root_ops, controller, + &info->resources); + if (!pbus) { + pci_free_resource_list(&info->resources); + __release_pci_root_info(info); + return NULL; + } + + pci_set_host_bridge_release(to_pci_host_bridge(pbus->bridge), + release_pci_root_info, info); + pci_scan_child_bus(pbus); + return pbus; +} diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index 3356023..7642075 100644 --- a/arch/arm64/kernel/pci.c +++ b/arch/arm64/kernel/pci.c @@ -57,27 +57,3 @@ int pcibios_add_device(struct pci_dev *dev)
return 0; } - -/* - * raw_pci_read/write - Platform-specific PCI config space access. - */ -int raw_pci_read(unsigned int domain, unsigned int bus, - unsigned int devfn, int reg, int len, u32 *val) -{ - return -ENXIO; -} - -int raw_pci_write(unsigned int domain, unsigned int bus, - unsigned int devfn, int reg, int len, u32 val) -{ - return -ENXIO; -} - -#ifdef CONFIG_ACPI -/* Root bridge scanning */ -struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) -{ - /* TODO: Should be revisited when implementing PCI on ACPI */ - return NULL; -} -#endif diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 0008c95..0eec993 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4486,47 +4486,55 @@ int pci_get_new_domain_nr(void) #ifdef CONFIG_PCI_DOMAINS_GENERIC void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent) { - static int use_dt_domains = -1; - int domain = of_get_pci_domain_nr(parent->of_node); - - /* - * Check DT domain and use_dt_domains values. - * - * If DT domain property is valid (domain >= 0) and - * use_dt_domains != 0, the DT assignment is valid since this means - * we have not previously allocated a domain number by using - * pci_get_new_domain_nr(); we should also update use_dt_domains to - * 1, to indicate that we have just assigned a domain number from - * DT. - * - * If DT domain property value is not valid (ie domain < 0), and we - * have not previously assigned a domain number from DT - * (use_dt_domains != 1) we should assign a domain number by - * using the: - * - * pci_get_new_domain_nr() - * - * API and update the use_dt_domains value to keep track of method we - * are using to assign domain numbers (use_dt_domains = 0). - * - * All other combinations imply we have a platform that is trying - * to mix domain numbers obtained from DT and pci_get_new_domain_nr(), - * which is a recipe for domain mishandling and it is prevented by - * invalidating the domain value (domain = -1) and printing a - * corresponding error. - */ - if (domain >= 0 && use_dt_domains) { - use_dt_domains = 1; - } else if (domain < 0 && use_dt_domains != 1) { - use_dt_domains = 0; - domain = pci_get_new_domain_nr(); - } else { - dev_err(parent, "Node %s has inconsistent "linux,pci-domain" property in DT\n", - parent->of_node->full_name); - domain = -1; - } - - bus->domain_nr = domain; + static int use_dt_domains = -1; + int domain; + + if (!acpi_disabled) { + domain = PCI_CONTROLLER(bus)->segment; + goto out; + } + + domain = of_get_pci_domain_nr(parent->of_node); + + + /* + * Check DT domain and use_dt_domains values. + * + * If DT domain property is valid (domain >= 0) and + * use_dt_domains != 0, the DT assignment is valid since this means + * we have not previously allocated a domain number by using + * pci_get_new_domain_nr(); we should also update use_dt_domains to + * 1, to indicate that we have just assigned a domain number from + * DT. + * + * If DT domain property value is not valid (ie domain < 0), and we + * have not previously assigned a domain number from DT + * (use_dt_domains != 1) we should assign a domain number by + * using the: + * + * pci_get_new_domain_nr() + * + * API and update the use_dt_domains value to keep track of method we + * are using to assign domain numbers (use_dt_domains = 0). 
+ * + * All other combinations imply we have a platform that is trying + * to mix domain numbers obtained from DT and pci_get_new_domain_nr(), + * which is a recipe for domain mishandling and it is prevented by + * invalidating the domain value (domain = -1) and printing a + * corresponding error. + */ + if (domain >= 0 && use_dt_domains) { + use_dt_domains = 1; + } else if (domain < 0 && use_dt_domains != 1) { + use_dt_domains = 0; + domain = pci_get_new_domain_nr(); + } else { + dev_err(parent, "Node %s has inconsistent "linux,pci-domain" property in DT\n", + parent->of_node->full_name); + domain = -1; + } +out: + bus->domain_nr = domain; } #endif #endif
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/phy/marvell.c | 118 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 99 insertions(+), 19 deletions(-)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index f721444..c07a5fc 100644 --- a/drivers/net/phy/marvell.c +++ b/drivers/net/phy/marvell.c @@ -33,6 +33,7 @@ #include <linux/phy.h> #include <linux/marvell_phy.h> #include <linux/of.h> +#include <linux/acpi.h>
#include <linux/io.h> #include <asm/irq.h> @@ -226,20 +227,6 @@ static int marvell_config_aneg(struct phy_device *phydev) }
#ifdef CONFIG_OF_MDIO -/* - * Set and/or override some configuration registers based on the - * marvell,reg-init property stored in the of_node for the phydev. - * - * marvell,reg-init = <reg-page reg mask value>,...; - * - * There may be one or more sets of <reg-page reg mask value>: - * - * reg-page: which register bank to use. - * reg: the register. - * mask: if non-zero, ANDed with existing register value. - * value: ORed with the masked value and written to the regiser. - * - */ static int marvell_of_reg_init(struct phy_device *phydev) { const __be32 *paddr; @@ -306,6 +293,99 @@ static int marvell_of_reg_init(struct phy_device *phydev) } #endif /* CONFIG_OF_MDIO */
+#ifdef CONFIG_ACPI +static int marvell_acpi_reg_init(struct phy_device *phydev) +{ + const union acpi_object *items; + const union acpi_object *obj; + int len, i, saved_page, current_page, page_changed, ret; + + ret = acpi_dev_get_property_array(ACPI_COMPANION(&phydev->dev), + "marvell,reg-init", ACPI_TYPE_ANY, &obj); + if (ret) + return 0; + + saved_page = phy_read(phydev, MII_MARVELL_PHY_PAGE); + if (saved_page < 0) + return saved_page; + page_changed = 0; + current_page = saved_page; + + items = obj->package.elements; + len = obj->package.count; + ret = 0; + for (i = 0; i < len - 3; i += 4) { + u16 reg_page = items[i].integer.value; + u16 reg = items[i + 1].integer.value; + u16 mask = items[i + 2].integer.value; + u16 val_bits = items[i + 3].integer.value; + int val; + + if (reg_page != current_page) { + current_page = reg_page; + page_changed = 1; + ret = phy_write(phydev, MII_MARVELL_PHY_PAGE, reg_page); + if (ret < 0) + goto err; + } + + val = 0; + if (mask) { + val = phy_read(phydev, reg); + if (val < 0) { + ret = val; + goto err; + } + val &= mask; + } + val |= val_bits; + + ret = phy_write(phydev, reg, val); + if (ret < 0) + goto err; + + } +err: + if (page_changed) { + i = phy_write(phydev, MII_MARVELL_PHY_PAGE, saved_page); + if (ret == 0) + ret = i; + } + return ret; +} +#else +static int marvell_acpi_reg_init(struct phy_device *phydev) +{ + return 0; +} +#endif /* CONFIG_ACPI */ + +/* + * Set and/or override some configuration registers based on the + * marvell,reg-init property stored in the of_node for the phydev. + * + * marvell,reg-init = <reg-page reg mask value>,...; + * + * There may be one or more sets of <reg-page reg mask value>: + * + * reg-page: which register bank to use. + * reg: the register. + * mask: if non-zero, ANDed with existing register value. + * value: ORed with the masked value and written to the regiser. + * + */ +static int marvell_reg_init(struct phy_device *phydev) +{ + int ret; + + if (phydev->dev.of_node) + ret = marvell_of_reg_init(phydev); + else + ret = marvell_acpi_reg_init(phydev); + + return ret; +} + static int m88e1121_config_aneg(struct phy_device *phydev) { int err, oldpage, mscr; @@ -390,7 +470,7 @@ static int m88e1510_config_aneg(struct phy_device *phydev) if (err < 0) return err;
- return marvell_of_reg_init(phydev); + return marvell_reg_init(phydev); }
static int m88e1116r_config_init(struct phy_device *phydev) @@ -552,7 +632,7 @@ static int m88e1111_config_init(struct phy_device *phydev) return err; }
- err = marvell_of_reg_init(phydev); + err = marvell_reg_init(phydev); if (err < 0) return err;
@@ -603,7 +683,7 @@ static int m88e1118_config_init(struct phy_device *phydev) if (err < 0) return err;
- err = marvell_of_reg_init(phydev); + err = marvell_reg_init(phydev); if (err < 0) return err;
@@ -629,7 +709,7 @@ static int m88e1149_config_init(struct phy_device *phydev) if (err < 0) return err;
- err = marvell_of_reg_init(phydev); + err = marvell_reg_init(phydev); if (err < 0) return err;
@@ -715,7 +795,7 @@ static int m88e1145_config_init(struct phy_device *phydev) return err; }
- err = marvell_of_reg_init(phydev); + err = marvell_reg_init(phydev); if (err < 0) return err;
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/phy/mdio-octeon.c | 62 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 61 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c index 428ae75..3ceb90b 100644 --- a/drivers/net/phy/mdio-octeon.c +++ b/drivers/net/phy/mdio-octeon.c @@ -14,6 +14,7 @@ #include <linux/gfp.h> #include <linux/phy.h> #include <linux/io.h> +#include <linux/acpi.h>
#ifdef CONFIG_CAVIUM_OCTEON_SOC #include <asm/octeon/octeon.h> @@ -265,6 +266,62 @@ static int octeon_mdiobus_write(struct mii_bus *bus, int phy_id, return 0; }
+#ifdef CONFIG_ACPI +static acpi_status +acpi_register_phy(acpi_handle handle, u32 lvl, void *context, void **rv) +{ + struct mii_bus *mdio = context; + struct acpi_device *adev; + struct phy_device *phy; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + return AE_OK; + + if (acpi_dev_prop_read_single(adev, "phy-channel", DEV_PROP_U32, + &phy_id)) + return AE_OK; + + phy = get_phy_device(mdio, phy_id, false); + if (!phy || IS_ERR(phy)) + return AE_OK; + + if (phy_device_register(phy)) + phy_device_free(phy); + + return AE_OK; +} + +static int +acpi_mdiobus_register(struct mii_bus *mdio) +{ + int i, ret; + + /* Mask out all PHYs from auto probing. */ + mdio->phy_mask = ~0; + + /* Clear all the IRQ properties */ + if (mdio->irq) + for (i = 0; i < PHY_MAX_ADDR; i++) + mdio->irq[i] = PHY_POLL; + + /* Register the MDIO bus */ + ret = mdiobus_register(mdio); + if (ret) + return ret; + + acpi_walk_namespace(ACPI_TYPE_DEVICE, ACPI_HANDLE(mdio->parent), 1, + acpi_register_phy, NULL, mdio, NULL); + return 0; +} +#else +static int +acpi_mdiobus_register(struct mii_bus *mdio) +{ + return 0; +} +#endif + static int octeon_mdiobus_probe(struct platform_device *pdev) { struct octeon_mdiobus *bus; @@ -317,7 +374,10 @@ static int octeon_mdiobus_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, bus);
- err = of_mdiobus_register(bus->mii_bus, pdev->dev.of_node); + if (pdev->dev.of_node) + err = of_mdiobus_register(bus->mii_bus, pdev->dev.of_node); + else + err = acpi_mdiobus_register(bus->mii_bus); if (err) goto fail_register;
From: Tomasz Nowicki tomasz.nowicki@linaro.org
The additional functionality finds out which PHYs belong to which BGX instance when booting with ACPI.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 92 ++++++++++++++++++++++- 1 file changed, 91 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 615b2af..5e1bebd 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -6,6 +6,7 @@ * as published by the Free Software Foundation. */
+#include <linux/acpi.h> #include <linux/module.h> #include <linux/interrupt.h> #include <linux/pci.h> @@ -835,6 +836,90 @@ static void bgx_get_qlm_mode(struct bgx *bgx) } }
+#ifdef CONFIG_ACPI + +static int bgx_match_phy_id(struct device *dev, void *data) +{ + struct phy_device *phydev = to_phy_device(dev); + u32 *phy_id = data; + + if (phydev->addr == *phy_id) + return 1; + + return 0; +} + +static acpi_status bgx_acpi_register_phy(acpi_handle handle, + u32 lvl, void *context, void **rv) +{ + struct acpi_reference_args args; + struct bgx *bgx = context; + struct acpi_device *adev; + struct device *phy_dev; + u32 phy_id; + + if (acpi_bus_get_device(handle, &adev)) + return AE_OK; + + if (acpi_dev_get_property_reference(adev, "phy-handle", 0, &args)) + return AE_OK; + + if (acpi_dev_prop_read_single(args.adev, "phy-channel", DEV_PROP_U32, + &phy_id)) + return AE_OK; + + phy_dev = bus_find_device(&mdio_bus_type, NULL, (void *)&phy_id, + bgx_match_phy_id); + if (!phy_dev) + return AE_OK; + + SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); + bgx->lmac[bgx->lmac_count].phydev = to_phy_device(phy_dev); + + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; + bgx->lmac_count++; + + return AE_OK; +} + +static acpi_status bgx_acpi_match_id(acpi_handle handle, u32 lvl, + void *context, void **ret_val) +{ + struct acpi_buffer string = { ACPI_ALLOCATE_BUFFER, NULL }; + struct bgx *bgx = context; + char bgx_sel[5]; + + snprintf(bgx_sel, 5, "BGX%d", bgx->bgx_id); + if (ACPI_FAILURE(acpi_get_name(handle, ACPI_SINGLE_NAME, &string))) { + pr_warn("Invalid link device\n"); + return AE_OK; + } + + if (strncmp(string.pointer, bgx_sel, 4)) + return AE_OK; + + acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1, + bgx_acpi_register_phy, NULL, bgx, NULL); + + kfree(string.pointer); + return AE_CTRL_TERMINATE; +} + +static int bgx_init_acpi_phy(struct bgx *bgx) +{ + acpi_get_devices(NULL, bgx_acpi_match_id, bgx, (void **)NULL); + return 0; +} + +#else + +static int bgx_init_acpi_phy(struct bgx *bgx) +{ + return -ENODEV; +} + +#endif /* CONFIG_ACPI */ + #if IS_ENABLED(CONFIG_OF_MDIO)
static int bgx_init_of_phy(struct bgx *bgx) @@ -882,7 +967,12 @@ static int bgx_init_of_phy(struct bgx *bgx)
static int bgx_init_phy(struct bgx *bgx) { - return bgx_init_of_phy(bgx); + int err = bgx_init_of_phy(bgx); + + if (err != -ENODEV) + return err; + + return bgx_init_acpi_phy(bgx); }
static int bgx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Some MCFG tables may be broken, or the underlying hardware may not be fully compliant with the PCIe ECAM mechanism. This patch provides a way to override the default mmconfig read/write routines and/or apply other MCFG-related fixups.
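For instance, a platform fixup could hook the per-region accessors used below. A rough sketch (the registration of the fixup itself is defined elsewhere in this patch and not shown here; thunderx_mcfg_read() and its stubbed body are hypothetical):

static int thunderx_mcfg_read(struct pci_mmcfg_region *cfg, unsigned int bus,
			      unsigned int devfn, int reg, int len, u32 *val)
{
	/* hypothetical placeholder for a non-ECAM-compliant access method */
	*val = ~0;
	return 0;
}

static void thunderx_mcfg_fixup(struct acpi_pci_root *root,
				struct pci_mmcfg_region *cfg)
{
	/* route config reads for this region through the quirk accessor */
	cfg->read = thunderx_mcfg_read;
}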
Signed-off-by: Mark Salter msalter@redhat.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/pci-acpi.c | 12 +++++ drivers/acpi/mmconfig.c | 103 ++++++++++++++++++++++++++------------ include/asm-generic/vmlinux.lds.h | 7 +++ include/linux/mmconfig.h | 24 +++++++++ 4 files changed, 115 insertions(+), 31 deletions(-)
diff --git a/arch/arm64/kernel/pci-acpi.c b/arch/arm64/kernel/pci-acpi.c index 1826b10..517d570 100644 --- a/arch/arm64/kernel/pci-acpi.c +++ b/arch/arm64/kernel/pci-acpi.c @@ -297,6 +297,7 @@ probe_pci_root_info(struct pci_root_info *info, struct acpi_device *device, struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) { struct acpi_device *device = root->device; + struct pci_mmcfg_region *mcfg; int domain = root->segment; int bus = root->secondary.start; struct pci_controller *controller; @@ -305,6 +306,17 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) struct pci_bus *pbus; int ret;
+ /* we need mmconfig */ + mcfg = pci_mmconfig_lookup(domain, bus); + if (!mcfg) { + pr_err("pci_bus %04x:%02x has no MCFG table\n", + domain, bus); + return NULL; + } + + if (mcfg->fixup) + (*mcfg->fixup)(root, mcfg); + controller = alloc_pci_controller(domain); if (!controller) return NULL; diff --git a/drivers/acpi/mmconfig.c b/drivers/acpi/mmconfig.c index c9c6e05..9c6efa5 100644 --- a/drivers/acpi/mmconfig.c +++ b/drivers/acpi/mmconfig.c @@ -43,20 +43,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus, return pci_mmcfg_write(domain, bus, devfn, reg, len, val); }
-static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, - unsigned int devfn) +static inline char __iomem *pci_dev_base(struct pci_mmcfg_region *cfg, + unsigned int bus, unsigned int devfn) { - struct pci_mmcfg_region *cfg = pci_mmconfig_lookup(seg, bus); + return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12)); +}
- if (cfg && cfg->virt) - return cfg->virt + (PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12)); - return NULL; +static int __pci_mmcfg_read(struct pci_mmcfg_region *cfg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 *value) +{ + char __iomem *addr = pci_dev_base(cfg, bus, devfn); + + switch (len) { + case 1: + *value = mmio_config_readb(addr + reg); + break; + case 2: + *value = mmio_config_readw(addr + reg); + break; + case 4: + *value = mmio_config_readl(addr + reg); + break; + } + return 0; }
int __weak pci_mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 *value) { - char __iomem *addr; + struct pci_mmcfg_region *cfg; + int ret;
/* Why do we have this when nobody checks it. How about a BUG()!? -AK */ if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) { @@ -65,58 +81,66 @@ err: *value = -1; }
rcu_read_lock(); - addr = pci_dev_base(seg, bus, devfn); - if (!addr) { + cfg = pci_mmconfig_lookup(seg, bus); + if (!cfg || !cfg->virt) { rcu_read_unlock(); goto err; }
+ if (cfg->read) + ret = (*cfg->read)(cfg, bus, devfn, reg, len, value); + else + ret = __pci_mmcfg_read(cfg, bus, devfn, reg, len, value); + + rcu_read_unlock(); + + return ret; +} + +static int __pci_mmcfg_write(struct pci_mmcfg_region *cfg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 value) +{ + char __iomem *addr = pci_dev_base(cfg, bus, devfn); + switch (len) { case 1: - *value = mmio_config_readb(addr + reg); + mmio_config_writeb(addr + reg, value); break; case 2: - *value = mmio_config_readw(addr + reg); + mmio_config_writew(addr + reg, value); break; case 4: - *value = mmio_config_readl(addr + reg); + mmio_config_writel(addr + reg, value); break; } - rcu_read_unlock(); - return 0; }
int __weak pci_mmcfg_write(unsigned int seg, unsigned int bus, unsigned int devfn, int reg, int len, u32 value) { - char __iomem *addr; + struct pci_mmcfg_region *cfg; + int ret;
/* Why do we have this when nobody checks it. How about a BUG()!? -AK */ if (unlikely((bus > 255) || (devfn > 255) || (reg > 4095))) return -EINVAL;
rcu_read_lock(); - addr = pci_dev_base(seg, bus, devfn); - if (!addr) { + cfg = pci_mmconfig_lookup(seg, bus); + if (!cfg || !cfg->virt) { rcu_read_unlock(); return -EINVAL; }
- switch (len) { - case 1: - mmio_config_writeb(addr + reg, value); - break; - case 2: - mmio_config_writew(addr + reg, value); - break; - case 4: - mmio_config_writel(addr + reg, value); - break; - } + if (cfg->write) + ret = (*cfg->write)(cfg, bus, devfn, reg, len, value); + else + ret = __pci_mmcfg_write(cfg, bus, devfn, reg, len, value); + rcu_read_unlock();
- return 0; + return ret; }
static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg) @@ -307,10 +331,15 @@ int __init __weak acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, return 0; }
+extern struct acpi_mcfg_fixup __start_acpi_mcfg_fixups[]; +extern struct acpi_mcfg_fixup __end_acpi_mcfg_fixups[]; + int __init pci_parse_mcfg(struct acpi_table_header *header) { struct acpi_table_mcfg *mcfg; struct acpi_mcfg_allocation *cfg_table, *cfg; + struct acpi_mcfg_fixup *fixup; + struct pci_mmcfg_region *new; unsigned long i; int entries;
@@ -332,6 +361,15 @@ int __init pci_parse_mcfg(struct acpi_table_header *header) return -ENODEV; }
+ fixup = __start_acpi_mcfg_fixups; + while (fixup < __end_acpi_mcfg_fixups) { + if (!strncmp(fixup->oem_id, header->oem_id, 6) && + !strncmp(fixup->oem_table_id, header->oem_table_id, 8)) + break; + ++fixup; + } + + cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1]; for (i = 0; i < entries; i++) { cfg = &cfg_table[i]; @@ -340,12 +378,15 @@ int __init pci_parse_mcfg(struct acpi_table_header *header) return -ENODEV; }
- if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, - cfg->end_bus_number, cfg->address) == NULL) { + new = pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, + cfg->end_bus_number, cfg->address); + if (!new) { pr_warn(PREFIX "no memory for MCFG entries\n"); free_all_mmcfg(); return -ENOMEM; } + if (fixup < __end_acpi_mcfg_fixups) + new->fixup = fixup->hook; }
return 0; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 8bd374d..dd58f47 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -288,6 +288,13 @@ VMLINUX_SYMBOL(__end_pci_fixups_suspend_late) = .; \ } \ \ + /* ACPI quirks */ \ + .acpi_fixup : AT(ADDR(.acpi_fixup) - LOAD_OFFSET) { \ + VMLINUX_SYMBOL(__start_acpi_mcfg_fixups) = .; \ + *(.acpi_fixup_mcfg) \ + VMLINUX_SYMBOL(__end_acpi_mcfg_fixups) = .; \ + } \ + \ /* Built-in firmware blobs */ \ .builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start_builtin_fw) = .; \ diff --git a/include/linux/mmconfig.h b/include/linux/mmconfig.h index ae8ec83..4360e9a 100644 --- a/include/linux/mmconfig.h +++ b/include/linux/mmconfig.h @@ -9,9 +9,21 @@ /* "PCI MMCONFIG %04x [bus %02x-%02x]" */ #define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+struct acpi_pci_root; +struct pci_mmcfg_region; + +typedef int (*acpi_mcfg_fixup_t)(struct acpi_pci_root *root, + struct pci_mmcfg_region *cfg); + struct pci_mmcfg_region { struct list_head list; struct resource res; + int (*read)(struct pci_mmcfg_region *cfg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 *value); + int (*write)(struct pci_mmcfg_region *cfg, unsigned int bus, + unsigned int devfn, int reg, int len, u32 value); + acpi_mcfg_fixup_t fixup; + void *data; u64 address; char __iomem *virt; u16 segment; @@ -20,6 +32,18 @@ struct pci_mmcfg_region { char name[PCI_MMCFG_RESOURCE_NAME_LEN]; };
+struct acpi_mcfg_fixup { + char oem_id[7]; + char oem_table_id[9]; + acpi_mcfg_fixup_t hook; +}; + +/* Designate a routine to fix up buggy MCFG */ +#define DECLARE_ACPI_MCFG_FIXUP(oem_id, table_id, hook) \ + static const struct acpi_mcfg_fixup __acpi_fixup_##hook __used \ + __attribute__((__section__(".acpi_fixup_mcfg"), aligned((sizeof(void *))))) \ + = { {oem_id}, {table_id}, hook }; + void pci_mmcfg_early_init(void); void pci_mmcfg_late_init(void); struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Fix for *Bug 14123* - "Error while trying to compile VNIC and BGX drivers as Loadable Kernel Modules"
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/acpi/property.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c index 7836e2e..635785d 100644 --- a/drivers/acpi/property.c +++ b/drivers/acpi/property.c @@ -432,6 +432,7 @@ int acpi_dev_prop_read_single(struct acpi_device *adev, const char *propname, } return ret; } +EXPORT_SYMBOL_GPL(acpi_dev_prop_read);
static int acpi_copy_property_array_u8(const union acpi_object *items, u8 *val, size_t nval)
From: Tomasz Nowicki tomasz.nowicki@linaro.org
With ACPI enabled, kvm_timer_hyp_init can't access any device tree information. Although registration of the virtual timer interrupt already happened when architected timers were initialized, we need to point KVM to the interrupt line used.
Signed-off-by: Alexander Spyridakis a.spyridakis@virtualopensystems.com Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- virt/kvm/arm/arch_timer.c | 76 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 60 insertions(+), 16 deletions(-)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c index 98c95f2..a26c8b8 100644 --- a/virt/kvm/arm/arch_timer.c +++ b/virt/kvm/arm/arch_timer.c @@ -21,6 +21,7 @@ #include <linux/kvm.h> #include <linux/kvm_host.h> #include <linux/interrupt.h> +#include <linux/acpi.h>
#include <clocksource/arm_arch_timer.h> #include <asm/arch_timer.h> @@ -274,15 +275,10 @@ static const struct of_device_id arch_timer_of_match[] = { {}, };
-int kvm_timer_hyp_init(void) +static int kvm_of_timer_hyp_init(unsigned int *ppi) { struct device_node *np; - unsigned int ppi; - int err; - - timecounter = arch_timer_get_timecounter(); - if (!timecounter) - return -ENODEV; + int err = 0;
np = of_find_matching_node(NULL, arch_timer_of_match); if (!np) { @@ -290,19 +286,70 @@ int kvm_timer_hyp_init(void) return -ENODEV; }
- ppi = irq_of_parse_and_map(np, 2); - if (!ppi) { + *ppi = irq_of_parse_and_map(np, 2); + if (!(*ppi)) { kvm_err("kvm_arch_timer: no virtual timer interrupt\n"); err = -EINVAL; - goto out; - } + } else + kvm_info("%s IRQ%d\n", np->name, *ppi); + + of_node_put(np); + return err; +} + +#ifdef CONFIG_ACPI +static struct acpi_table_gtdt *gtdt_acpi; + +static int arch_timer_acpi_parse(struct acpi_table_header *table) +{ + gtdt_acpi = (struct acpi_table_gtdt *)table; + return 0; +} + +static int kvm_acpi_timer_hyp_init(unsigned int *ppi) +{ + /* The virtual timer interrupt was already + * registered during initialization with ACPI. + * Get the interrupt number from the tables + * and point there. + */ + acpi_table_parse(ACPI_SIG_GTDT, arch_timer_acpi_parse); + if (!gtdt_acpi) + return -ENODEV; + if (!gtdt_acpi->virtual_timer_interrupt) + return -EINVAL; + + *ppi = gtdt_acpi->virtual_timer_interrupt; + kvm_info("timer IRQ%d\n", *ppi); + return 0; +} +#else +static int kvm_acpi_timer_hyp_init(unsigned int *ppi) +{ + return -ENODEV; +} +#endif + +int kvm_timer_hyp_init(void) +{ + unsigned int ppi; + int err; + + timecounter = arch_timer_get_timecounter(); + if (!timecounter) + return -ENODEV; + + err = acpi_disabled ? kvm_of_timer_hyp_init(&ppi) : + kvm_acpi_timer_hyp_init(&ppi); + if (err) + return err;
err = request_percpu_irq(ppi, kvm_arch_timer_handler, "kvm guest timer", kvm_get_running_vcpus()); if (err) { kvm_err("kvm_arch_timer: can't request interrupt %d (%d)\n", ppi, err); - goto out; + return err; }
host_vtimer_irq = ppi; @@ -319,14 +366,11 @@ int kvm_timer_hyp_init(void) goto out_free; }
- kvm_info("%s IRQ%d\n", np->name, ppi); on_each_cpu(kvm_timer_init_interrupt, NULL, 1);
- goto out; + return 0; out_free: free_percpu_irq(ppi, kvm_get_running_vcpus()); -out: - of_node_put(np); return err; }
From: Robert Richter rrichter@cavium.com
This patch adds code to set the MAC address of the device as provided by the ACPI tables. This is similar to the devicetree implementation in of_get_mac_address(). The tables are searched for the device property entries "mac-address", "local-mac-address" and "address", in that order. The address is provided in a u64 variable and must contain a valid 6-byte MAC address.
Based on a patch from Narinder Dhillon ndhillon@cavium.com.
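To spell out the conversion (a standalone sketch, not the driver code below; the in-kernel helper added by this patch is acpi_get_mac_address()): the 48-bit value stored in the u64 property maps to the MAC bytes most-significant byte first, so a property value of 0x001122334455 yields 00:11:22:33:44:55.

#include <stdint.h>
#include <stdio.h>

/* Hedged illustration of the u64-to-MAC mapping described above. */
static void u64_to_mac(uint64_t mac, uint8_t dst[6])
{
        int i;

        for (i = 0; i < 6; i++)
                dst[i] = (mac >> (8 * (5 - i))) & 0xff;  /* MSB first */
}

int main(void)
{
        uint8_t m[6];

        u64_to_mac(0x001122334455ULL, m);
        printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
               m[0], m[1], m[2], m[3], m[4], m[5]);  /* 00:11:22:33:44:55 */
        return 0;
}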
Signed-off-by: Narinder Dhillon ndhillon@cavium.com Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 37 ++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c index 5e1bebd..a8b1fb1 100644 --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c @@ -27,7 +27,7 @@ struct lmac { struct bgx *bgx; int dmac; - unsigned char mac[ETH_ALEN]; + u8 mac[ETH_ALEN]; bool link_up; int lmacid; /* ID within BGX */ int lmacid_bd; /* ID on board */ @@ -849,6 +849,39 @@ static int bgx_match_phy_id(struct device *dev, void *data) return 0; }
+static const char *addr_propnames[] = { + "mac-address", + "local-mac-address", + "address", +}; + +static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst) +{ + u64 mac; + int i; + int ret; + + for (i = 0; i < ARRAY_SIZE(addr_propnames); i++) { + ret = acpi_dev_prop_read_single(adev, addr_propnames[i], + DEV_PROP_U64, &mac); + if (ret) + continue; + + if (mac & (~0ULL << 48)) + continue; /* more than 6 bytes */ + + mac = cpu_to_be64(mac << 16); + if (!is_valid_ether_addr((u8 *)&mac)) + continue; + + ether_addr_copy(dst, (u8 *)&mac); + + return 0; + } + + return ret ? ret : -EINVAL; +} + static acpi_status bgx_acpi_register_phy(acpi_handle handle, u32 lvl, void *context, void **rv) { @@ -876,6 +909,8 @@ static acpi_status bgx_acpi_register_phy(acpi_handle handle, SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev); bgx->lmac[bgx->lmac_count].phydev = to_phy_device(phy_dev);
+ acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac); + bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count; bgx->lmac_count++;
From: Narinder ndhillon@caviumnetworks.com
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/topology.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index fcb8f7b..8ca4ab2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -258,12 +258,19 @@ void store_cpu_topology(unsigned int cpuid) cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) | MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; } else { +#ifdef CONFIG_ACPI + /* Multiprocessor system : Single-thread per core */ + cpuid_topo->thread_id = -1; + cpuid_topo->core_id = (((mpidr >> 8) & 0xff) * 16) + (mpidr & 0xff); + cpuid_topo->cluster_id = (cpuid_topo->core_id) > 47 ? 1:0; +#else /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1; cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) | MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 | MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16; +#endif }
pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
From: Robert Richter rrichter@cavium.com
Fixing the following build error when building drivers as modules:
ERROR: "acpi_dev_prop_read_single" [drivers/net/phy/mdio-octeon.ko] undefined! ERROR: "acpi_dev_prop_read_single" [drivers/net/ethernet/cavium/thunder/thunder_bgx.ko] undefined!
Reported-by: Andreas Schwab schwab@suse.de Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/acpi/property.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c index 635785d..237e3c5 100644 --- a/drivers/acpi/property.c +++ b/drivers/acpi/property.c @@ -432,7 +432,7 @@ int acpi_dev_prop_read_single(struct acpi_device *adev, const char *propname, } return ret; } -EXPORT_SYMBOL_GPL(acpi_dev_prop_read); +EXPORT_SYMBOL_GPL(acpi_dev_prop_read_single);
static int acpi_copy_property_array_u8(const union acpi_object *items, u8 *val, size_t nval)
From: Robert Richter rrichter@cavium.com
Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/topology.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index 8ca4ab2..3f14ae9 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -11,6 +11,7 @@ * for more details. */
+#include <linux/acpi.h> #include <linux/cpu.h> #include <linux/cpumask.h> #include <linux/init.h> @@ -257,20 +258,20 @@ void store_cpu_topology(unsigned int cpuid) cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) | MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; - } else { #ifdef CONFIG_ACPI + } else if (!acpi_disabled) { /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1; cpuid_topo->core_id = (((mpidr >> 8) & 0xff) * 16) + (mpidr & 0xff); cpuid_topo->cluster_id = (cpuid_topo->core_id) > 47 ? 1:0; -#else +#endif + } else { /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1; cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) | MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 | MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16; -#endif }
pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
From: Tomasz Nowicki tn@semihalf.com
Signed-off-by: Tomasz Nowicki tn@semihalf.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c index 7642075..9b6e789 100644 --- a/arch/arm64/kernel/pci.c +++ b/arch/arm64/kernel/pci.c @@ -53,7 +53,8 @@ int pcibios_add_device(struct pci_dev *dev) if (pcibios_add_device_impl) return pcibios_add_device_impl(dev);
- dev->irq = of_irq_parse_and_map_pci(dev, 0, 0); + if (acpi_disabled) + dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
return 0; }
From: Tomasz Nowicki tn@semihalf.com
The ACPI 5.1 specification states that the value of _CCA is inherited by all descendants of bus-master devices, in this case the PCI root bridge. This patch therefore checks whether the PCI device's root bridge has the coherency flag set and then installs the DMA ops accordingly (in a similar way to what the DT path does).
Signed-off-by: Tomasz Nowicki tn@semihalf.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/pci/pci-acpi.c | 13 +++++++++++++ drivers/pci/pci.c | 10 ++++++++++ drivers/pci/probe.c | 2 +- include/linux/pci-acpi.h | 2 ++ include/linux/pci.h | 2 ++ 5 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 314a625..9ba685c 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -291,6 +291,19 @@ int pci_get_hp_params(struct pci_dev *dev, struct hotplug_params *hpp) } EXPORT_SYMBOL_GPL(pci_get_hp_params);
+ +void pci_acpi_dma_configure(struct pci_dev *dev) +{ + struct device *bridge = pci_get_host_bridge_device(dev); + bool coherent; + + if (acpi_check_dma(ACPI_COMPANION(bridge), &coherent)) + arch_setup_dma_ops(&dev->dev, 0, 0, NULL, coherent); + + pci_put_host_bridge_device(bridge); +} +EXPORT_SYMBOL_GPL(pci_acpi_dma_configure); + /** * pci_acpi_wake_bus - Root bus wakeup notification fork function. * @work: Work item to handle. diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 0eec993..79c34d4 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -13,6 +13,7 @@ #include <linux/of.h> #include <linux/of_pci.h> #include <linux/pci.h> +#include <linux/pci-acpi.h> #include <linux/pm.h> #include <linux/slab.h> #include <linux/module.h> @@ -4539,6 +4540,15 @@ out: #endif #endif
+void pci_dma_configure(struct pci_dev *dev) +{ + if (acpi_disabled) + of_pci_dma_configure(dev); + else + pci_acpi_dma_configure(dev); +} +EXPORT_SYMBOL(pci_dma_configure); + /** * pci_ext_cfg_avail - can we access extended PCI config space? * diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 11ec2e7..9fad896 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1557,7 +1557,7 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) dev->dev.dma_mask = &dev->dma_mask; dev->dev.dma_parms = &dev->dma_parms; dev->dev.coherent_dma_mask = 0xffffffffull; - of_pci_dma_configure(dev); + pci_dma_configure(dev);
pci_set_dma_max_seg_size(dev, 65536); pci_set_dma_seg_boundary(dev, 0xffffffff); diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index a965efa..786d929 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -54,6 +54,7 @@ static inline acpi_handle acpi_pci_get_bridge_handle(struct pci_bus *pbus)
void acpi_pci_add_bus(struct pci_bus *bus); void acpi_pci_remove_bus(struct pci_bus *bus); +void pci_acpi_dma_configure(struct pci_dev *dev);
#ifdef CONFIG_ACPI_PCI_SLOT void acpi_pci_slot_init(void); @@ -85,6 +86,7 @@ extern const u8 pci_acpi_dsm_uuid[]; #else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } static inline void acpi_pci_remove_bus(struct pci_bus *bus) { } +static inline void pci_acpi_dma_configure(struct pci_dev *dev) { } #endif /* CONFIG_ACPI */
#ifdef CONFIG_ACPI_APEI diff --git a/include/linux/pci.h b/include/linux/pci.h index 7474225..3732cad 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1336,6 +1336,7 @@ static inline void pci_bus_assign_domain_nr(struct pci_bus *bus, typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode, unsigned int command_bits, u32 flags); void pci_register_set_vga_state(arch_set_vga_state_t func); +void pci_dma_configure(struct pci_dev *dev);
#else /* CONFIG_PCI is not enabled */
@@ -1439,6 +1440,7 @@ static inline struct pci_dev *pci_get_bus_and_slot(unsigned int bus, static inline int pci_domain_nr(struct pci_bus *bus) { return 0; } static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; } static inline int pci_get_new_domain_nr(void) { return -ENOSYS; } +static inline void pci_dma_configure(struct pci_dev *dev) { }
#define dev_is_pci(d) (false) #define dev_is_pf(d) (false)
From: Tomasz Nowicki tn@semihalf.com
The IORT provides a representation of the I/O topology used by ARM-based systems. It describes how the various components are connected together, e.g. which devices are connected to a given ITS instance.
This patch implements calls which allow us to: - register/remove an ITS as an MSI chip - parse all IORT nodes and build a node tree (for easy lookup) - find the ITS (MSI chip) that a device is assigned to
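A minimal sketch of how the two sides of this interface are meant to be used (hypothetical callers, assuming only the iort.h API added below): the ITS driver registers its msi_controller under the ITS translation ID, and a host bridge later resolves the MSI chip for its PCI segment.

#include <linux/iort.h>
#include <linux/msi.h>

/* Illustrative only: the ITS side advertises itself under its translation ID. */
static struct msi_controller example_its_chip;

static int example_register_its(void)
{
        return iort_pci_msi_chip_add(&example_its_chip, 0 /* ITS ID */);
}

/* Illustrative only: the host bridge side looks up the MSI chip for PCI
 * segment 0, using the first ITS listed in its IORT ITS group. */
static struct msi_controller *example_lookup_msi_chip(void)
{
        return iort_find_pci_msi_chip(0 /* segment */, 0 /* index */);
}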
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- arch/arm64/kernel/pci-acpi.c | 2 + drivers/acpi/Kconfig | 3 + drivers/acpi/Makefile | 1 + drivers/acpi/iort.c | 272 +++++++++++++++++++++++++++++++++++++++ drivers/irqchip/Kconfig | 1 + drivers/irqchip/irq-gic-v3-its.c | 12 +- include/linux/iort.h | 39 ++++++ 7 files changed, 326 insertions(+), 4 deletions(-) create mode 100644 drivers/acpi/iort.c create mode 100644 include/linux/iort.h
diff --git a/arch/arm64/kernel/pci-acpi.c b/arch/arm64/kernel/pci-acpi.c index 517d570..5bbfbfb 100644 --- a/arch/arm64/kernel/pci-acpi.c +++ b/arch/arm64/kernel/pci-acpi.c @@ -17,6 +17,7 @@ #include <linux/acpi.h> #include <linux/init.h> #include <linux/io.h> +#include <linux/iort.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/mmconfig.h> @@ -352,6 +353,7 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) __release_pci_root_info(info); return NULL; } + pbus->msi = iort_find_pci_msi_chip(domain, 0);
pci_set_host_bridge_release(to_pci_host_bridge(pbus->bridge), release_pci_root_info, info); diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 114cf48..bd9204b 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -57,6 +57,9 @@ config ACPI_SYSTEM_POWER_STATES_SUPPORT config ACPI_CCA_REQUIRED bool
+config IORT_TABLE + bool + config ACPI_SLEEP bool depends on SUSPEND || HIBERNATION diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index e32b8cd..d1d1e7a 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -79,6 +79,7 @@ obj-$(CONFIG_ACPI_HED) += hed.o obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o obj-$(CONFIG_ACPI_BGRT) += bgrt.o +obj-$(CONFIG_IORT_TABLE) += iort.o
# processor has its own "processor." module_param namespace processor-y := processor_driver.o processor_throttling.o diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c new file mode 100644 index 0000000..72240c6 --- /dev/null +++ b/drivers/acpi/iort.c @@ -0,0 +1,272 @@ +/* + * Copyright (C) 2015, Linaro Ltd. + * Author: Tomasz Nowicki tomasz.nowicki@linaro.org + * Author: Hanjun Guo hanjun.guo@linaro.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * This file implements early detection/parsing of I/O mapping + * reported to OS through firmware via I/O Remapping Table (IORT) + * IORT document number: ARM DEN 0049A + * + * These routines are used by ITS and PCI host bridge drivers. + */ + +#define pr_fmt(fmt) "ACPI: IORT: " fmt + +#include <linux/acpi.h> +#include <linux/export.h> +#include <linux/iort.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/module.h> +#include <linux/msi.h> +#include <linux/mutex.h> +#include <linux/slab.h> + +struct iort_its_msi_chip { + struct list_head list; + struct msi_controller *chip; + u32 id; +}; + +typedef acpi_status (*iort_find_node_callback) + (struct acpi_iort_node *node, void *context); + +/* pointer to the mapped IORT table */ +static struct acpi_table_header *iort_table; +static LIST_HEAD(iort_pci_msi_chip_list); +static DEFINE_MUTEX(iort_pci_msi_chip_mutex); + +int iort_pci_msi_chip_add(struct msi_controller *chip, u32 its_id) +{ + struct iort_its_msi_chip *its_msi_chip; + + its_msi_chip = kzalloc(sizeof(*its_msi_chip), GFP_KERNEL); + if (!its_msi_chip) + return -ENOMEM; + + its_msi_chip->chip = chip; + its_msi_chip->id = its_id; + + mutex_lock(&iort_pci_msi_chip_mutex); + list_add(&its_msi_chip->list, &iort_pci_msi_chip_list); + mutex_unlock(&iort_pci_msi_chip_mutex); + + return 0; +} +EXPORT_SYMBOL_GPL(iort_pci_msi_chip_add); + +void iort_pci_msi_chip_remove(struct msi_controller *chip) +{ + struct iort_its_msi_chip *its_msi_chip, *t; + + mutex_lock(&iort_pci_msi_chip_mutex); + list_for_each_entry_safe(its_msi_chip, t, &iort_pci_msi_chip_list, list) { + if (its_msi_chip->chip == chip) { + list_del(&its_msi_chip->list); + kfree(its_msi_chip); + break; + } + } + mutex_unlock(&iort_pci_msi_chip_mutex); +} +EXPORT_SYMBOL_GPL(iort_pci_msi_chip_remove); + +static struct msi_controller *iort_pci_find_msi_chip_by_its(u32 its_id) +{ + struct iort_its_msi_chip *its_msi_chip; + + mutex_lock(&iort_pci_msi_chip_mutex); + list_for_each_entry(its_msi_chip, &iort_pci_msi_chip_list, list) { + if (its_msi_chip->id == its_id) { + mutex_unlock(&iort_pci_msi_chip_mutex); + return its_msi_chip->chip; + } + } + mutex_unlock(&iort_pci_msi_chip_mutex); + + return NULL; +} + +/** + * iort_find_root_its_node() - Get PCI root complex, device, or SMMU's + * parent ITS node. 
+ * @node: node pointer to PCI root complex, device, or SMMU + * + * Returns: parent ITS node pointer on success + * NULL on failure + */ +static struct acpi_iort_node * +iort_find_parent_its_node(struct acpi_iort_node *node) +{ + struct acpi_iort_id_mapping *id_map; + + if (!node) + return NULL; + + /* Go upstream until find its parent ITS node */ + while (node->type != ACPI_IORT_NODE_ITS_GROUP) { + /* TODO: handle multi ID mapping entries */ + id_map = ACPI_ADD_PTR(struct acpi_iort_id_mapping, node, + node->mapping_offset); + + /* Firmware bug! */ + if (!id_map->output_reference) { + pr_err(FW_BUG "[node %p type %d] ID map has invalid parent reference\n", + node, node->type); + return NULL; + } + + /* TODO: Components that do not generate MSIs but are connected to an SMMU */ + node = ACPI_ADD_PTR(struct acpi_iort_node, iort_table, + id_map->output_reference); + } + + return node; +} + +/** + * iort_scan_node() - scan the IORT and call the handler with specific + * IORT node type + * @type: IORT node type + * @callback: callback with specific node type + * @context: context pass to the callback + * + * Returns: node pointer when the callback succeed + * NULL on failure + */ +static struct acpi_iort_node * +iort_scan_node(enum acpi_iort_node_type type, + iort_find_node_callback callback, void *context) +{ + struct acpi_iort_node *iort_node, *iort_end; + struct acpi_table_iort *iort; + int i; + + if (!iort_table) + return NULL; + + /* + * iort_table and iort both point to the start of IORT table, but + * have different struct types + */ + iort = container_of(iort_table, struct acpi_table_iort, header); + + /* Get the first iort node */ + iort_node = ACPI_ADD_PTR(struct acpi_iort_node, iort, + iort->node_offset); + + /* pointer to the end of the table */ + iort_end = ACPI_ADD_PTR(struct acpi_iort_node, iort_table, + iort_table->length); + + for (i = 0; i < iort->node_count; i++) { + if (iort_node >= iort_end) { + pr_err("iort node pointer overflows, bad table\n"); + return NULL; + } + + if (iort_node->type == type) { + if (ACPI_SUCCESS(callback(iort_node, context))) + return iort_node; + } + + iort_node = ACPI_ADD_PTR(struct acpi_iort_node, iort_node, + iort_node->length); + } + + return NULL; +} + +static acpi_status +iort_find_pci_rc_callback(struct acpi_iort_node *node, void *context) +{ + struct acpi_iort_root_complex *pci_rc; + int segment = *(int *)context; + + pci_rc = (struct acpi_iort_root_complex *)node->node_data; + + /* + * It is assumed that PCI segment numbers have a one-to-one mapping + * with root complexes. Each segment number can represent only one + * root complex. 
+ */ + if (pci_rc->pci_segment_number == segment) + return AE_OK; + + return AE_NOT_FOUND; +} + +/** + * iort_find_pci_msi_chip() - find the msi controller with root complex's + * segment number + * @segment: domain number of this pci root complex + * @idx: index of the ITS in the ITS group + * + * Returns: msi controller bind to the root complex with the segment + * NULL on failure + */ +struct msi_controller *iort_find_pci_msi_chip(int segment, unsigned int idx) +{ + struct acpi_iort_its_group *its; + struct acpi_iort_node *node; + struct msi_controller *msi_chip; + + node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX, + iort_find_pci_rc_callback, &segment); + if (!node) { + pr_err("can't find node related to PCI host bridge [segment %d]\n", + segment); + return NULL; + } + + node = iort_find_parent_its_node(node); + if (!node) { + pr_err("can't find ITS parent node for PCI host bridge [segment %d]\n", + segment); + return NULL; + } + + /* Move to ITS specific data */ + its = (struct acpi_iort_its_group *)node->node_data; + if (idx > its->its_count) { + pr_err("requested ITS ID index [%d] is greater than available ITS count [%d]\n", + idx, its->its_count); + return NULL; + } + + msi_chip = iort_pci_find_msi_chip_by_its(its->identifiers[idx]); + if (!msi_chip) + pr_err("can not find ITS chip ID:%d, not registered\n", + its->identifiers[idx]); + + return msi_chip; +} +EXPORT_SYMBOL_GPL(iort_find_pci_msi_chip); + +/* Get the remapped IORT table */ +static int __init iort_table_detect(void) +{ + acpi_status status; + + if (acpi_disabled) + return -ENODEV; + + status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table); + if (ACPI_FAILURE(status)) { + const char *msg = acpi_format_exception(status); + pr_err("Failed to get table, %s\n", msg); + return -EINVAL; + } + + return 0; +} +arch_initcall(iort_table_detect); diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig index 120d815..9abf682 100644 --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -26,6 +26,7 @@ config ARM_GIC_V3 config ARM_GIC_V3_ITS bool select PCI_MSI_IRQ_DOMAIN + select IORT_TABLE if (ACPI && PCI_MSI)
config ARM_NVIC bool diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 4814954..3d47eb0 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -19,6 +19,7 @@ #include <linux/cpu.h> #include <linux/delay.h> #include <linux/interrupt.h> +#include <linux/iort.h> #include <linux/log2.h> #include <linux/mm.h> #include <linux/msi.h> @@ -1658,15 +1659,18 @@ static int __init gic_acpi_parse_madt_its(struct acpi_subtable_header *header, const unsigned long end) { - struct acpi_madt_generic_translator *its; + struct acpi_madt_generic_translator *its_table; + struct its_node *its;
if (BAD_MADT_ENTRY(header, end)) return -EINVAL;
- its = (struct acpi_madt_generic_translator *)header; + its_table = (struct acpi_madt_generic_translator *)header;
- pr_info("ITS: ID: 0x%x\n", its->translation_id); - its_probe(its->base_address, 2 * SZ_64K); + pr_info("ITS: ID: 0x%x\n", its_table->translation_id); + its = its_probe(its_table->base_address, 2 * SZ_64K); + if (!its_init_domain(NULL, its)) + iort_pci_msi_chip_add(&its->msi_chip, its_table->translation_id); return 0; }
diff --git a/include/linux/iort.h b/include/linux/iort.h new file mode 100644 index 0000000..7b83a03 --- /dev/null +++ b/include/linux/iort.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2015, Linaro Ltd. + * Author: Tomasz Nowicki tomasz.nowicki@linaro.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ + +#ifndef __IORT_H__ +#define __IORT_H__ + +struct msi_controller; + +#ifdef CONFIG_IORT_TABLE +int iort_pci_msi_chip_add(struct msi_controller *chip, u32 its_id); +void iort_pci_msi_chip_remove(struct msi_controller *chip); +struct msi_controller *iort_find_pci_msi_chip(int segment, unsigned int idx); +#else +static inline int +iort_pci_msi_chip_add(struct msi_controller *chip, u32 its_id) { return -ENODEV; } + +static inline void +iort_pci_msi_chip_remove(struct msi_controller *chip) { } + +static inline struct msi_controller * +iort_find_pci_msi_chip(int segment, unsigned int idx) { return NULL; } +#endif + +#endif /* __IORT_H__ */
Fix a build issue when CONFIG_ACPI is not set.
Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/pci/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 79c34d4..a73dd56 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4490,10 +4490,12 @@ void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent) static int use_dt_domains = -1; int domain;
+#ifdef CONFIG_ACPI if (!acpi_disabled) { domain = PCI_CONTROLLER(bus)->segment; goto out; } +#endif
domain = of_get_pci_domain_nr(parent->of_node);
From: Tomasz Nowicki tomasz.nowicki@linaro.org
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org Signed-off-by: Robert Richter rrichter@cavium.com Signed-off-by: Vadim Lomovtsev Vadim.Lomovtsev@caviumnetworks.com --- drivers/tty/n_tty.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index ee8bfac..8ad43b5 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1711,6 +1711,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, { struct n_tty_data *ldata = tty->disc_data; int room, n, rcvd = 0, overflow; + size_t *read_tail_tmp = &ldata->read_tail;
down_read(&tty->termios_rwsem);
@@ -1728,7 +1729,7 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp, * the consumer has loaded the data in read_buf up to the new * read_tail (so this producer will not overwrite unread data) */ - size_t tail = smp_load_acquire(&ldata->read_tail); + size_t tail = smp_load_acquire(read_tail_tmp);
room = N_TTY_BUF_SIZE - (ldata->read_head - tail); if (I_PARMRK(tty))
It's taken a bit of time, but we now have the 4.2rc2 kernel available in git. If you could use this git repo as the base for these patches it would make things immensely easier.
https://git.centos.org/summary/sig-altarch!kernel.git
The basics for how to work with the kernel source repository are here -> http://wiki.centos.org/Sources
On 08/13/2015 08:17 AM, Vadim Lomovtsev wrote:
Sunil Goutham (2): net: thunderx: Receive hashing HW offload support net: thunderx: Add receive error stats reporting via ethtool
TIRUMALESH CHALAMARLA (1): arm64: Increase the max granular size
Tirumalesh Chalamarla (3): PCI_ Add host drivers for Cavium ThunderX processors arm64: KVM: Enable minimalistic support for Thunder KVM: extend struct kvm_msi to hold a 32-bit device ID
Tomasz Nowicki (24): arm64, acpi: Implement new "GIC version" field of MADT GIC entry. ACPI, GICv3: Allow to map irq for non-hierarchical doamin. GICv3: Refactor gic_of_init() of GICv3 driver to allow for FDT and ACPI initialization. ACPI, GICV3+: Add support for GICv3+ initialization. GICv3, ITS: Isolate FDT related code, extract common functions. ACPI, GICv3, ITS: Add support for ACPI ITS binding. x86, acpi, pci: Reorder logic of pci_mmconfig_insert() function x86, acpi, pci: Move arch-agnostic MMCFG code out of arch/x86/ directory x86, acpi, pci: Move PCI config space accessors. x86, acpi, pci: mmconfig_{32,64}.c code refactoring - remove code duplication. x86, acpi, pci: mmconfig_64.c becomes default implementation for arch agnostic low-level direct PCI config space accessors via MMCONFIG. pci, acpi: Share ACPI PCI config space accessors. arm64, pci, acpi: Let ARM64 to use MMCONFIG PCI config space accessors. arm64, pci: Add PCI ACPI probing for ARM64 net, phy, apci: Allow to initialize Marvell phy in the ACPI way. net, mdio, acpi: Add support for ACPI binding. net, thunder, bgx: Rework driver to support ACPI binding. arm64/acpi/pci: provide hook for MCFG fixups acpi, property: Export acpi_dev_prop_read call to be usable for kernel modules. ARM64 / ACPI: Point KVM to the virtual timer interrupt when booting with ACPI arm64, acpi, pci: Omit OF related IRQ parsing when running with ACPI kernel. pci, acpi, dma: Unify coherency checking logic for PCI devices. ARM64, ACPI, PCI, MSI: I/O Remapping Table (IORT) initial support. Compiler bug workaround!!!
Vadim Lomovtsev (1): PCI: ThunderX: fix build issue
Documentation/virtual/kvm/api.txt | 46 +- Documentation/virtual/kvm/devices/arm-vgic.txt | 9 + arch/arm/include/asm/kvm_host.h | 4 +- arch/arm/kvm/Kconfig | 3 + arch/arm/kvm/Makefile | 2 +- arch/arm/kvm/arm.c | 2 +- arch/arm64/Kconfig | 4 + arch/arm64/include/asm/acpi.h | 2 + arch/arm64/include/asm/cache.h | 2 +- arch/arm64/include/asm/cputype.h | 3 + arch/arm64/include/asm/kvm_host.h | 3 +- arch/arm64/include/asm/pci.h | 47 + arch/arm64/include/asm/spinlock.h | 36 +- arch/arm64/include/uapi/asm/kvm.h | 5 +- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/acpi.c | 33 +- arch/arm64/kernel/pci-acpi.c | 362 +++++++ arch/arm64/kernel/pci.c | 35 +- arch/arm64/kernel/topology.c | 8 + arch/arm64/kernel/vdso/gettimeofday.S | 47 +- arch/arm64/kvm/Kconfig | 3 + arch/arm64/kvm/Makefile | 3 +- arch/arm64/kvm/guest.c | 6 + arch/arm64/kvm/reset.c | 8 +- arch/arm64/kvm/sys_regs_generic_v8.c | 2 + arch/arm64/lib/copy_from_user.S | 87 +- arch/arm64/lib/copy_page.S | 32 + arch/arm64/lib/copy_template.S | 212 ++++ arch/arm64/lib/copy_to_user.S | 57 +- arch/x86/include/asm/pci.h | 42 + arch/x86/include/asm/pci_x86.h | 72 -- arch/x86/pci/Makefile | 5 +- arch/x86/pci/acpi.c | 1 + arch/x86/pci/init.c | 1 + arch/x86/pci/mmconfig-shared.c | 242 +---- arch/x86/pci/mmconfig_32.c | 11 +- arch/x86/pci/mmconfig_64.c | 153 --- drivers/acpi/Kconfig | 3 + drivers/acpi/Makefile | 2 + drivers/acpi/bus.c | 1 + drivers/acpi/iort.c | 272 +++++ drivers/acpi/mmconfig.c | 437 ++++++++ drivers/acpi/property.c | 1 + drivers/clocksource/arm_arch_timer.c | 9 +- drivers/infiniband/hw/mlx4/cq.c | 2 +- drivers/infiniband/hw/mlx4/qp.c | 2 +- drivers/infiniband/hw/mlx4/srq.c | 3 +- drivers/irqchip/Kconfig | 1 + drivers/irqchip/irq-gic-common.c | 11 + drivers/irqchip/irq-gic-common.h | 9 + drivers/irqchip/irq-gic-v3-its.c | 241 +++-- drivers/irqchip/irq-gic-v3.c | 373 ++++++- drivers/mfd/vexpress-sysreg.c | 133 ++- drivers/net/ethernet/cavium/Kconfig | 2 + drivers/net/ethernet/cavium/thunder/nic.h | 36 +- .../net/ethernet/cavium/thunder/nicvf_ethtool.c | 50 +- drivers/net/ethernet/cavium/thunder/nicvf_main.c | 62 +- drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 86 +- drivers/net/ethernet/cavium/thunder/nicvf_queues.h | 41 - drivers/net/ethernet/cavium/thunder/thunder_bgx.c | 175 ++- drivers/net/ethernet/mellanox/mlx4/alloc.c | 104 +- drivers/net/ethernet/mellanox/mlx4/en_cq.c | 9 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 +- drivers/net/ethernet/mellanox/mlx4/en_resources.c | 32 - drivers/net/ethernet/mellanox/mlx4/en_rx.c | 11 +- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 14 +- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 - drivers/net/ethernet/mellanox/mlx4/mr.c | 5 +- drivers/net/ethernet/smsc/smc91x.c | 11 +- drivers/net/ethernet/smsc/smsc911x.c | 38 + drivers/net/phy/Kconfig | 9 +- drivers/net/phy/marvell.c | 118 +- drivers/net/phy/mdio-octeon.c | 198 +++- drivers/pci/host/Kconfig | 12 + drivers/pci/host/Makefile | 2 + drivers/pci/host/pcie-thunder-pem.c | 462 ++++++++ drivers/pci/host/pcie-thunder.c | 335 ++++++ drivers/pci/pci-acpi.c | 13 + drivers/pci/pci.c | 98 +- drivers/pci/probe.c | 4 +- drivers/pnp/resource.c | 2 + drivers/tty/n_tty.c | 3 +- drivers/virtio/virtio_mmio.c | 12 +- include/acpi/actbl1.h | 12 +- include/asm-generic/vmlinux.lds.h | 7 + include/kvm/arm_vgic.h | 39 +- include/linux/iort.h | 39 + include/linux/irqchip/arm-gic-acpi.h | 3 + include/linux/irqchip/arm-gic-v3.h | 22 +- include/linux/kvm_host.h | 7 +- include/linux/mlx4/device.h | 11 +- include/linux/mmconfig.h | 86 ++ 
include/linux/pci-acpi.h | 2 + include/linux/pci.h | 11 +- include/uapi/linux/kvm.h | 11 +- virt/kvm/arm/arch_timer.c | 76 +- virt/kvm/arm/its-emul.c | 1141 ++++++++++++++++++++ virt/kvm/arm/its-emul.h | 55 + virt/kvm/arm/vgic-v2-emul.c | 15 + virt/kvm/arm/vgic-v2.c | 1 + virt/kvm/arm/vgic-v3-emul.c | 105 +- virt/kvm/arm/vgic-v3.c | 1 + virt/kvm/arm/vgic.c | 375 +++++-- virt/kvm/arm/vgic.h | 5 + virt/kvm/eventfd.c | 6 +- virt/kvm/irqchip.c | 12 +- 106 files changed, 5824 insertions(+), 1257 deletions(-) create mode 100644 arch/arm64/kernel/pci-acpi.c create mode 100644 arch/arm64/lib/copy_template.S delete mode 100644 arch/x86/pci/mmconfig_64.c create mode 100644 drivers/acpi/iort.c create mode 100644 drivers/acpi/mmconfig.c create mode 100644 drivers/pci/host/pcie-thunder-pem.c create mode 100644 drivers/pci/host/pcie-thunder.c create mode 100644 include/linux/iort.h create mode 100644 include/linux/mmconfig.h create mode 100644 virt/kvm/arm/its-emul.c create mode 100644 virt/kvm/arm/its-emul.h