Created attachment 186605 [details] linuxkpi_compat.patch This patch allows LinuxKPI based drivers to respond to 32-bit compatibility ioctls. Related to: https://github.com/FreeBSDDesktop/freebsd-base-graphics/issues/166
A commit references this bug: Author: hselasky Date: Fri Sep 22 08:12:09 UTC 2017 New revision: 323910 URL: https://svnweb.freebsd.org/changeset/base/323910 Log: Add support for 32-bit compatibility IOCTLs in the LinuxKPI. Bump the FreeBSD version to force recompilation of external kernel modules due to structure change. PR: 222504 Submitted by: Greg V <greg@unrelenting.technology> MFC after: 1 week Sponsored by: Mellanox Technologies Changes: head/sys/compat/linuxkpi/common/include/linux/fs.h head/sys/compat/linuxkpi/common/src/linux_compat.c head/sys/sys/param.h
Submitted with some modifications.
A commit references this bug: Author: hselasky Date: Thu Feb 1 13:01:45 UTC 2018 New revision: 328653 URL: https://svnweb.freebsd.org/changeset/base/328653 Log: MFC r310014-r327788: This is an overwrite merge backport of the LinuxKPI from FreeBSD-head. Following is a complete list of MFC'ed revisions and also partially MFC'ed revisions in the end. The MFC'ed revision are listed in incremental order. Bump the __FreeBSD_version to force recompilation of any external kernel modules. Sponsored by: Mellanox Technologies MFC r310014: Remove the only user of sysctl_add_oid(). My plan is to change this function's prototype at some point in the future to add a new label argument, which can be used to export all of sysctl as metrics that can be scraped by Prometheus. Switch over this caller to use the macro wrapper counterpart. MFC r310031: linuxkpi: Fix not-found case of linux_pci_find_irq_dev Linux list_for_each_entry() does not neccessarily end with the iterator NULL (it may be an offset from NULL if the list member is not the first element of the member struct). Reported by: Coverity CID: 1366940 Reviewed by: hselasky@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8780 MFC r313806: Whitespace fix. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r313807: Allow passing a constant atomic_t to atomic_read(). Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r313808: Implement more LinuxKPI atomic functions and macros. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r313810: Allow container_of() to be used with constant data pointers. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r313872: Implement GFP_DMA32 flag in the LinuxKPI. Define all FreeBSD native GFP bits as GFP_NATIVE_MASK. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314040: Make the LinuxKPI task struct persistent accross system calls. A set of helper functions have been added to manage the life of the LinuxKPI task struct. When an external system call or task is invoked, a check is made to create the task struct by demand. A thread destructor callback is registered to free the task struct when a thread exits to avoid memory leaks. This change lays the ground for emulating the Linux kernel more closely which is a dependency by the code using the LinuxKPI APIs. Add new dedicated td_lkpi_task field has been added to struct thread instead of abusing td_retval[1]. Fix some header file inclusions to make LINT kernel build properly after this change. Bump the __FreeBSD_version to force a rebuild of all kernel modules. Sponsored by: Mellanox Technologies MFC r314043: Add support for LinuxKPI tasklets. Tasklets are implemented using a taskqueue and a small statemachine on top. The additional statemachine is required to ensure all LinuxKPI tasklets get serialized. FreeBSD taskqueues do not guarantee serialisation of its tasks, except when there is only one worker thread configured. Sponsored by: Mellanox Technologies MFC r314044: Streamline the LinuxKPI spinlock wrappers. 1) Add better spinlock debug names when WITNESS_ALL is defined. 2) Make sure that the calling thread gets bound to the current CPU while a spinlock is locked. Some Linux kernel code depends on that the CPU ID doesn't change while a spinlock is locked. 3) Add support for using LinuxKPI spinlocks during a panic(). Sponsored by: Mellanox Technologies MFC r314050: Replace dummy implementation of RCU in the LinuxKPI with one based on the in-kernel concurrency kit's ck_epoch API. Factor RCU hlist_xxx() functions into own rculist.h header file. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314105: Improve LinuxKPI scatter list support. The i915kms driver in Linux 4.9 reimplement parts of the scatter list functions with regards to performance. In other words there is not so much room for changing structure layouts and functionality if the i915kms should be built AS-IS. This patch aligns the scatter list support to what is expected by the i915kms driver. Remove some comments not needed while at it. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314106: Optimise unmapped LinuxKPI page allocations. When allocating unmapped pages, take advantage of the direct map on AMD64 to get the virtual address corresponding to a page. Else all pages allocated must be mapped because sometimes the virtual address of a page is requested. Move all page allocation and deallocation code into an own C-file. Add support for GFP_DMA32, GFP_KERNEL, GFP_ATOMIC and __GFP_ZERO allocation flags. Make a clear separation between mapped and unmapped allocations. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314109: Convert magic values into macros in the LinuxKPI scatterlist implementation. Suggested by: cem @ Sponsored by: Mellanox Technologies MFC r314136: Implement __test_and_clear_bit() and __test_and_set_bit() in the LinuxKPI. The clang compiler will optimise these functions down to three AMD64 instructions if the bit argument is a constant during compilation. Sponsored by: Mellanox Technologies MFC r314205: Implement BIT_ULL() macro in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314207: Implement srcu_dereference() macro in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314214: Prototype device structure to ensure LinuxKPI header file can be included standalone. Sponsored by: Mellanox Technologies MFC r314215: Implement more string functions in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314336: Define __sum16 type in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314337: Implement more bit operation functions in the LinuxKPI. Some minor whitespace nits while at it. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314604: Update the LinuxKPI RCU and SRCU wrappers for the concurrency kit, CK. - Optimise the RCU implementation to not allocate and free ck_epoch_records during runtime. Instead allocate two sets of ck_epoch_records per CPU for general purpose use. The first set is only used for reader locks and the second set is only used for synchronization and barriers and is protected with a regular mutex to prevent simultaneous issues. - Move the task structure away from the rcu_head structure and into the per-CPU structures. This allows the size of the rcu_head structure to be reduced down to the size of two pointers. - Fix a bug where the linux_rcu_barrier() function only waited for one per-CPU epoch record to be completed instead of all. - Use a critical section or a mutex to protect ck_epoch_begin() and ck_epoch_end() depending on RCU or SRCU type. All the ck_epoch_xxx() functions, except ck_epoch_register(), ck_epoch_unregister() and ck_epoch_recycle() are not re-entrant and needs a critical section or a mutex to operate in the LinuxKPI, after inspecting the CK implementation of the above mentioned functions. The simultaneous issues arise from per-CPU epoch records being shared between multiple threads depending on the amount of taskswitching and how many threads are involved with the RCU and SRCU operations. - Properly free all epoch records by using safe list traversal at LinuxKPI module unload. It turns out the ck_epoch_recycle() always have the records on an internal list and use a flag in the epoch record to track allocated and free entries. This would lead to use after free during module unload. - Remove redundant synchronize_rcu() call from the linux_compat_uninit() function. Let the linux_rcu_runtime_uninit() function do the final rcu_barrier() instead. Sponsored by: Mellanox Technologies MFC r314675: Remove duplicate prototype in the LinuxKPI to fix compilation warning. Reported by: emaste @ Sponsored by: Mellanox Technologies MFC r314771: Give LinuxKPI Read-Write semaphores better debug names when WITNESS_ALL is defined. The lock name is based on the filename and line number where the initialisation happens. Sponsored by: Mellanox Technologies MFC r314772: Implement DECLARE_RWSEM() macro in the LinuxKPI to initialize a Read-Write semaphore during module init time. Sponsored by: Mellanox Technologies MFC r314774: Implement add_timer_on() function in the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r314843: LinuxKPI workqueue cleanup. This change makes the workqueue implementation behave more like in Linux, both functionality wise and structure wise. All workqueue code has been moved to linux_work.c Add an atomic based statemachine to the work_struct to ensure proper operation. Prior to this change struct_work was directly mapped to a FreeBSD task. When a taskqueue has multiple threads the same task may end up being executed on more than one worker thread simultaneously. This might cause problems with code coming from Linux, which expects serial behaviour, similar to Linux tasklets. Move all global workqueue function names into the linux_xxx domain to avoid symbol name clashes in the future. Implement a few more workqueue related functions and macros. Create two multithreaded taskqueues for the LinuxKPI during module load, one for time-consuming callbacks and one for non-time consuming callbacks. Sponsored by: Mellanox Technologies MFC r314853: Use grouptaskqueue for tasklets in the LinuxKPI. This avoids creating own per-CPU threads and also ensures the tasklet execution happens on the same CPU core invoking the tasklet. Sponsored by: Mellanox Technologies MFC r314859: Make sure jiffies value is cast to an integer in the LinuxKPI before doing millisecond conversion. Under FreeBSD jiffies are 32-bit. Sponsored by: Mellanox Technologies MFC r314861: Implement time_is_after_eq_jiffies() function in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314904: Implement eth_zero_addr() in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r314905: Cleanup the LinuxKPI slab implementation. Put large functions into linux_slab.c instead of declaring them static inline. Add support for more memory allocation wrappers like kmalloc_array() and __vmalloc(). Make sure either the M_WAITOK or the M_NOWAIT flag is set and mask away unused memory allocation flags before calling FreeBSD's malloc() routine. Move kmalloc_node() definition to slab.h where it belongs. Implement support for the SLAB_DESTROY_BY_RCU feature when creating a kmem_cache which basically means kmem_cache memory is freed using call_rcu(). Sponsored by: Mellanox Technologies MFC r314920: Fix compilation warning for powerpc64 by not using const keyword in return types: Type qualifiers ignored on function return type [-Wreturn-type] Reported by: andreast @ Sponsored by: Mellanox Technologies MFC r314953: Don't create any threads before SI_SUB_INIT_IF in the LinuxKPI. Else kthread_add() will assert it is called too soon. This fixes a startup issue when COMPAT_LINUXKPI is in enabled the kernel configuration file. Reported by: Michael Butler <imb@protected-networks.net> Sponsored by: Mellanox Technologies MFC r314965: Cleanup the LinuxKPI mutex wrappers. Add support for using mutexes during KDB and shutdown. This is also required for doing mode-switching during panic for drm-next. Add new mutex functions mutex_init_witness() and mutex_destroy() allowing LinuxKPI mutexes to be tracked by witness. Declare mutex_is_locked() and mutex_is_owned() like inline functions to get cleaner warnings. These functions are used inside WARN_ON() statements which might look a bit odd if these functions get fully expanded. Give mutexes better debug names through the mutex_name() macro when WITNESS_ALL is defined. The mutex_name() macro can prefix parts of the filename and line number before the mutex name. Sponsored by: Mellanox Technologies MFC r314970: Implement support for mutexes with deadlock avoidance in the LinuxKPI. When locking a mutex and deadlock is detected the first mutex lock call that sees the deadlock will return -EDEADLK . Sponsored by: Mellanox Technologies MFC r314971: Fix implementation of the DECLARE_WORK() macro in the LinuxKPI to fully initialize the declared work structure and not only the function callback pointer. Sponsored by: Mellanox Technologies MFC r315244: Set "current" pointer for LinuxKPI interrupts and timer callbacks. Sponsored by: Mellanox Technologies MFC r315410: Define some more LinuxKPI task related macros. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r315419: Implement more userspace memory access functions in the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r315420: The LinuxKPI pagefault disable and enable functions can only be used pairwise to support the FreeBSD way of pushing and popping the page fault flags. Ensure this by requiring every occurrence of pagefault disable function call to have a corresponding pagefault enable call. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r315422: Use __LP64__ to detect presence of suword64() to fix linking and loading of the LinuxKPI on 32-bit platforms. Reported by: lwhsu @ Sponsored by: Mellanox Technologies MFC r315442: Add comment describing the use of pagefault_disable() and pagefault_enable() in the LinuxKPI. Suggested by: rpokala@ Sponsored by: Mellanox Technologies MFC r315443: Implement minimalistic memory mapping structure, struct mm_struct, and some associated helper functions in the LinuxKPI. Let the existing linux_alloc_current() function allocate and initialize the new structure and let linux_free_current() drop the refcount on the memory mapping structure. When the mm_struct's refcount reaches zero, the structure is freed. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r315457: Implement get_pid_task(), pid_task() and some other PID helper functions in the LinuxKPI. Add a usage atomic to the task_struct structure to facilitate refcounting the task structure when returned from get_pid_task(). The get_task_struct() and put_task_struct() function is used to manage atomic refcounting. After this change the task_struct should only be freed through put_task_struct(). Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r315713: Add support for more IPv4 and IPv6 related macros in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r315714: Add full VNET support to the inet_get_local_port_range() function in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r315719: Extend cmpxchg() to support 8- and 16-bit values, and add xchg(). These are needed to support updated revisions of the DRM code. Reviewed by: hselasky (previous version) MFC r315856: Add support for ratelimited printouts in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r315859: Function macros are preferred in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r315863: Add proper error checking for the string to number conversion functions in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r315864: Use ppsratecheck() for ratelimiting in the LinuxKPI. Suggested by: cem @ Sponsored by: Mellanox Technologies MFC r316033: Implement a series of physical page management related functions in the LinuxKPI for accessing user-space memory in the kernel. Add functions to hold and wire physical page(s) based on a given range of user-space virtual addresses. Add functions to get and put a reference on, wire, hold, mark accessed, copy and dirty a physical page. Add new VM related structures and defines as a preparation step for advancing the memory map capabilities of the LinuxKPI. Add function to figure out if a virtual address was allocated using malloc(). Add function to convert a virtual kernel address into its physical page pointer. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r316034: Add more platforms supporting the direct map feature in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r316035: Implement vmalloc_32() in the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r316521: Implement down_write_killable() in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r316522: Unify error handling when si_drv1 is NULL in the LinuxKPI. Make sure the character device poll callback function does not return an error code, but a POLLXXX value, in case of failure. Sponsored by: Mellanox Technologies MFC r316561: Before registering a new mm_struct in the LinuxKPI check if other tasks in the belonging procedure already have a valid mm_struct and reference that instead. The mm_struct in the LinuxKPI should be shared among all tasks belonging to the same procedure. This has to do with with the mmap_sem semaphore which should serialize all VM operations inside a given procedure. Linux based drivers depend on this behaviour. Sponsored by: Mellanox Technologies MFC r316562: Implement proper support for memory map operations in the LinuxKPI, like open, close and fault using the character device pager. Some notes about the implementation: 1) Linux drivers set the vm_ops and vm_private_data fields during a mmap() call to indicate that the driver wants to use the LinuxKPI VM operations. Else these operations are not used. 2) The vm_private_data pointer is associated with a VM area structure and inserted into an internal LinuxKPI list. If the vm_private_data pointer already exists, the existing VM area structure is used instead of the allocated one which gets freed. 3) The LinuxKPI's vm_private_data pointer is used as the callback handle for the FreeBSD VM object. The VM subsystem in FreeBSD has a similar list to identify equal handles and will only call the character device pager's close function once. 4) All LinuxKPI VM operations are serialized through the mmap_sem sempaphore, which is per procedure, which prevents simultaneous access to the shared VM area structure when receiving page faults. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r316563: Fix implementation of task_pid_group_leader() in the LinuxKPI. In FreeBSD thread IDs and procedure IDs have distinct number spaces. When asking for the group leader task ID in the LinuxKPI, return the procedure ID and let this resolve to the first task in the procedure having a valid LinuxKPI task structure pointer. Sponsored by: Mellanox Technologies MFC r316564: Implement need_resched() in the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r316565: Define VM_READ, VM_WRITE and VM_EXEC in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r316568: Cleanup the bitmap_xxx() functions in the LinuxKPI: - Move all bitmap related functions from bitops.h to bitmap.h, similar to what Linux does. - Apply some minor code cleanup and simplifications to optimize the generated code when using static inline functions. - Implement the following list of bitmap functions which are needed by drm-next and ibcore: - bitmap_find_next_zero_area_off() - bitmap_find_next_zero_area() - bitmap_or() - bitmap_and() - bitmap_xor() - Add missing include directives to the qlnxe driver (davidcs@ has been notified) Sponsored by: Mellanox Technologies MFC r316606: The __stringify() macro in the LinuxKPI should expand any macros before stringifying. Sponsored by: Mellanox Technologies MFC r316609: Create the LinuxKPI current task structure on the fly if it doesn't exist when the current macro is used. Sponsored by: Mellanox Technologies MFC r316656: Fix compilation of LinuxKPI for PowerPC. Found by: emaste @ Sponsored by: Mellanox Technologies MFC r317135: Zero number of CPUs should be translated into the default number of CPUs when allocating a LinuxKPI workqueue. This also ensures that the created taskqueue always have a non-zero number of worker threads. Sponsored by: Mellanox Technologies MFC r317137: Fix problem regarding priority inversion when using the concurrency kit, CK, in the LinuxKPI. When threads are pinned to a CPU core or when there is only one CPU, it can happen that a higher priority thread can call the CK synchronize function while a lower priority thread holds the read lock. Because the CK's synchronize is a simple wait loop this can lead to a deadlock situation. To solve this problem use the recently introduced CK's wait callback function. When detecting a CK blocking condition figure out the lowest priority among the blockers and update the calling thread's priority and yield. If another CPU core is holding the read lock, pin the thread to the blocked CPU core and update the priority. The calling threads priority and CPU bindings are restored before return. If a thread holding a CK read lock is detected to be sleeping, pause() will be used instead of yield(). Sponsored by: Mellanox Technologies MFC r317138: Use __typeof() instead of typeof() in some RCU related macros in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r317504: Prefer to use real virtual address over direct map address in the linux_page_address() function in the LinuxKPI. This solves an issue where the return value from linux_page_address() is passed to kmem_free(). Sponsored by: Mellanox Technologies MFC r317651: Add on_each_cpu() and wbinvd_on_all_cpus(). Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D10550 MFC r317828: Fix for use after free in the LinuxKPI. Background: The same VM object might be shared by multiple processes and the mm_struct is usually freed when a process exits. Grab a reference on the mm_struct while the vmap is in the linux_vma_head list in case the first process which inserted a VM object has exited. Tested by: kwm @ Sponsored by: Mellanox Technologies MFC r317839: Use pmap_invalidate_cache() to implement wbinvd_on_all_cpus(). Suggested by: jhb X-MFC with: r317651 MFC r318026: Fix init order in the LinuxKPI for RCU support. CPU_FOREACH() is not available until SI_SUB_CPU at SI_ORDER_ANY when the LinuxKPI is loaded as part of the kernel. Sponsored by: Mellanox Technologies MFC r318590: Add get_cpu() and put_cpu(). MFC r319229: Add some miscellaneous definitions to support DRM drivers. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D10985 MFC r319312: Make sure the thread's priority is restored for all three cases inside linux_synchronize_rcu_cb() in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r319316: Fixes for refcounting "struct linux_file" in the LinuxKPI. - Allow "struct linux_file" to be refcounted when its "_file" member is NULL by using its "f_count" field. The reference counts are transferred to the file structure when the file descriptor is installed. - Add missing vdrop() calls for error cases during open(). - Set the "_file" member of "struct linux_file" during open. This allows use of refcounting through get_file() and fput() with LinuxKPI character devices. Sponsored by: Mellanox Technologies MFC r319317: Fix a reference count leak in the LinuxKPI due to calling VM open when it shouldn't be called. Background: The Linux VM open operation is called when a new VMA is created on top of the current VMA. This is done through either mremap flow or split_vma, usually due to mlock, madvise, munmap and so on. This is currently not supported by the LinuxKPI. Sponsored by: Mellanox Technologies MFC r319318: Don't acquire a reference on the VM-space when allocating the LinuxKPI task structure to avoid deadlock when tearing down the VM object during a process exit. Found by: markj @ Sponsored by: Mellanox Technologies MFC r319319: Remove the VMA handle from its list before calling the LinuxKPI VMA close operation to prevent other threads from reusing the VM object handle pointer. Sponsored by: Mellanox Technologies MFC r319320: Make sure the VMAP's "vm_file" field is referenced in a Linux compatible way by the linux_dev_mmap_single() function in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r319321: Properly set the .d_name field in the cdevsw structure for the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r319338: Implement in_atomic() function in the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r319340: Properly implement idr_preload() and idr_preload_end() in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r319341: Implement print_hex_dump(), print_hex_dump_bytes() and printk_ratelimited() in the LinuxKPI. While at it fix the inclusion guard of printk.h to be similar to the rest of the LinuxKPI header files. Sponsored by: Mellanox Technologies MFC r319409: Add generic kqueue() and kevent() support to the LinuxKPI character devices. The implementation allows read and write filters to be created and piggybacks on the poll() file operation to determine when a filter should trigger. The piggyback mechanism is simply to check for the EWOULDBLOCK or EAGAIN return code from read(), write() or ioctl() system calls and then update the kqueue() polling state bits. The implementation is similar to the one found in the cuse(3) module. Refer to sys/fs/cuse/*.[ch] for more details. Sponsored by: Mellanox Technologies MFC r319410: Translate the ERESTARTSYS error code into ERESTART in the LinuxKPI ioctl(), read() and write() system call handlers. This error code is internal to the kernel and should not be seen by user-space programs according to Linux. Submitted by: Yanko Yankulov <yanko.yankulov@gmail.com> Sponsored by: Mellanox Technologies MFC r319444: Make sure the selrecord() function is only called from within system polling contexts in the LinuxKPI. After the kqueue() support was added to the LinuxKPI in r319409 the Linux poll file operation will be used outside the system file polling callback function, which can cause a NULL-pointer panic inside selrecord() because curthread->td_sel is set to NULL. This patch moves the selrecord() call away from poll_wait() and to the system file poll callback function in the LinuxKPI, which essentially wraps the Linux one. This is similar to what the cuse(3) module is currently doing. Refer to sys/fs/cuse/*.[ch] for more details. Sponsored by: Mellanox Technologies MFC r319500: Add support for setting the non-blocking I/O flag for LinuxKPI character devices. In Linux the FIONBIO IOCTL is handled by the kernel and not the drivers. Also need return success for the FIOASYNC ioctl due to existing logic in kern_fcntl() even though it is not supported currently. Sponsored by: Mellanox Technologies MFC r319501: Improve kqueue() support in the LinuxKPI. Some applications using the kqueue() does not set non-blocking I/O mode for event driven read of file descriptors. This means the LinuxKPI internal kqueue read and write event flags must be updated before the next read and/or write system call. Else the read and/or write system call may block. This can happen when there is no more data to read following a previous read event. Then the application also gets blocked from processing other events. This situation can also be solved by the applications setting and using non-blocking I/O mode. Sponsored by: Mellanox Technologies MFC r319620: Fix init order in the LinuxKPI for IDR support after recent changes. CPU_FOREACH() is not available until SI_SUB_CPU at SI_ORDER_ANY when the LinuxKPI is loaded as part of the kernel. Sponsored by: Mellanox Technologies MFC r319656: Add more #ifdef arch checks to the linuxkpi arm, mips, and powerpc all implement pmap_mapdev_attr() and pmap_unmapdev(), so add those archs to the checks. powerpc also includes the atomic_swap_*() functions, so add that to the supported list as well. Not tested except by compiling powerpc. Reviewed by: markj MFC r319675: Remove ARM and MIPS from linuxkpi ioremap_attr definition ARM and MIPS fail universe builds. ARM and MIPS are missing the following: * VM_MEMATTR_WRITE_THROUGH * VM_MEMATTR_WRITE_COMBINING Pointy-hat to: jhibbits MFC r319757: Augment wait queue support in the LinuxKPI. In particular: - Don't evaluate event conditions with a sleepqueue lock held, since such code may attempt to acquire arbitrary locks. - Fix the return value for wait_event_interruptible() in the case that the wait is interrupted by a signal. - Implement wait_on_bit_timeout() and wait_on_atomic_t(). - Implement some functions used to test for pending signals. - Implement a number of wait_event_*() variants and unify the existing implementations. - Unify the mechanism used by wait_event_*() and schedule() to put the calling thread to sleep. This is required to support updated DRM drivers. Thanks to hselasky for finding and fixing a number of bugs in the original revision. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D10986 MFC r319758: Implement pci_disable_device() in the LinuxKPI. Submitted by: kmacy MFC r320063: Remove prototypes for unimplemented LinuxKPI functions. MFC r320072: Avoid including list.h in LinuxKPI headers. list.h includes a number of FreeBSD headers as a workaround for the LIST_HEAD name collision. To reduce pollution, avoid including list.h in commonly used headers when it is not explicitly needed. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11249 MFC r320078: Add kthread parking support to the LinuxKPI. Submitted by: kmacy (original version) Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11264 MFC r320189: Allow the VM fault handler to be NULL in the LinuxKPI when handling a memory map request. When the VM fault handler is NULL a return code of VM_PAGER_BAD is returned from the character device's pager populate handler. This fixes compatibility with Linux. Sponsored by: Mellanox Technologies MFC r320192: Add a lockdep macro to the LinuxKPI. Also fix some nearby style issues. MFC r320193: Include kmod.h from the LinuxKPI's module.h. MFC r320194: Add missing lock destructor invocations to the LinuxKPI unload handler. MFC r320196: Update io-mapping.h in the LinuxKPI. Add io_mapping_init_wc() and add a third (unused) parameter to io_mapping_map_wc(). Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11286 MFC r320333: Add noop_lseek() to the LinuxKPI. MFC r320334: Add the thaw_early method to struct dev_pm_ops in the LinuxKPI. MFC r320335: Add a couple of macros to lockdep.h in the LinuxKPI. MFC r320336: Add ns_to_ktime() to the LinuxKPI. MFC r320337: Add u64_to_user_ptr() to the LinuxKPI. MFC r320364: Implement parts of the hrtimer API in the LinuxKPI. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11359 MFC r320580: Let io_mapping_init_wc() fall back to an uncacheable mapping. This allows usage of the function on architectures that don't support write-combining. Reported by: bz, emaste X-MFC With: r320196 MFC r320627: Hold the PCI device list lock when removing an element. MFC r320633: Rename the "driver" field to "bsddriver" to avoid a name collision. MFC r320634: Add some PCI class definitions. MFC r320635: Add a field for the class code to struct pci_driver. Fill out some previously uninitialized fields as well. MFC r320636: Add some auxiliary types for device driver support. MFC r320656: Invoke suspend/resume methods from the driver pmops if available. Obtained from: kmacy (original version) MFC r320774: Fix a bug in synchronize RCU when the calling thread is bound to a CPU. Set "td_pinned" to zero after "sched_unbind()" to prevent "td_pinned" from temporarily becoming negative during "sched_bind()". This can happen if "sched_bind()" uses "sched_pin()" and "sched_unpin()". Sponsored by: Mellanox Technologies MFC r320775: Complete r320189 which allows a NULL VM fault handler in the LinuxKPI. Instead of mapping a dummy page upon a page fault, map the page pointed to by the physical address given by IDX_TO_OFF(vmap->vm_pfn). To simplify the implementation use OBJT_DEVICE to implement our own linux_cdev_pager_fault() instead of using the existing linux_cdev_pager_populate(). Some minor code factoring while at it. Reviewed by: markj @ Sponsored by: Mellanox Technologies MFC r320810: Add TASK_COMM_LEN to the LinuxKPI. MFC r320811: Add device_is_registered() to the LinuxKPI. MFC r320812: Fix the definitions of pgprot_{noncached,writecombine} after r316562. MFC r320813: Add some helper definitions to fs.h in the LinuxKPI. Add a field to struct linux_file to allow the creation of anonymous shmem objects. MFC r320852: Free existing per-thread task structs when unloading linuxkpi.ko. They are otherwise leaked. Reported and tested by: ae MFC r320853: Add a few functions to ktime.h in the LinuxKPI, and fix nearby style. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11534 MFC r320854: Add some functions to math64.h in the LinuxKPI, and fix nearby style. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11535 MFC r320956: Add some functions to jiffies.h. Also add some checks for overflow to existing functions. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11533 MFC r321773: Remove cycle_t type from the LinuxKPI similar to Linux upstream. Sponsored by: Mellanox Technologies MFC r321926: Fix LinuxKPI regression after r321920. The mda_unit and si_drv0 fields are not wide enough to hold the full 64-bit dev_t. Instead use the "dev" field in the "linux_cdev" structure to store and lookup this value. While at it remove superfluous use of parenthesis inside the MAJOR(), MINOR() and MKDEV() macros in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r322028: Add subsystem vendor and device ID fields to struct pci_dev. MFC r322169: Fix hrtimer_active() in case of cancellation. While there, switch to FreeBSD internal callout active status. Reviewed by: markj, hselasky Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11900 MFC r322212: Add macros for defining attribute groups and for WO and RW attributes. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11872 MFC r322213: Add round_jiffies_up(), local_clock() and __setup_timer() to the LinuxKPI. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11871 MFC r322272: Fix few issues of LinuxKPI workqueue. LinuxKPI workqueue wrappers reported "successful" cancellation for works already completed in normal way. This change brings reported status and real cancellation fact into sync. This required for drm-next operation. Reviewed by: hselasky (earlier version) Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11904 MFC r322354: Make sure the linux_wait_event_common() function in the LinuxKPI properly handles a timeout value of MAX_SCHEDULE_TIMEOUT which basically means there is no timeout. This is a regression issue after r319757. While at it change the type of returned variable from "long" to "int" to match the actual return type. Sponsored by: Mellanox Technologies MFC r322355: Fixes for wait event in the LinuxKPI. These are regression issues after r319757. 1) Correct the return value from __wait_event_common() from 1 to 0 in case the timeout is specified as MAX_SCHEDULE_TIMEOUT. In the other case __ret is zero and will be substituted in the last part of the macro with the appropriate value before return. 2) Make sure the "timeout" argument is casted to "int" before evaluating negativity. Else the signedness of a "long" might be checked instead of the signedness of an integer. 3) The wait_event() function should not have a return value. Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com> Sponsored by: Mellanox Technologies MFC r322357: Use integer type to pass around jiffies and/or ticks values in the LinuxKPI because in FreeBSD ticks are 32-bit. Sponsored by: Mellanox Technologies MFC r322392: Add a specialized function for DRM drivers to register themselves. Such drivers attach to a vgapci bus rather than directly to a pci bus. For the rest of the LinuxKPI to work correctly in this case, we override the vgapci bus' ivars with those of the grandparent. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D11932 MFC r322397: Make sure the "vm_flags" and "vm_page_prot" fields get set correctly in the VM area structure in the LinuxKPI when doing mmap() and that unsupported bits are masked away. While at it fix some redundant use of parenthesing inside some related macros. Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com> Sponsored by: Mellanox Technologies MFC r322567: Add device resource management fields to struct device. MFC r322713: Add a couple of trivial headers to the LinuxKPI. MFC r322714: Define prefetch() only if it hasn't already been defined. MFC r322746: Fix for deadlock situation in the LinuxKPI's RCU synchronize API. Deadlock condition: The return value of TDQ_LOCKPTR(td) is the same for two threads. 1) The first thread signals a wakeup while keeping the rcu_read_lock(). This invokes sched_add() which in turn will try to lock TDQ_LOCK(). 2) The second thread is calling synchronize_rcu() calling mi_switch() over and over again trying to yield(). This prevents the first thread from running and releasing the RCU reader lock. Solution: Release the thread lock while yielding to allow other threads to acquire the lock pointed to by TDQ_LOCKPTR(td). Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com> Sponsored by: Mellanox Technologies MFC r322795: Add some miscellaneous definitions to support the DRM drivers. MFC r322816: Set the bus number field when attaching a PCI device. MFC r323347: Add more sanity checks to linux_fget() in the LinuxKPI. This prevents returning pointers to file descriptors which were not created by the LinuxKPI. Sponsored by: Mellanox Technologies MFC r323349: Properly implement poll_wait() in the LinuxKPI. This prevents direct use of the linux_poll_wakeup() function from unsafe contexts, which can lead to use-after-free issues. Instead of calling linux_poll_wakeup() directly use the wake_up() family of functions in the LinuxKPI to do this. Bump the FreeBSD version to force recompilation of external kernel modules. Sponsored by: Mellanox Technologies MFC r323703: Add support for shared memory functions to the LinuxKPI. Obtained from: kmacy @ Sponsored by: Mellanox Technologies MFC r323704: Only wire pages in the LinuxKPI instead of holding and wiring them. This prevents the page daemon from regularly scanning the held pages. Suggested by: kib @ Sponsored by: Mellanox Technologies MFC r323705: The LinuxKPI atomics do not have acquire nor release semantics unless specified. Fix code to use READ_ONCE() and WRITE_ONCE() where appropriate. Suggested by: kib @ Sponsored by: Mellanox Technologies MFC r323910: Add support for 32-bit compatibility IOCTLs in the LinuxKPI. Bump the FreeBSD version to force recompilation of external kernel modules due to structure change. PR: 222504 Submitted by: Greg V <greg@unrelenting.technology> Sponsored by: Mellanox Technologies MFC r324278: Make sure the timer belonging to the delayed work in the LinuxKPI gets drained before invoking the work function. Else the timer mutex may still be in use which can lead to use-after-free situations, because the work function might free the work structure before returning. Sponsored by: Mellanox Technologies MFC r324285: Add get_random_{int,long} to the LinuxKPI. Fix some whitespace bugs while here. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D12588 MFC r324597: Don't call selrecord() outside the select system call in the LinuxKPI, because then td->td_sel is NULL and this will result in a segfault inside selrecord(). This happens when only using kqueue() to poll for read and write events. If select() and kqueue() is mixed there won't be a segfault. Reported by: Johannes Lundberg Sponsored by: Mellanox Technologies MFC r324606: Make the PHOLD in linux_wait_event_common() unconditional. After some in-progress work is committed, this would otherwise be the only instance of #if(n)def NO_SWAPPING in the tree. Moreover, the requisite opt_vm.h include was missing, so the PHOLD/PRELE calls were always being compiled in anyway. MFC r325279: Implement ioread16be() in the LinuxKPI. Sponsored by: Mellanox Technologies MFC r325360: Remove redundant dev->si_drv1 NULL checks in the LinuxKPI. This pointer is checked during the linux_dev_open() callback and does not need to be NULL checked again. It should always be set for character devices belonging to the "linuxcdevsw" and technically there is no need to NULL check this pointer at all. Suggested by: kib @ Sponsored by: Mellanox Technologies MFC r325635: Remove some not needed comments in the LinuxKPI. Use the Linux source tree to lookup documentation for the functions implemented in the LinuxKPI instead. Sponsored by: Mellanox Technologies MFC r325707: Mask away return codes from del_timer() and del_timer_sync() because they are not the same like in Linux. Sponsored by: Mellanox Technologies MFC r325708: Remove release and acquire semantics when accessing the "state" field of the LinuxKPI task struct. Change type of "state" variable from "int" to "atomic_t" to simplify code and avoid unneccessary casting. Sponsored by: Mellanox Technologies MFC r325767: Properly handle the case where the linux_cdev_handle_insert() function in the LinuxKPI returns NULL. This happens when the VM area's private data handle already exists and could cause a so-called NULL pointer dereferencing issue prior to this fix. Found by: greg@unrelenting.technology Sponsored by: Mellanox Technologies MFC r327676: linuxkpi: Implement kcalloc() based on mallocarray() This means we now get integer overflow protection, which Linux code might expect as it is also provided by kcalloc() in Linux. MFC r327788: linuxkpi: Simplify kmalloc_array. kmalloc_array seems what we call mallocarray(9). MFC r312926: (partial, no mergeinfo) Revert r312923 a better approach will be taken later MFC r312927: (partial, no mergeinfo) Revert crap accidentally committed MFC r316665: (partial, no mergeinfo) Import CK as of commit 6b141c0bdd21ce8b3e14147af8f87f22b20ecf32 This brings us changes we needed in ck_epoch. MFC r317053: (partial, no mergeinfo) Remove unneeded include of vm_phys.h. MFC r317055: (partial, no mergeinfo) All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h. MFC r322168: (partial, no mergeinfo) o Replace __riscv__ with __riscv o Replace __riscv64 with (__riscv && __riscv_xlen == 64) This is required to support new GCC 7.1 compiler. This is compatible with current GCC 6.1 compiler. RISC-V is extensible ISA and the idea here is to have built-in define per each extension, so together with __riscv we will have some subset of these as well (depending on -march string passed to compiler): __riscv_compressed __riscv_atomic __riscv_mul __riscv_div __riscv_muldiv __riscv_fdiv __riscv_fsqrt __riscv_float_abi_soft __riscv_float_abi_single __riscv_float_abi_double __riscv_cmodel_medlow __riscv_cmodel_medany __riscv_cmodel_pic __riscv_xlen Reviewed by: ngie Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11901 MFC r322672: (partial, no mergeinfo) Move some other SI_SUB_INIT_IF initializations to SI_SUB_TASKQ Drop the EARLY_AP_STARTUP gtaskqueue code, as gtaskqueues are now initialized before APs are started. Reviewed by: hselasky@, jhb@ Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D12054 MFC r326984: (partial, no mergeinfo) Update Matthew Macy contact info Email address has changed, uses consistent name (Matthew, not Matt) Reported by: Matthew Macy <mmacy@mattmacy.io> Differential Revision: https://reviews.freebsd.org/D13537 Changes: _U stable/11/ stable/11/sys/compat/linuxkpi/common/include/asm/atomic-long.h stable/11/sys/compat/linuxkpi/common/include/asm/atomic.h stable/11/sys/compat/linuxkpi/common/include/asm/atomic64.h stable/11/sys/compat/linuxkpi/common/include/asm/msr.h stable/11/sys/compat/linuxkpi/common/include/asm/pgtable.h stable/11/sys/compat/linuxkpi/common/include/asm/smp.h stable/11/sys/compat/linuxkpi/common/include/linux/atomic.h stable/11/sys/compat/linuxkpi/common/include/linux/bitmap.h stable/11/sys/compat/linuxkpi/common/include/linux/bitops.h stable/11/sys/compat/linuxkpi/common/include/linux/bottom_half.h stable/11/sys/compat/linuxkpi/common/include/linux/cdev.h stable/11/sys/compat/linuxkpi/common/include/linux/clocksource.h stable/11/sys/compat/linuxkpi/common/include/linux/compat.h stable/11/sys/compat/linuxkpi/common/include/linux/compiler.h stable/11/sys/compat/linuxkpi/common/include/linux/completion.h stable/11/sys/compat/linuxkpi/common/include/linux/device.h stable/11/sys/compat/linuxkpi/common/include/linux/dma-mapping.h stable/11/sys/compat/linuxkpi/common/include/linux/etherdevice.h stable/11/sys/compat/linuxkpi/common/include/linux/file.h stable/11/sys/compat/linuxkpi/common/include/linux/fs.h stable/11/sys/compat/linuxkpi/common/include/linux/gfp.h stable/11/sys/compat/linuxkpi/common/include/linux/hrtimer.h stable/11/sys/compat/linuxkpi/common/include/linux/idr.h stable/11/sys/compat/linuxkpi/common/include/linux/in.h stable/11/sys/compat/linuxkpi/common/include/linux/interrupt.h stable/11/sys/compat/linuxkpi/common/include/linux/io-mapping.h stable/11/sys/compat/linuxkpi/common/include/linux/io.h stable/11/sys/compat/linuxkpi/common/include/linux/jiffies.h stable/11/sys/compat/linuxkpi/common/include/linux/kdev_t.h stable/11/sys/compat/linuxkpi/common/include/linux/kernel.h stable/11/sys/compat/linuxkpi/common/include/linux/kobject.h stable/11/sys/compat/linuxkpi/common/include/linux/kthread.h stable/11/sys/compat/linuxkpi/common/include/linux/ktime.h stable/11/sys/compat/linuxkpi/common/include/linux/list.h stable/11/sys/compat/linuxkpi/common/include/linux/lockdep.h stable/11/sys/compat/linuxkpi/common/include/linux/math64.h stable/11/sys/compat/linuxkpi/common/include/linux/mm.h stable/11/sys/compat/linuxkpi/common/include/linux/mm_types.h stable/11/sys/compat/linuxkpi/common/include/linux/module.h stable/11/sys/compat/linuxkpi/common/include/linux/mutex.h stable/11/sys/compat/linuxkpi/common/include/linux/page.h stable/11/sys/compat/linuxkpi/common/include/linux/pci.h stable/11/sys/compat/linuxkpi/common/include/linux/pfn.h stable/11/sys/compat/linuxkpi/common/include/linux/pfn_t.h stable/11/sys/compat/linuxkpi/common/include/linux/pid.h stable/11/sys/compat/linuxkpi/common/include/linux/poll.h stable/11/sys/compat/linuxkpi/common/include/linux/preempt.h stable/11/sys/compat/linuxkpi/common/include/linux/printk.h stable/11/sys/compat/linuxkpi/common/include/linux/random.h stable/11/sys/compat/linuxkpi/common/include/linux/rculist.h stable/11/sys/compat/linuxkpi/common/include/linux/rcupdate.h stable/11/sys/compat/linuxkpi/common/include/linux/rwlock.h stable/11/sys/compat/linuxkpi/common/include/linux/rwsem.h stable/11/sys/compat/linuxkpi/common/include/linux/scatterlist.h stable/11/sys/compat/linuxkpi/common/include/linux/sched.h stable/11/sys/compat/linuxkpi/common/include/linux/semaphore.h stable/11/sys/compat/linuxkpi/common/include/linux/slab.h stable/11/sys/compat/linuxkpi/common/include/linux/smp.h stable/11/sys/compat/linuxkpi/common/include/linux/spinlock.h stable/11/sys/compat/linuxkpi/common/include/linux/srcu.h stable/11/sys/compat/linuxkpi/common/include/linux/string.h stable/11/sys/compat/linuxkpi/common/include/linux/sysfs.h stable/11/sys/compat/linuxkpi/common/include/linux/timer.h stable/11/sys/compat/linuxkpi/common/include/linux/types.h stable/11/sys/compat/linuxkpi/common/include/linux/uaccess.h stable/11/sys/compat/linuxkpi/common/include/linux/wait.h stable/11/sys/compat/linuxkpi/common/include/linux/workqueue.h stable/11/sys/compat/linuxkpi/common/include/linux/ww_mutex.h stable/11/sys/compat/linuxkpi/common/include/net/ip.h stable/11/sys/compat/linuxkpi/common/include/net/ipv6.h stable/11/sys/compat/linuxkpi/common/src/linux_compat.c stable/11/sys/compat/linuxkpi/common/src/linux_current.c stable/11/sys/compat/linuxkpi/common/src/linux_hrtimer.c stable/11/sys/compat/linuxkpi/common/src/linux_idr.c stable/11/sys/compat/linuxkpi/common/src/linux_kthread.c stable/11/sys/compat/linuxkpi/common/src/linux_lock.c stable/11/sys/compat/linuxkpi/common/src/linux_page.c stable/11/sys/compat/linuxkpi/common/src/linux_pci.c stable/11/sys/compat/linuxkpi/common/src/linux_rcu.c stable/11/sys/compat/linuxkpi/common/src/linux_schedule.c stable/11/sys/compat/linuxkpi/common/src/linux_slab.c stable/11/sys/compat/linuxkpi/common/src/linux_tasklet.c stable/11/sys/compat/linuxkpi/common/src/linux_work.c stable/11/sys/conf/files stable/11/sys/conf/files.amd64 stable/11/sys/contrib/rdma/krping/krping.c stable/11/sys/dev/mlx5/mlx5_core/mlx5_uar.c stable/11/sys/dev/qlnx/qlnxe/bcm_osal.h stable/11/sys/modules/linuxkpi/Makefile stable/11/sys/modules/qlnx/qlnxe/Makefile stable/11/sys/ofed/drivers/infiniband/core/cma.c stable/11/sys/ofed/drivers/infiniband/core/fmr_pool.c stable/11/sys/ofed/drivers/infiniband/core/iwcm.c stable/11/sys/ofed/drivers/infiniband/core/umem.c stable/11/sys/ofed/drivers/infiniband/hw/mthca/mthca_dev.h stable/11/sys/ofed/drivers/net/mlx4/pd.c stable/11/sys/sys/param.h stable/11/sys/sys/proc.h