Some notes about the Linux file system

2023-03-07 13:08:08

Anyone who does kernel development will probably recognize the code below.

    static const struct file_operations xxx_fops = {
        .owner          = THIS_MODULE,
        .llseek         = no_llseek,
        .write          = xxx_write,
        .unlocked_ioctl = xxx_ioctl,
        .open           = xxx_open,
        .release        = xxx_release,
    };

Typically, xxx_open allocates a block of memory with code similar to the following:

    file->private_data = kmalloc(sizeof(struct xxx), GFP_KERNEL);

Later calls to read/write/ioctl can then reach the data associated with this file through file->private_data.

Finally, xxx_release frees the memory that file->private_data points to.

If only these paths touch the data behind file->private_data, there is basically no problem, because the kernel's file system framework already takes care of their ordering. Concurrent access among them can be handled with locks and similar mechanisms.

However, we usually also access the data behind file->private_data from asynchronous code paths, which may be triggered by timers, interrupts, inter-process communication, and so on.

These paths reach the data without going through the kernel's file system framework, and that can lead to problems.

Let's look at part of the implementation of the kernel's file system framework, and then consider how to avoid the possible problems. The analysis is based on the linux-3.10.102 kernel source.

First, the only way to get an fd is to call the C library function open. Until open returns, other threads do not know the fd and so will not operate on it. By the time you have the fd, the open operation is complete.

More extreme cases are possible: through a bug, or through deliberately crafted code, another thread could perform file operations on the fd that is about to be returned before open returns. That case is not discussed here. Interested readers can dig into the kernel code to see what happens.

First, the main functions involved in the open operation:

sys_open, do_sys_open, do_filp_open, fd_install, __fd_install.

The fd is installed as follows. Note that the lock is taken on the whole file table of the process, not on a single file.

    void __fd_install(struct files_struct *files, unsigned int fd,
                      struct file *file)
    {
        struct fdtable *fdt;
        spin_lock(&files->file_lock);
        fdt = files_fdtable(files);
        BUG_ON(fdt->fd[fd] != NULL);
        rcu_assign_pointer(fdt->fd[fd], file);
        spin_unlock(&files->file_lock);
    }

The read and write paths are structured very similarly, so we only look at write here. It is implemented as follows:

    SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
                    size_t, count)
    {
        struct fd f = fdget(fd);
        ssize_t ret = -EBADF;

        if (f.file) {
            loff_t pos = file_pos_read(f.file);
            ret = vfs_write(f.file, buf, count, &pos);
            file_pos_write(f.file, pos);
            fdput(f);
        }

        return ret;
    }

    ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
    {
        ssize_t ret;

        if (!(file->f_mode & FMODE_WRITE))
            return -EBADF;
        if (!file->f_op || (!file->f_op->write && !file->f_op->aio_write))
            return -EINVAL;
        if (unlikely(!access_ok(VERIFY_READ, buf, count)))
            return -EFAULT;

        ret = rw_verify_area(WRITE, file, pos, count);
        if (ret >= 0) {
            count = ret;
            file_start_write(file);
            if (file->f_op->write)
                ret = file->f_op->write(file, buf, count, pos);
            else
                ret = do_sync_write(file, buf, count, pos);
            if (ret > 0) {
                fsnotify_modify(file);
                add_wchar(current, ret);
            }
            inc_syscw(current);
            file_end_write(file);
        }

        return ret;
    }

    ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
    {
        struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
        struct kiocb kiocb;
        ssize_t ret;

        init_sync_kiocb(&kiocb, filp);
        kiocb.ki_pos = *ppos;
        kiocb.ki_left = len;
        kiocb.ki_nbytes = len;

        ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
        if (-EIOCBQUEUED == ret)
            ret = wait_on_sync_kiocb(&kiocb);
        *ppos = kiocb.ki_pos;
        return ret;
    }

As the code shows, the VFS layer takes no lock around read and write. Locking there would also be awkward, because read, write, and ioctl may all block. If serialization is needed, user space can apply a file lock itself. File locks are described in "Advanced Programming in the UNIX Environment".
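
As an illustration of the user-space file lock mentioned above, here is a minimal sketch using the POSIX fcntl record lock. The helper names whole_file_lock and locked_append are made up for this example, and it locks a regular file rather than a driver node. Keep in mind that fcntl locks are advisory and owned by the process, so they serialize cooperating processes, not threads inside one process.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Take (F_WRLCK) or drop (F_UNLCK) an advisory lock on the whole file.
 * F_SETLKW blocks until the lock can be granted. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "up to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: append msg to path while holding the write lock.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t locked_append(const char *path, const char *msg)
{
    ssize_t n = -1;
    int fd;

    assert(path != NULL && msg != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (whole_file_lock(fd, F_WRLCK) == 0) {
        n = write(fd, msg, strlen(msg));
        whole_file_lock(fd, F_UNLCK);
    }
    close(fd);
    return n;
}
```

Every writer that should be serialized has to go through the same locking helper, because nothing stops a process that ignores the lock from writing anyway.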

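However, fdget and fdput do more than a plain lookup. When the file table is shared between threads, fdget looks the file up under RCU and takes a reference that fdput later drops, so that the pair can coexist safely with a concurrent close of the fd.

The code also shows that a driver implementing only f_op->aio_write still supports the C library function write.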
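
The fallback from write to aio_write can be modeled in user space. The sketch below is not kernel code: struct model_ops, model_aio_write, and model_vfs_write are hypothetical stand-ins that only mirror the order of the checks in vfs_write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct file_operations: two optional hooks. */
struct model_ops {
    long (*write)(const char *buf, size_t len);
    long (*aio_write)(const char *buf, size_t len);
};

static char model_sink[64];     /* where the example "driver" stores data */

/* Example driver hook: only the aio-style entry point is implemented. */
static long model_aio_write(const char *buf, size_t len)
{
    if (len >= sizeof(model_sink))
        len = sizeof(model_sink) - 1;
    memcpy(model_sink, buf, len);
    model_sink[len] = '\0';
    return (long)len;
}

/* Same decision order as vfs_write: fail if neither hook exists,
 * prefer write, otherwise fall back to the aio path. */
static long model_vfs_write(const struct model_ops *ops,
                            const char *buf, size_t len)
{
    assert(ops != NULL);
    if (!ops->write && !ops->aio_write)
        return -EINVAL;
    if (ops->write)
        return ops->write(buf, len);
    return ops->aio_write(buf, len);    /* plays the role of do_sync_write */
}
```

A table with only aio_write set still accepts writes through model_vfs_write, while a table with neither hook gets -EINVAL, which matches the second check in vfs_write.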

Let's take a look at the implementation of ioctl.

    SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
    {
        int error;
        struct fd f = fdget(fd);

        if (!f.file)
            return -EBADF;
        error = security_file_ioctl(f.file, cmd, arg);
        if (!error)
            error = do_vfs_ioctl(f.file, fd, cmd, arg);
        fdput(f);
        return error;
    }

For non-regular files, and for filesystem-specific commands on regular files, the call eventually reaches filp->f_op->unlocked_ioctl.

Like read and write, ioctl takes no lock on this path, and the path is likewise bracketed by fdget and fdput.

Let's take a look at the operation of closing the file. The system call sys_close is implemented as follows (fs/open.c)

    SYSCALL_DEFINE1(close, unsigned int, fd)
    {
        int retval = __close_fd(current->files, fd);

        /* can't restart close syscall because file table entry was cleared */
        if (unlikely(retval == -ERESTARTSYS ||
                     retval == -ERESTARTNOINTR ||
                     retval == -ERESTARTNOHAND ||
                     retval == -ERESTART_RESTARTBLOCK))
            retval = -EINTR;

        return retval;
    }

Most of the work is done by __close_fd (fs/file.c), shown below. It takes the lock on the process's file table, so the fd-table updates in open and close exclude each other. The exclusion is not per file; it covers the whole table.

When an fd is being closed, what if a thread on another CPU is in the middle of reading or writing the same fd? The close operation does not wait for it; it simply carries on.

In that code, rcu_assign_pointer(fdt->fd[fd], NULL); breaks the association between the fd and its file structure, so from then on the fd can no longer be used to reach that file structure. Accesses that started earlier and have not yet finished are dealt with in filp_close.

    int __close_fd(struct files_struct *files, unsigned fd)
    {
        struct file *file;
        struct fdtable *fdt;

        spin_lock(&files->file_lock);
        fdt = files_fdtable(files);
        if (fd >= fdt->max_fds)
            goto out_unlock;
        file = fdt->fd[fd];
        if (!file)
            goto out_unlock;
        rcu_assign_pointer(fdt->fd[fd], NULL);
        __clear_close_on_exec(fd, fdt);
        __put_unused_fd(files, fd);
        spin_unlock(&files->file_lock);
        return filp_close(file, files);

    out_unlock:
        spin_unlock(&files->file_lock);
        return -EBADF;
    }
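
The effect of clearing the fd table entry can be observed from user space. The following sketch uses a hypothetical helper, errno_of_write_after_close, and assumes a single-threaded caller so that nobody else can reuse the fd number in between. Once close has returned, a write through the stale fd is expected to fail with EBADF, because fdget no longer finds a file behind it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open path, close it, then try to write through the stale fd number.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed. */
static int errno_of_write_after_close(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, "x", 1) != 1) {   /* the fd is still valid here */
        close(fd);
        return -1;
    }
    if (close(fd) != 0)             /* __close_fd clears fdt->fd[fd] */
        return -1;
    if (write(fd, "x", 1) == -1)
        return errno;               /* no file behind the fd any more */
    return 0;
}
```

Calling it with "/dev/null" exercises a character device without needing a custom driver.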

filp_close in turn calls fput, whose code is shown below. If the current task is not a kernel thread (and we are not in interrupt context), ____fput is queued as task work; otherwise the file is put on the delayed_fput list.

Either way the work ends up in __fput, and __fput calls file->f_op->release, that is, our xxx_release.

Note that fput only gets this far once f_count has dropped to zero. Despite the fu_rcuhead field name, ____fput is queued with task_work_add and runs when the task is about to return to user space. A read/write/ioctl that is still in flight on another thread holds its own reference, taken in fdget, until its fdput. So by the time __fput runs, no file operation that started earlier is still in progress.

    static void ____fput(struct callback_head *work)
    {
        __fput(container_of(work, struct file, f_u.fu_rcuhead));
    }

    void flush_delayed_fput(void)
    {
        delayed_fput(NULL);
    }

    static DECLARE_WORK(delayed_fput_work, delayed_fput);

    void fput(struct file *file)
    {
        if (atomic_long_dec_and_test(&file->f_count)) {
            struct task_struct *task = current;

            if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
                init_task_work(&file->f_u.fu_rcuhead, ____fput);
                if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
                    return;
            }

            if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
                schedule_work(&delayed_fput_work);
        }
    }
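
Because fput only proceeds when f_count reaches zero, closing one fd does not necessarily trigger release. A small user-space sketch of this, using a hypothetical helper write_via_dup_after_close: dup creates a second fd that refers to the same file structure, so closing the original fd drops one reference while the file stays usable through the duplicate.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open path, duplicate the fd, close the original, and write through
 * the duplicate. Returns the bytes written, or -1 on error. */
static ssize_t write_via_dup_after_close(const char *path, const char *msg)
{
    ssize_t n;
    int fd, fd2;

    assert(path != NULL && msg != NULL);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    fd2 = dup(fd);              /* second fd, same struct file */
    if (fd2 < 0) {
        close(fd);
        return -1;
    }
    close(fd);                  /* drops one reference, not the last one */
    n = write(fd2, msg, strlen(msg));   /* the file is still open */
    close(fd2);                 /* last reference gone: release can run */
    return n;
}
```

For a driver this means xxx_release is tied to the last reference to the file structure going away, not to the first close call on one of its descriptors.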

Now let's return to the asynchronous code paths mentioned above, which access the data behind file->private_data without going through the file system framework.

Can it happen that xxx_release has already run, yet an asynchronous path still accesses the data that file->private_data pointed to?

In fact, xxx_release does not have to free the memory behind file->private_data. It can instead mark its state as closed. An asynchronous path then checks the state before it touches the data.

If the state is closed, the asynchronous path skips its normal work and takes care of freeing the memory.
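
Below is a user-space sketch of this idea in C11. It goes one step beyond the text by pairing the closed flag with a reference count, which is an assumption of this example, so that whichever side finishes last frees the memory. All names (struct xxx_ctx, xxx_release_model, xxx_async_model) are hypothetical. In a real driver the kernel's kref helpers would be the usual tool for the counting part.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-open context, standing in for file->private_data. */
struct xxx_ctx {
    atomic_int  refs;    /* 1 for the open file + 1 per pending async user */
    atomic_bool closed;  /* set once by the release path */
    int         events;  /* example payload touched by the async path */
};

static struct xxx_ctx *xxx_ctx_new(void)
{
    struct xxx_ctx *ctx = calloc(1, sizeof(*ctx));

    if (ctx) {
        atomic_init(&ctx->refs, 1);
        atomic_init(&ctx->closed, false);
    }
    return ctx;
}

/* Called before handing ctx to a timer, interrupt handler, and so on. */
static void xxx_ctx_get(struct xxx_ctx *ctx)
{
    atomic_fetch_add(&ctx->refs, 1);
}

/* Drop one reference; returns true if this call freed the context. */
static bool xxx_ctx_put(struct xxx_ctx *ctx)
{
    if (atomic_fetch_sub(&ctx->refs, 1) == 1) {
        free(ctx);
        return true;
    }
    return false;
}

/* What xxx_release would do: mark closed, drop the file's reference. */
static bool xxx_release_model(struct xxx_ctx *ctx)
{
    assert(ctx != NULL);
    atomic_store(&ctx->closed, true);
    return xxx_ctx_put(ctx);
}

/* What an async handler would do with the reference it was given.
 * Returns 0 if it did its work, -1 if the file was already closed.
 * *freed reports whether this call released the memory. */
static int xxx_async_model(struct xxx_ctx *ctx, bool *freed)
{
    int ret = -1;

    assert(ctx != NULL && freed != NULL);
    if (!atomic_load(&ctx->closed)) {
        ctx->events++;          /* normal work, only while still open */
        ret = 0;
    }
    *freed = xxx_ctx_put(ctx);
    return ret;
}
```

If release runs first, the pending handler sees the closed flag, skips its work, and frees the context with its final put. If the handler runs first, release performs the final put. The payload itself (events here) would still need its own locking in a real driver if several paths can touch it at once.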
