Wednesday, September 1, 2021

Blobfuse Troubleshooting

This post walks through troubleshooting some of the common issues and scenarios faced while working with Blobfuse. By following it, you can isolate the issue on your end and identify its potential cause.

 

We assume that you have already mounted Blob Storage by following the article below:

How to mount Azure Blob storage as a file system on Linux | Microsoft Docs

 

Scenario 1:  High CPU Utilization for the Blobfuse process

This is among the most common issues faced while working with storage mounted using Blobfuse. Let’s look at some of the actions that can be taken to isolate the issue.

 

Actions

  1. Check whether any latency or degradation is being observed on the storage account side.

Blobfuse is an adapter built on top of a set of libraries. The storage account is mounted through the Blobfuse adapter, which connects to the account and performs the operations on it.

 

We can start by ruling out the storage account itself. To isolate latency from the storage account side, you can use the link below.

How to isolate latency issue for Azure Storage Account - Microsoft Tech Community

Similarly, you can check the Availability metric to see whether any degradation is being observed or any storage exceptions are being returned.

 

  1. Check for any suspicious entries in the Blobfuse logs.

The log level is set in the mount configuration with the --log-level parameter. By default, the level is LOG_WARNING; it can be set to other values, such as LOG_DEBUG, depending on your requirement, which might give further insight into the issue.

 

By default, logs are directed to the system-configured log file, e.g. /var/log/syslog or /var/log/messages (depending on the Linux OS family). You can download these logs and search for the keyword blobfuse, or you can grep for blobfuse and redirect the output to a temporary file for further analysis.

 

This helps in case any suspicious entries are logged against the blobfuse process. If logging isn’t enabled or is at the default level, it is recommended to try a more verbose log level.
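The grep step described above can be sketched as a small shell function. This is a minimal sketch, not part of blobfuse: the function name extract_blobfuse_logs and the default output path /tmp/blobfuse-issue.log are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# extract_blobfuse_logs is a hypothetical helper, not a blobfuse command.
# It copies every line mentioning blobfuse from a system log into a scratch file.
extract_blobfuse_logs() {
    local src="${1:-/var/log/syslog}"          # /var/log/messages on some OS families
    local dest="${2:-/tmp/blobfuse-issue.log}" # assumed scratch location
    # -i matches "blobfuse" in any letter case; grep returns non-zero if nothing matches
    grep -i 'blobfuse' "$src" > "$dest"
}

# Example (run as a user that is allowed to read the system log):
#   extract_blobfuse_logs /var/log/syslog /tmp/blobfuse-issue.log
```

A non-zero return value simply means no blobfuse lines were found in that file, which is itself a hint that logging may be disabled or directed elsewhere.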

 

  1. Check the version of Blobfuse.

You can use the command below to check the version of blobfuse that you have installed:

blobfuse --version

It is recommended to use the latest version, which can be checked from the link below.

Releases · Azure/azure-storage-fuse (github.com)

 

  1. Check for any antivirus running on the system.

One potential cause of high CPU usage is an antivirus process running on the system at the same time. This can cause the CPU usage of the blobfuse process to shoot up; the recommendation here is to disable the antivirus process and then observe the blobfuse process behavior. If disabling it doesn’t help, try uninstalling it and see whether the issue disappears.

 

  1. Check other infrastructure parameters on the VM where the blobfuse process is running.

With CPU climbing, check infrastructure parameters such as memory or IOPS to see whether any degradation or throttling was observed at the time the CPU was shooting up. Here we can use the following commands as well:

top – This shall give details of process, process ID, CPU, resident memory etc.

ps -eo pcpu,pid,user,args | sort -k1 -r -n | head -10: Lists processes with details such as CPU, process ID, user and arguments, highest CPU first.

ps aux | grep blobfuse: Helps identify whether the blobfuse process is running.
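To correlate a CPU spike with memory usage over time, the one-off commands above can be wrapped in a small sampling loop. The sketch below is only illustrative: sample_process_usage is a hypothetical helper, and it assumes the procps version of ps (for the -C option).

```shell
#!/usr/bin/env bash
# sample_process_usage is a hypothetical helper, not a blobfuse tool.
# It prints one line per matching process per sample:
# time, PID, %CPU and resident memory in KiB.
sample_process_usage() {
    local name="${1:-blobfuse}" samples="${2:-5}" interval="${3:-2}"
    local i
    for (( i = 0; i < samples; i++ )); do
        # -C selects by command name; the trailing "=" suppresses column headers
        ps -C "$name" -o pid=,pcpu=,rss= | while read -r pid cpu rss; do
            printf '%s pid=%s cpu=%s%% rss_kib=%s\n' "$(date +%T)" "$pid" "$cpu" "$rss"
        done
        # no need to wait after the last sample
        (( i + 1 < samples )) && sleep "$interval"
    done
    return 0
}

# Example: 10 samples of the blobfuse process, 5 seconds apart
#   sample_process_usage blobfuse 10 5 > /tmp/blobfuse-usage.log
```

Note that the %CPU value from ps is averaged over the lifetime of the process, so top remains the better tool for spotting short bursts.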

 

 

Scenario 2:  Mount Point not persistent after reboot

You have mounted blob storage on a VM. It has been observed that even though files are being received on the VM, they are not moving to storage.

 

Actions

There is a chance that your VM was rebooted and, as a result, your storage got unmounted, i.e. the mount is not persistent after the reboot.

 

Check whether your VM was rebooted recently. You can do this with a command like uptime, which shows how long the machine has been running.
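Alongside uptime, it helps to confirm whether the mount point is still a FUSE mount. The sketch below assumes a Linux mount table at /proc/mounts; is_fuse_mounted is a hypothetical helper and the path in the example is a placeholder.

```shell
#!/usr/bin/env bash
# is_fuse_mounted is a hypothetical helper: it succeeds when the given directory
# appears as a FUSE mount in the mount table (default: /proc/mounts).
is_fuse_mounted() {
    local mountpoint="$1" table="${2:-/proc/mounts}"
    # mount table columns: device, mount point, filesystem type, options, ...
    awk -v mp="$mountpoint" '$2 == mp && $3 ~ /^fuse/ { found = 1 } END { exit !found }' "$table"
}

# Example (placeholder path):
#   uptime -s     # time of the last boot
#   is_fuse_mounted /home/azureuser/mntblobfuse || echo "blobfuse mount is missing"
```

If the last boot time is more recent than the time the container was mounted and the check fails, files written since then have landed on the local disk under the empty mount directory rather than in storage.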

 

The recommendation here is to use a persistent mount. Instead of running the mount command by hand, you can use an fstab entry, or you can wrap the command inside a script so that the mount point is restored in the event of any VM reboot/restart. Once you have created the configuration file, follow the steps below:

 

Edit /etc/fstab with the blobfuse script.

 

Add the following line to use mount.sh:

/<path_to_blobfuse>/mount.sh </path/to/desired/mountpoint> fuse _netdev

OR

 

Add the following line to run without mount.sh:

 

blobfuse /home/azureuser/mntblobfuse fuse delay_connect,defaults,_netdev,--tmp-path=/home/azureuser/tmppath,--config-file=/home/azureuser/connection.cfg,--log-level=LOG_DEBUG,allow_other 0 0

 

https://github.com/Azure/azure-storage-fuse/wiki/2.-Configuring-and-Running#persisting
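After editing /etc/fstab, a quick sanity check can catch a missing entry or a missing _netdev option before the next reboot does. This is a rough sketch under stated assumptions: check_blobfuse_fstab is a hypothetical helper, and it only recognises entries whose first field is blobfuse or ends in mount.sh, as in the two lines above.

```shell
#!/usr/bin/env bash
# check_blobfuse_fstab is a hypothetical helper, not part of blobfuse.
# It fails if no blobfuse entry exists or if an entry lacks the _netdev option.
check_blobfuse_fstab() {
    local fstab="${1:-/etc/fstab}"
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        $1 == "blobfuse" || $1 ~ /mount\.sh$/ {
            entries++
            # field 4 holds the comma-separated mount options
            if (index("," $4 ",", ",_netdev,") == 0) {
                print "missing _netdev: " $0
                bad = 1
            }
        }
        END {
            if (entries == 0) { print "no blobfuse entry found"; bad = 1 }
            exit bad + 0
        }
    ' "$fstab"
}

# Example:
#   check_blobfuse_fstab && sudo mount -a   # mount -a mounts the fstab entries without a reboot
```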

 

Scenario 3:  Facing Exception “No Space Left on Device”

Usually when this exception is faced, we try running the df command. The output gives the impression that the mount is very small compared to the size of the storage account.

 

A point to note here is that the df command for the mounted directory does not tell you the total amount of data stored in your storage container; it just reports the disk usage where the temp directory resides, i.e. your local disk or ramdisk. The temp directory in the case of blobfuse is just a caching directory where files are cached for some time and then wiped out.
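Since the number that matters is the usage of the filesystem holding the temp directory, it can be checked directly. A minimal sketch, assuming the temp path from the fstab example in Scenario 2; cache_disk_usage is a hypothetical helper.

```shell
#!/usr/bin/env bash
# cache_disk_usage is a hypothetical helper: it prints the use percentage
# (digits only) of the filesystem that holds the given directory.
cache_disk_usage() {
    local dir="$1"
    # -P forces one POSIX-format line per filesystem; column 5 is the use percentage
    df -P "$dir" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example with the temp path used earlier in this post:
#   cache_disk_usage /home/azureuser/tmppath
#   du -sh /home/azureuser/tmppath    # size of the cached files themselves
```

A value close to 100 here points at the cache location filling up, not at the storage account.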


Blob storage capacity is separate from the capacity of the mounted drive. When you mount blob storage you don't have to specify a size; it uses the local filesystem as a buffer cache on the ephemeral drive, or you can use a ramdisk and specify its size. Refer to this documentation:
https://github.com/Azure/azure-storage-fuse/wiki/2.-Configuring-and-Running

 

The document below explains how to specify a larger ramdisk.

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux#optional-use-a-ramdisk-for-the-temporary-path
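The ramdisk steps from the linked document can be wrapped roughly as follows. This is a sketch rather than a tested procedure: create_ramdisk is a hypothetical wrapper, and the 16g default and /mnt/ramdisk path are example values that should be adjusted to the memory available on the VM.

```shell
#!/usr/bin/env bash
# create_ramdisk is a hypothetical wrapper around the usual tmpfs steps.
create_ramdisk() {
    local size="${1:-16g}" dir="${2:-/mnt/ramdisk}"
    sudo mkdir -p "$dir"
    # tmpfs is backed by memory; size= caps how large the cache can grow
    sudo mount -t tmpfs -o "size=${size}" tmpfs "$dir"
    sudo mkdir -p "$dir/blobfusetmp"
    # hand the cache directory to the user that will run blobfuse
    sudo chown "$(id -un)" "$dir/blobfusetmp"
}

# Example: a 32 GB ramdisk, then pass /mnt/ramdisk/blobfusetmp as --tmp-path
#   create_ramdisk 32g /mnt/ramdisk
```

Keep in mind that a tmpfs mount does not survive a reboot, so it has to be recreated before blobfuse is mounted again.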

 

You can use the --file-cache-timeout-in-seconds parameter. The value is 120 seconds (2 minutes) by default, and you can try reducing it so that files are cleared from the cache sooner. From the application standpoint, it is good practice to close each file once operations on it have completed.
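To show where the timeout fits in a mount command, the sketch below only prints the command so it can be reviewed before it is run. print_blobfuse_mount_cmd is a hypothetical helper, and the paths in the example are the placeholders used in the fstab line earlier.

```shell
#!/usr/bin/env bash
# print_blobfuse_mount_cmd is a hypothetical helper: it prints a blobfuse mount
# command with the chosen cache timeout instead of executing it.
print_blobfuse_mount_cmd() {
    local mountpoint="$1" tmp_path="$2" config="$3" timeout="${4:-120}"
    printf 'blobfuse %s --tmp-path=%s --config-file=%s --file-cache-timeout-in-seconds=%s\n' \
        "$mountpoint" "$tmp_path" "$config" "$timeout"
}

# Example: keep cached files for 30 seconds instead of the default 120
#   print_blobfuse_mount_cmd /home/azureuser/mntblobfuse /home/azureuser/tmppath \
#       /home/azureuser/connection.cfg 30
```

A shorter timeout frees local space sooner but means files are downloaded again more often, so the value is a trade-off between disk usage and repeated transfers.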

 

Scenario 4: Higher Throughput

Blobfuse allows multiple nodes to mount the same container. With blobfuse mounted on multiple nodes, you can take advantage of the increased throughput and IOPS limit on storage accounts while accessing the data with the regular file system APIs.  However, for concurrent writes, this is not a feasible approach.

 

It is also recommended to use blobcp to copy data in parallel to the storage behind the blobfuse mount, which should increase throughput. Below is the reference link:

azure-storage-fuse/tools at master · Azure/azure-storage-fuse (github.com)

For higher throughput, you can also use AzCopy (with more concurrent threads). The VM size also plays an important role: a higher-spec VM (multiple cores) with appropriate configuration should ideally provide higher throughput.

Optimize the performance of AzCopy v10 with Azure Storage | Microsoft Docs
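AzCopy reads its concurrency from the AZCOPY_CONCURRENCY_VALUE environment variable. The value 64 below is purely illustrative, not a recommendation, and the copy command is left commented out because its source and destination depend on your environment.

```shell
#!/usr/bin/env bash
# Illustrative value only; tune it against your VM size and network bandwidth.
export AZCOPY_CONCURRENCY_VALUE=64

# Placeholder source and destination, to be replaced with your own:
#   azcopy copy "<local-path>" "<container-SAS-URL>" --recursive

echo "AZCOPY_CONCURRENCY_VALUE=${AZCOPY_CONCURRENCY_VALUE}"
```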

 

Hope this helps!

Posted at https://sl.advdat.com/2WFYlqg