I need to store 100k files (around 40GB) on a USB drive. Each file has a unique int id (e.g. 45000).
Option one is to put all files in a single folder:
root/
root/1.pdf
root/2.pdf
root/3.pdf
...
root/567.pdf
root/568.pdf
root/569.pdf
...
root/10001.pdf
root/10002.pdf
root/10003.pdf
...
root/99998.pdf
root/99999.pdf
root/100000.pdf
Option two is to create a [1-9][0-9]* folder hierarchy based on that id:
root/
root/1/file.pdf
root/2/file.pdf
root/3/file.pdf
...
root/5/6/7/file.pdf
root/5/6/8/file.pdf
root/5/6/9/file.pdf
...
root/1/0/0/0/1/file.pdf
root/1/0/0/0/2/file.pdf
root/1/0/0/0/3/file.pdf
...
root/9/9/9/9/8/file.pdf
root/9/9/9/9/9/file.pdf
root/1/0/0/0/0/0/file.pdf
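For concreteness, the id-to-path mapping of option two could be sketched roughly as below. The function name `digit_path` and the default root name are hypothetical, and the sketch assumes ids are positive integers.

```python
from pathlib import PurePosixPath


def digit_path(file_id: int, root: str = "root") -> PurePosixPath:
    """Map an integer id to root/<d1>/<d2>/.../file.pdf, one directory per digit."""
    if file_id < 1:
        # Assumption: ids start at 1, as in the listing above.
        raise ValueError("ids are assumed to be positive integers")
    # str(file_id) is iterated digit by digit, giving one path component each.
    return PurePosixPath(root, *str(file_id), "file.pdf")


print(digit_path(567))     # root/5/6/7/file.pdf
print(digit_path(100000))  # root/1/0/0/0/0/0/file.pdf
```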
Which option will scale better? I understand that the second option will require tons of folders, but each folder will contain at most 10 folders and 1 file. Maintenance will not be an issue since everything will be controlled by an application.
Note that this is a USB drive on Linux, and based on the above I’d also like to know whether I should go with FAT32 or NTFS.
My personal preference for use with Linux would be ext3/4.
For the file structure I would recommend a third option: a balance between directory depth and files per directory. This is really just a matter of choosing a tree data structure. To achieve it, I would take the md5sum hash of each file and use the first x characters of the hash as directories. The characters are always hexadecimal digits, so each branch is at most 16 directories wide. The number of characters you choose determines the height of the tree.
kbrandt@alpine:~/scrap$ md5sum y.tab.h
03b01228467fbe94f8fedd9fcbb6d470  y.tab.h
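With that hash, the file would go under a directory path built from its leading characters (0, then 3, then b, and so on for as many levels as you choose).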
The question "How to pre-create directories on linux for file storage?" shows how to pre-create the directories.
This is a generic solution that works well for many use cases and should distribute the files fairly evenly across the directories.
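As a rough illustration of the hashing scheme, here is a minimal Python sketch. The helper names (`md5_of_file`, `shard_path`), the default depth of 3, and the choice to name the stored file after its full digest are all assumptions made for the example, not part of the recommendation itself.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def shard_path(digest: str, root: Path, depth: int = 3) -> Path:
    """Build root/<c1>/<c2>/.../<digest> from the first `depth` hex characters.

    Assumption: the stored file is named after its full digest.
    """
    if not 0 < depth <= len(digest):
        raise ValueError("depth must be between 1 and the digest length")
    return root.joinpath(*digest[:depth], digest)


# The digest from the md5sum example above, with a hypothetical depth of 3.
print(shard_path("03b01228467fbe94f8fedd9fcbb6d470", Path("root")))
# root/0/3/b/03b01228467fbe94f8fedd9fcbb6d470
```

With a depth of 3 there are 16^3 = 4096 leaf directories, so 100k files would average roughly 24 files per directory if the hashes spread evenly. Note that identical files produce identical hashes, so the application would need to decide how to handle duplicates.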