04 March, 2008

Using parted and LVM2 for large partitions

I wanted to spread one filesystem across two RAID cards, using one partition from each card's array. Here is the server's physical drive configuration.

|-RAID0-|-RAID5---------|
| 00 01 | 02 03 04 05 06|

|-RAID5------------------------------------|
| 00 01 02 03 04 05 06 07 08 09 10 11 12 13|
The second RAID5 is a few TB, and fdisk's MS-DOS partition table cannot handle partitions larger than 2TB, so I use parted to create a GPT partition that fills the free space.
# parted /dev/sdc
GNU Parted 1.8.1
Using /dev/sdc
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print

Model: DELL PERC 5/E Adapter (scsi)
Disk /dev/sdc: 3893GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags

(parted) mklabel gpt

(parted) mkpart primary 0 3893G
(parted) print

Model: DELL PERC 5/E Adapter (scsi)
Disk /dev/sdc: 3893GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
1 17.4kB 3893GB 3893GB primary

(parted) quit
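For reference, the same label and partition can be created non-interactively with parted's script mode; this is just a sketch, assuming a parted build that accepts percentage units for the start and end:
# parted -s /dev/sdc mklabel gpt
# parted -s /dev/sdc mkpart primary 0% 100%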
Then I use LVM2 to combine partitions from the two RAID arrays, /dev/sdb1 and /dev/sdc1, into one logical volume.
# pvcreate /dev/sdb1 /dev/sdc1
Physical volume "/dev/sdb1" successfully created
Physical volume "/dev/sdc1" successfully created
# vgcreate nsm_vg /dev/sdb1 /dev/sdc1
Volume group "nsm_vg" successfully created
# pvscan
PV /dev/sdb1 VG nsm_vg lvm2 [272.25 GB / 632.00 GB free]
PV /dev/sdc1 VG nsm_vg lvm2 [3.54 TB / 3.54 TB free]
Total: 2 [1.94 TB] / in use: 2 [1.94 TB] / in no VG: 0 [0 ]

# lvcreate -L 3897G -n nsm_lv nsm_vg
Logical volume "nsm_lv" created
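As an alternative to computing the size by hand, vgs shows the free space and lvcreate can simply claim every free extent; a sketch, assuming an LVM2 release that understands the %FREE syntax:
# vgs nsm_vg
# lvcreate -l 100%FREE -n nsm_lv nsm_vg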

# mkfs.ext3 -m 1 /dev/nsm_vg/nsm_lv
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
528482304 inodes, 1056964608 blocks
10569646 blocks (1.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
32256 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 34 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
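If the periodic checks are unwanted on a volume this size, tune2fs can relax or disable them afterward; for example, to turn off both the mount-count and interval checks:
# tune2fs -c 0 -i 0 /dev/nsm_vg/nsm_lv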

# mount /dev/mapper/nsm_vg-nsm_lv /nsm

$ df -h
Filesystem Size Used Avail Use% Mounted on
---snip---
/dev/mapper/nsm_vg-nsm_lv
3.8T 196M 3.8T 1% /nsm
Now I can put the entry for mounting /nsm into /etc/fstab.
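A minimal entry might look like the following; the options and the fsck pass number are only a starting point, so adjust them to taste.
/dev/mapper/nsm_vg-nsm_lv /nsm ext3 defaults 0 2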

5 comments:

  1. Good writeup! I also suggest mounting it "async,noatime", which helps make things more efficient when you're doing a constant stream of writes, as sguil does with PCAP files.

  2. David, thanks for pointing that out. I am actually using "defaults,noatime". For those wondering, the mount man page should have a list of default options for a given version:

    defaults
    Use default options: rw, suid, dev, exec, auto, nouser, and async.

  3. If you are using LVM2 to combine several disks, why do you think you have to partition each PV, too? I always use whole disks as PVs.

    One problem might be that fdisk does not recognize an existing PV and displays an "empty" disk (so does Windows if it runs alongside). But since I no longer run Windows or use fdisk, this is not an issue for me. I run "pvs" to get an overview.

    Partitioning may even cause a serious performance drawback: after writing a label (even a GPT label) to the disk, /dev/sdc1 starts at some odd sector number (e.g., sector 63). The underlying RAID5, however, works with a fixed block size, and so does LVM. By introducing a shift of 63 sectors, the two become misaligned, and you may have to touch two RAID5 blocks for each LVM block you read or write. (A sketch of the whole-disk approach appears after the comments.)

  4. Dieter, thanks for the comment. You make a good point about using the whole disks.

  5. I use the whole disk as well. I have three 8TB machines with the buggy libparted-1.7.1.

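To illustrate the whole-disk approach Dieter describes, here is a minimal sketch: create the PV directly on the unpartitioned device (shown with /dev/sdc purely for illustration) and then ask pvs where the first physical extent starts, so it can be compared against the RAID controller's stripe size. This assumes an LVM2 build whose pvs reports the pe_start field.
# pvcreate /dev/sdc
# pvs -o +pe_start /dev/sdc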