Re: Jewel ubuntu release is half cooked

Hi Andrei,

Can you share your udev hack that you had to use?

Currently, I add "/usr/sbin/ceph-disk activate-all" to /etc/rc.local to activate all OSDs at boot. After the first reboot following the upgrade to Jewel, the journal disks are owned by ceph:ceph, and links are created in /etc/systemd/system/ceph-osd.target.wants/. I can now use "systemctl (start|stop) ceph.target" to stop and start the OSDs. Unfortunately, when I disable the "ceph-disk activate-all" line in rc.local and reboot again, the OSDs are not started. This, of course, is because the OSD data partitions are not mounted at boot. As I understand it, your udev hack script should do this.
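I imagine the hack is something along the lines of the rule below, i.e. reacting to the Ceph OSD data partitions appearing and letting ceph-disk mount and activate them, but I would like to compare it with your version. The file name and the GPT partition type GUID below are my guesses, so please correct me:

    # /etc/udev/rules.d/90-ceph-osd-activate.rules (file name is a guess)
    # When a partition with the Ceph OSD data GPT type GUID appears, let
    # ceph-disk mount and activate it. Verify the GUID against
    # "blkid -o udev /dev/sdX1" on your own disks.
    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", RUN+="/usr/sbin/ceph-disk activate /dev/$name"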

Ernst


On 23 May 2016, at 12:26, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:

Hello

I've recently upgraded my Hammer Ceph cluster running on Ubuntu 14.04 LTS servers and noticed a few issues during the upgrade. Just wanted to share my experience.

I've installed the latest Jewel release. In my opinion, some of the issues I came across relate to poor upgrade instructions in the documentation, others to inconsistencies in the Ubuntu package. Here are the issues I've picked up (I followed the upgrade procedure from the release notes):


1. Ceph journals - After performing the upgrade, the ceph-osd processes would not start. I had followed the instructions and chowned /var/lib/ceph (also see point 2 below), but the journal partitions are only referenced via symlinks, so the recursive chown never touches the underlying devices and the ceph user is left with no read/write access to the journals. IMHO this should be addressed in the documentation, unless it can be easily and reliably dealt with by the installation script.
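
Something along the lines of this loop sorts out the running system (a sketch only; paths assume the default /var/lib/ceph/osd/ceph-* layout):

    # Resolve every OSD journal symlink and give the ceph user access to
    # the underlying journal partition (default OSD directory layout assumed).
    for journal in /var/lib/ceph/osd/ceph-*/journal; do
        chown ceph:ceph "$(readlink -f "$journal")"
    done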



2. Inefficient chown documentation - The documentation states that one should "chown -R ceph:ceph /var/lib/ceph" if one wants ceph-osd to run as user ceph rather than root. This command chowns one OSD at a time. I consider mine a fairly small cluster, with just 30 OSDs spread across 3 OSD servers, and the chown takes about 60 minutes per OSD (3TB disks at about 60% usage), so roughly 10 hours per OSD server, which is just mad in my opinion. I can't imagine this working well at all on servers with 20-30 OSDs! IMHO the docs should instruct users to run the chown in _parallel_ across the OSDs instead of doing it one by one.
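
For example, something like this sketch would do all the OSD data directories in parallel (default /var/lib/ceph layout assumed, adjust to taste):

    # Chown the small, non-OSD parts of /var/lib/ceph first, then every
    # OSD data directory in parallel (default directory layout assumed).
    chown ceph:ceph /var/lib/ceph /var/lib/ceph/osd
    find /var/lib/ceph -mindepth 1 -maxdepth 1 ! -name osd -exec chown -R ceph:ceph {} +
    for dir in /var/lib/ceph/osd/ceph-*; do
        chown -R ceph:ceph "$dir" &
    done
    wait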

In addition, the documentation does not mention the issue with the journals, which I think is a big miss. In the end, I had to hack a quick udev rule to address it at boot time, as my journal SSDs were still owned by root:disk after a reboot.
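
The idea is roughly a rule along these lines, matching the journal partitions by their GPT type and forcing the ownership (the GUID below is the type ceph-disk tags journal partitions with, so double-check it with "blkid -o udev" on your own disks; the file name is arbitrary):

    # /etc/udev/rules.d/91-ceph-journal.rules (file name is arbitrary)
    # Force ceph:ceph ownership on journal partitions as they appear at boot.
    ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", OWNER="ceph", GROUP="ceph", MODE="660"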



3. Radosgw service - After the upgrade, the radosgw service was still starting as user root. Also, the start/stop/restart scripts that came with the package simply do not start the service at all. For example, "start radosgw" or "start radosgw-all-started" does not start the service. I had to fall back to the old init script /etc/init.d/radosgw to start it, but that runs the service as root rather than as the ceph user, as intended in Jewel.
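
As a stopgap, the gateway can also be started by hand and told to drop privileges itself, something along these lines (the client name below is just an example, use whatever your ceph.conf section is called):

    # Start the gateway manually and let the daemon switch to the ceph user;
    # "client.rgw.gateway1" is an example name, match your own ceph.conf section.
    radosgw --cluster ceph --name client.rgw.gateway1 --setuser ceph --setgroup ceph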


Overall, after sorting out most of the issues, the cluster has been running okay for 2 days now. The radosgw issue still needs looking at, though.


Cheers

Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
