Thanks for the info. Unfortunately, because this was a prod environment,
we had to recreate the VM. We'll reply to this thread with the
information you asked for the next time this happens, since it seems to
be a common occurrence for us.

On Fri, Jan 18, 2019 at 2:19 PM Liu Bo <obuil.liubo@xxxxxxxxx> wrote:
>
> On Fri, Jan 18, 2019 at 10:58 AM Krishna Mannem <kmannem@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I work on Concourse-CI (https://concourse-ci.org/). It's a container
> > based CI system where we create volumes using Btrfs. Due to the nature
> > of Concourse, btrfs subvolumes are short lived (sometimes a few
> > seconds if it's a small automation task, sometimes hours if it's a long
> > build and test suite). We ran into a failure and we're not really sure
> > what to make of it. Looks like a lock failed to get acquired before
> > scheduling? We need another eye on this.
> >
> > >uname -a
> > Linux a06d0ae3-b242-4518-85ba-977426a3f214 4.15.0-43-generic
> > #46~16.04.1-Ubuntu SMP Fri Dec 7 13:31:08 UTC 2018 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> > > btrfs df
> > btrfs filesystem df /var/vcap/data/baggageclaim/volumes
> > Data, single: total=263.01GiB, used=262.59GiB
> > System, DUP: total=8.00MiB, used=48.00KiB
> > Metadata, DUP: total=9.00GiB, used=6.60GiB
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> > Calltrace:
> > https://gist.github.com/kcmannem/3baf848845bc986f5ac7d23d3df51e56
>
> Can you please dump the output of "echo w > /proc/sysrq-trigger" so that
> developers can get a good understanding of what's going on?
>
> From the above info,
> 1. mm->mmap_sem is held.
> 2. A transaction is being committed and others are waiting for it.
> 3. Subvolumes are deleted within a short time, which may make the
>    btrfs-cleaner thread busy as hell.
>
> thanks,
> liubo
