Re: intel-pstate driver questions

On 03/18/2014 10:29 AM, Thomas Renninger wrote:
Hi,

several questions, mostly about user(space) interference:

1) sysfs tunables:
    - max_perf_pct, min_perf_pct
      According to Documentation/cpu-freq/intel-pstate.txt this is:
       max_perf_pct: limits the maximum P state that will be requested by
       the driver stated as a percentage of the available performance.

       min_perf_pct: limits the minimum P state that will be  requested by
       the driver stated as a percentage of the available performance.

      Why is this needed? There already are:
      scaling_max_freq, scaling_min_freq


The min/max tunable interface was chosen to map nicely onto future Intel CPU
P state selection mechanisms.

      How are both connected?
      To me those tunables do the same thing, and the intel_pstate specific ones
      should vanish to have one cpufreq min/max frequency interface exported
      to userspace on all archs/cpufreq drivers.


They are connected via the cpufreq_set_policy() interface in the cpufreq core.
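
To illustrate that connection, here is a minimal sketch. It assumes the driver's policy callback expresses the cpufreq min/max limits as percentages of cpuinfo_max_freq and then takes the tighter of the policy and sysfs maxima. The function and argument names are hypothetical, and this is not the kernel code itself.

```python
def freq_limits_to_pct(policy_min_khz, policy_max_khz, cpuinfo_max_khz,
                       sysfs_max_pct=100):
    """Map cpufreq min/max limits (kHz) to min/max performance percentages.

    Assumption: limits are expressed relative to cpuinfo_max_freq, and the
    effective maximum is the tighter of the policy and sysfs limits.
    """
    def clamp(value, low, high):
        return max(low, min(high, value))

    # Integer division mirrors the integer arithmetic a kernel driver would use.
    min_pct = clamp(policy_min_khz * 100 // cpuinfo_max_khz, 0, 100)
    policy_max_pct = clamp(policy_max_khz * 100 // cpuinfo_max_khz, 0, 100)
    max_pct = min(policy_max_pct, sysfs_max_pct)
    return min_pct, max_pct
```

With scaling_min_freq at 800000 kHz, scaling_max_freq at 2400000 kHz and cpuinfo_max_freq at 3200000 kHz, this yields a 25% to 75% window. A lower max_perf_pct written through sysfs would narrow it further.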

    - no_turbo: limits the driver to selecting P states below the turbo
      frequency range.

      Again, there is the general cpufreq "boost" tunable defined in cpufreq.c:
      ssize_t show_boost(..)
      static ssize_t store_boost(...)
      define_one_global_rw(boost);

      What is the difference, why does intel-pstate need its own tunable?


The current "boost" interface came in after intel_pstate.
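
For comparing the two knobs from userspace, a small sketch follows. It assumes the usual sysfs locations, intel_pstate/no_turbo and cpufreq/boost under /sys/devices/system/cpu. Either file may be absent depending on the kernel and the active driver, and the helper name is hypothetical.

```python
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_turbo_tunables(root=CPU_SYSFS):
    """Return the raw value of each turbo-related tunable, or None if absent."""
    def read(relpath):
        try:
            return (Path(root) / relpath).read_text().strip()
        except OSError:
            # The file is missing or unreadable with the current driver/kernel.
            return None

    return {
        "intel_pstate/no_turbo": read("intel_pstate/no_turbo"),
        "cpufreq/boost": read("cpufreq/boost"),
    }
```

Note the inverted polarity: writing 1 to no_turbo turns turbo off, while writing 1 to boost turns it on.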

-> I'd like to integrate the intel-pstate specific stuff, mark the above obsolete
    and let it use the generic cpufreq tunables.
    Would that work out or have I overlooked something?

2) Disabling pstate driver (cpufreq in general)

    There is:
    intel_pstate=disable

    This again is somewhat driver specific. IMO the cpufreq subsystem has been
    missing a general cpufreq.disable parameter for quite some time already.
    It would be best if this worked at runtime as well.
    I am not sure what an implementation could look like; I need to look deeper
    into that, but maybe someone already has an opinion about this.


This option was there to let people fall back to the old drivers if something
went horribly wrong.
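
A quick way to check whether that fallback is in effect, sketched under the assumption that the kernel command line is available as a string (for example from /proc/cmdline). The helper name is hypothetical.

```python
def intel_pstate_disabled(cmdline):
    """Return True if the kernel command line contains intel_pstate=disable."""
    # Kernel parameters are whitespace-separated; match the exact token.
    return "intel_pstate=disable" in cmdline.split()
```

On a running system you could pass in the contents of /proc/cmdline and compare the result against cpu0/cpufreq/scaling_driver under /sys/devices/system/cpu, which names the driver actually in use.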

cpufreq has an API call to allow it to be completely disabled.  ATM no one is
calling it that I am aware of; KVM was at one time.  You can work it out with
Rafael whether a parameter should be added to disable the core completely. :-)

Disabling cpufreq completely breaks a bunch of userspace tools.  cpufreq is
optional but in practice most people build it in and include tools that
rely on cpufreq being there.

For most of intel_pstate's development before it was merged, intel_pstate was
calling cpufreq_disable since intel_pstate didn't really need the core to do
its work.  In fact I fixed some sneaky paths where the core could be called
into even after disable was called.

Integrating intel_pstate as a scaling driver with an internal governor in
the cpufreq subsystem was chosen to avoid breaking as many tools as practical
and provide an easy adoption path for those that wanted to use it.  Also the
precedent for this type of driver was already set in the subsystem.

3) Why is intel-pstate needed at all?

Depending on the workload, intel_pstate provides better system power efficiency
than using the ondemand governor and acpi_cpufreq scaling driver.


    This might have been discussed already? Would be great if someone can point
    me to the discussion then.
    I am interested in:
    - What is the advantage over acpi-cpufreq?

ACPI tables lie about which P states are available on a given CPU.  The ACPI spec
limits the number of P states exposed to 16, including the hack of having a
single P state represent the entire turbo range of the CPU.

    - There were discussions that on modern Intel CPUs cpufreq is a somewhat
      obsolete power saving technique and it might be better, performance and
      power wise, to disable CPU frequency scaling altogether and let the CPU
      enter CPU idle states as quickly as possible instead.

This is mostly true.  Running the processor at a P state/frequency that is
higher than needed to service the load wastes power and thermal headroom.
You see this when the system is mostly idle or with workloads that are
I/O bound.


    - Are there numbers how much intel-pstate can affect performance
      (theoretically in worst case and practically (specific workload?))?

intel_pstate provides as good or better performance than the ondemand governor
in all cases I have seen. For some workloads you can get better performance
than the performance governor due to the fact that thermal headroom is being
conserved by running the CPU "just fast enough" allowing for more time to be
spent in the higher turbo bins.



Thanks,

        Thomas
--
To unsubscribe from this list: send the line "unsubscribe cpufreq" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

