Re: Could apc_fetch return a pointer to data in shared memory ?



On 31 Mar 2012, at 13:14, Simon wrote:

> Thanks again Stuart.
> 
> On 31 March 2012 12:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>> On 31 March 2012 11:19, Simon <slgard@xxxxxxxxx> wrote:
>> Thanks for your answer.
>> 
>> On 31 March 2012 09:50, Stuart Dallas <stuart@xxxxxxxx> wrote:
>> On 31 Mar 2012, at 02:33, Simon wrote:
>> 
>> > Or: Why doesn't PHP have Application variables like ASP.NET (and node.js)?
>> >
>> > Hi,
>> >
>> > I'm working on optimising a php application (Drupal).
>> >
>> > The best optimisation I've found so far is to use APC to store various bits
>> > of Drupal data in RAM.
>> >
>> > The problem is that, with Drupal requiring say 50MB of data* per
>> > request, lots of CPU cycles are wasted de-serialising data out of
>> > apc_fetch. Also, 50MB of data per HTTP process (!!) is wasted by each one
>> > re-creating its own copy of the shared data.
>> 
>> 50MB? WTF is it storing?? I've never used Drupal, but based purely on that it sounds like an extremely inefficient piece of software that's best avoided!
>> 
>> All sorts of stuff (taxonomies, lists of data, menu structures, configuration settings, content etc). Drupal is a sophisticated application. Besides, 50MB of data seems like a relatively tiny amount of "application state" to want to access in the fastest possible way. It's not hard to imagine wanting to use *much* more than this in future.
>>  
>> 
>> > If it were possible for apc_fetch (or similar function) to return a pointer
>> > to the data rather than a copy of the data this would enable incredible
>> > reduction in cpu and memory usage.
>> 
>> Vanilla PHP adheres to a principle known as "shared nothing architecture" in which, shockingly, nothing is shared between processes or requests. This is primarily for scalability reasons; if you stick to the shared nothing approach your application should be easily scalable.
>> 
>> Yes, I know. I think the effect of this is that PHP will scale better (on average) in situations where requests don't need to share much data, such as "shared hosting". In an enterprise environment where the whole server might be dedicated to a single application, "shared nothing" seems to be a synonym for "re-load everything"?
>> 
>> Yes, on one level that is what it means, but alternatively it could mean being a lot more conservative about what you load for each request.
> 
> Um, I want to be *less* conservative. Possibly *much* less (like gigabytes or even eventually petabytes of shared data!).

We appear to have drifted off the point. There's a big difference between data that an application needs to access and "application variables".

What you're describing is a database. If you want something more performant there are ways to optimise access to that amount of data, but if not I've completely lost what the problem is that you're trying to solve.

>> > This is essentially how ASP.NET Application variables and node.js work.
>> 
>> Not a valid comparison. Node.js applications can only share variables within a single process, and they can do so because it's single-threaded. Once you scale your app beyond a single process you'd need to add a custom layer on to share data between them.
>> 
>> I'm not sure about the architecture behind IIS and ASP.net but I imagine there are similar paradigms at work.
>> 
>> I totally agree, although I *think* IIS uses multiple threads running in a single process (or "Application Pool").
>> I realise that ASP.NET / node.js have their own architectural issues but I'm confident that for enterprise applications
>> (i.e. Drupal) the option for "shared something" is capable of many orders of magnitude higher performance and scalability than "shared nothing".
>> 
>> And that's why there are so many options around that enable such functionality. The need for something doesn't in any way imply that it should be part of the core system. Consider the impact such a requirement would have on the environment in which you run PHP. By delegating that "feature" to third-party modules, the PHP core doesn't need to concern itself with the details of how to share data between processes on every target platform.
> 
> Agreed. If you were able to point me in the direction of such a 3rd party module I'd be a very happy man.

APC and memcached are two of the most common examples, other than the vast array of DBMSes out there.
 
>> > I'm surprised PHP doesn't already have Application variables, given that
>> > they are so similar to Session Variables and that it's been around for a
>> > long time in ASP / ASP.NET.
>> 
>> Just because x does it, doesn't mean y should. I've used lots of languages over the years, including classic ASP, ASP.net, Perl, Python, Ruby, PHP (obv), and more, and I'm yet to see a compelling reason to want application variables. 
>> 
>> The reason that I'm suggesting this is that, taking the example of Drupal, the ability to share information between requests "by reference" rather than by copy has the potential to be *millions* of times faster. Assume I had, say, a 5MB dataset that I wanted to re-use between requests, and let's say (optimistically) that "de-serialising" an object from apc_fetch takes 10 CPU cycles per "character". That is roughly 50 million cycles per fetch, so it would be ~50 million* times faster to pass this data as a pointer?  *Assuming simplistically that the pointer can be passed in 1 CPU cycle.
>> 
>> You say "by reference" but I'm not convinced that the implementation of  application variables means they're not copied into each process. In addition, the cost of de-serialising data is minuscule in the grand scheme of any non-trivial application.
> 
> No, I am 100% certain they're not copied into each process.

One process cannot access data in another process without it being copied. A thread can access data from another thread without copying it, but if it's not read-only it needs to be access-controlled which would be a massive performance hit. I don't know because I've never cared, but I'd bet good money that when you read an application variable in asp.net, you get a copy of that data.
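
A rough sketch of that distinction, in Python rather than PHP purely for illustration. It assumes a cache hands data across a process boundary as serialised bytes (simulated here with pickle; this is not a claim about how APC or ASP.NET are implemented internally), whereas a thread in the same process can be handed the very same object. The names shared_data, from_cache and seen_by_thread are made up for the example.

```python
import pickle
import threading

# Hypothetical application state that several requests want to read.
shared_data = {"menu": ["home", "about"], "settings": {"cache_ttl": 300}}

# Crossing a process boundary: the value travels as bytes and is rebuilt,
# so the reader ends up with its own copy (simulated with pickle here).
from_cache = pickle.loads(pickle.dumps(shared_data))

# Within one process: a thread receives a reference to the same object.
seen_by_thread = []
worker = threading.Thread(target=lambda: seen_by_thread.append(shared_data))
worker.start()
worker.join()

print(from_cache == shared_data)         # compares contents
print(from_cache is shared_data)         # compares identity with the copy
print(seen_by_thread[0] is shared_data)  # compares identity within the process
```

The thread case avoids the copy, but as soon as anything writes to shared_data every access has to be guarded by a lock, which is the access-control cost mentioned above.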

>> >Let go of the possibility of application variables and your thinking will shift to other ways of solving the problem.
>> I've spent a long time thinking about this and whilst I can think of many other ways to "solve" this problem (APC, memcached, SHM) they all suffer from the problem that "passing by copy" is potentially millions or billions of times slower than passing by reference and is potentially *hundreds* of times less memory efficient.
>> 
>> If you have any further suggestions I'd be very interested to hear them.
>> 
>> See below.
>>  
>> > I just wondered if there was a reason for not having this functionality or
>> > if it's on a road map somewhere or I've missed something :) ?
>> 
>> 
>> As far as I am aware, ASP and ASP.net are the only web technologies to support application variables out of the box. You think that's simply because the others just haven't gotten around to it yet?
> 
> Honestly, I don't know. I realise there are benefits in certain circumstances to shared nothing. However, if I have an application where I want to maintain state between requests (i.e. any non-trivial application?) it seems that application variables (or an event loop) are many orders of magnitude more performant, and there doesn't seem to be a way to achieve the same in PHP.

What do you want to store between requests? If it's per-user then you want sessions (I have some views on the "traditional" implementation and usage of sessions, but that's for another email). If you want to store data that needs to be made available to every user, that's why databases exist. If a database is too slow then you can use memcached. If you're only ever going to be on one server you can use APC. There's no need for PHP to natively support this feature.
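
To illustrate the "database first, cache in front if it is too slow" idea, here is a minimal read-through cache sketch, again in Python for illustration only. The class ReadThroughCache and the function slow_lookup are hypothetical; an in-process dict stands in for memcached or APC, and slow_lookup stands in for a database query.

```python
class ReadThroughCache:
    """Serve from the cache when possible, otherwise load and remember."""

    def __init__(self, loader):
        self._loader = loader  # called only on a cache miss
        self._store = {}       # stand-in for memcached / APC
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._loader(key)
        return self._store[key]


def slow_lookup(key):
    # Hypothetical stand-in for an expensive database query.
    return {"key": key, "value": key.upper()}


cache = ReadThroughCache(slow_lookup)
first = cache.get("menu")   # miss: goes to the "database"
second = cache.get("menu")  # hit: served from the cache
print(cache.misses)
```

The application code only ever calls get(), so the backing store can be swapped (APC on one server, memcached across several) without touching the callers.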

>> It would be great if someone could tell me specifically why I'm wrong OR if I can persuade the PHP community that "shared nothing" is wrong in certain circumstances (basically enterprise applications!) and application variables could be added to PHP.
>> 
>> You're not wrong in saying that it can be incredibly useful to be able to share common data between processes, but I think you're approaching it from the wrong angle. Let's take the list of things that Drupal wants to store...
>> 
>> * taxonomies
>> * menu structures
>> * configuration settings
>> 
>> I'm guessing these things don't change while the application is running, and could easily be dumped out to PHP files that can then be included as needed, at a far lower processing cost than accessing a shared data store.
>> 
> I think this suffers from at least the same overhead as apc_fetch.
> 
> And an advantage of Application variables is that they can change (very) frequently.

Reading PHP files, especially when you use a bytecode cache, is one of the fastest ways to read data. If the data is changing frequently then you want a database / memcached / APC (see my previous answer).
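
The same idea, sketched in Python as an analogue rather than in PHP: rarely-changing data is written out once as source code and then loaded as a module, so the runtime parses it like any other code file. The helper names dump_as_module and load_module_data, the file name cached_settings.py and the sample settings are all invented for the example.

```python
import importlib.util
import pprint
import tempfile
from pathlib import Path


def dump_as_module(data, path):
    # Write the data as a source file containing a single literal.
    path.write_text("DATA = " + pprint.pformat(data) + "\n", encoding="utf-8")


def load_module_data(path):
    # Load the generated file as a module and return its DATA value.
    spec = importlib.util.spec_from_file_location("cached_settings", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.DATA


settings = {"site_name": "example", "menu": ["home", "about"], "cache_ttl": 300}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cached_settings.py"
    dump_as_module(settings, cache_file)
    loaded = load_module_data(cache_file)

print(loaded == settings)
```

This only suits data made of plain literals (strings, numbers, lists, dicts) that is regenerated when it changes, not data that is updated on every request.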

>> * content
>> 
>> If you're talking about caching static content please refer to my answer above - no reason these can't also be stored in files. If you're talking about caching generated output then memcached is the best solution I've found.
> 
> I've actually found caching to a filesystem to be 5x faster than memcached (remembering that *nix automatically caches frequently used files in RAM).

Above you said that using files would have at least the same overhead as APC.

>> * lists of data
>> 
>> Not sure what you mean by this, but one of the above two answers probably applies.
> 
> Actually, I mean Drupal "Views"; you are correct.

For caching output I've used files (fast when subsequent requests bypass PHP), memcached (incredibly fast), and a caching proxy.

>> My basic point is that the shared nothing approach to scalability has been proven as a big benefit, and I would hate to see that feature of PHP compromised just because use cases exist where it's not ideal. Better to have add-ons to provide what you need.
> 
> As above, agreed. If you were able to point me in the direction of an add-on I'd be very happy.

I have, several times. APC is one option but is limited to a single server. Memcached is, IMO, the best multi-server option. If you're talking about more than ~1MB of data I'd go with a database.

Getting back to the gigabytes or even petabytes of data you want to share across the application, what do you have against databases?

-Stuart

-- 
Stuart Dallas
3ft9 Ltd
http://3ft9.com/

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



