Joe in Marketing has a local SQL Server Express database (the free edition) that he uses to manage mailshots, and he wants to get the contact details for yesterday's mailshot from the core business data, which is in The Cloud.
To answer this query (a sketch of it follows the list below), the database server, that is, the program that reads and writes to the database and which in nearly all modern systems is a separate program from your own application, has to:
- Create a list of customer IDs from LocalMailShotTable that had the relevant mailshot.
- Read the table CloudCustomerTable to get the telephone numbers for those customer IDs.
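A minimal sketch of the kind of query Joe would run; the column names (CustomerId, Telephone, MailShotDate) are assumptions for illustration, since the original query isn't reproduced here:

```sql
-- Sketch of Joe's query; column names are assumed.
SELECT c.CustomerId, c.Telephone
FROM LocalMailShotTable AS m
JOIN CloudCustomerTable AS c
    ON c.CustomerId = m.CustomerId
WHERE m.MailShotDate = CAST(DATEADD(day, -1, GETDATE()) AS date);  -- yesterday
```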
When both tables are either in the same database or on the same server, the list of customer IDs can be held in RAM.
When the two are separated, one being in The Cloud and the other on a computer within the business, an extra step is needed that is transparent to the user but highly resource intensive.
Assuming that the request is being processed locally, with CloudCustomerTable in The Cloud and LocalMailShotTable local to the business, the database server has no choice but to (a SQL sketch follows the list):
- Download the list of ALL customers from The Cloud via the company's internet connection.
- Bring this list into the database and index it; this is quite a resource-intensive process, as the data has to be written to the disk.
Modern servers deliver very high levels of performance by keeping the most frequently used data in RAM; anything that reads and writes to the disk dramatically reduces that performance.
- Create the list of customers who had the mailshot and find the matching customer details in the temporarily downloaded list of all customers.
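In SQL Server terms this situation typically arises when the cloud table is reached through a linked server. Here is a hedged sketch of the same join written that way, assuming a linked server named CloudServer and a database named CoreDb:

```sql
-- The same join via a linked server; CloudServer and CoreDb are
-- assumed names. Depending on the plan the optimiser picks, the
-- remote table may be copied across the wire in full and spooled
-- to local disk before the join is performed.
SELECT c.CustomerId, c.Telephone
FROM LocalMailShotTable AS m
JOIN CloudServer.CoreDb.dbo.CloudCustomerTable AS c
    ON c.CustomerId = m.CustomerId
WHERE m.MailShotDate = CAST(DATEADD(day, -1, GETDATE()) AS date);
```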
The request could instead be processed by the database server in The Cloud, where the problem would be the other way around: all the mailshot information would have to be uploaded instead.
What the database server is not capable of doing is recognising that it could reduce the problem by building the list of customer IDs locally, transferring this short list to The Cloud, running a new query on The Cloud database server that uses the uploaded list, and then passing back just the results to the local database server.
Which is just as well, because the security consequences of allowing such behaviour would be unacceptable.
As this happens in the background, "Joe in Marketing" may not even be aware that he is downloading huge chunks of data, slowing the company's internet connection and incurring data transfer, CPU and disk usage costs.
If you give this query to IT and get a programmer to do it, they will do exactly what I have just said the server can't do, because it is the most efficient solution to the problem; a sketch of that approach follows.
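One way this might look in T-SQL, a hedged sketch using the same assumed names (CloudServer, CoreDb) plus a hypothetical remote staging table MailShotIds, and assuming the linked server allows remote DML. Only the short list of IDs travels up, and only the matching rows travel back:

```sql
-- 1. Push just the relevant customer IDs into a small staging
--    table on the remote server (MailShotIds is a hypothetical
--    table created for this purpose).
DELETE FROM CloudServer.CoreDb.dbo.MailShotIds;
INSERT INTO CloudServer.CoreDb.dbo.MailShotIds (CustomerId)
SELECT CustomerId
FROM LocalMailShotTable
WHERE MailShotDate = CAST(DATEADD(day, -1, GETDATE()) AS date);

-- 2. Run the join remotely via OPENQUERY so the work happens in
--    The Cloud and only the telephone numbers come back.
SELECT *
FROM OPENQUERY(CloudServer,
    'SELECT c.CustomerId, c.Telephone
     FROM CoreDb.dbo.CloudCustomerTable AS c
     JOIN CoreDb.dbo.MailShotIds AS i
        ON i.CustomerId = c.CustomerId');
```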
This step of uploading a file is always going to be a bottleneck, and it gets worse as the queries become more complex, turning a trivial task into one that needs thinking about.
So the ad hoc nature of using a PC is heavily compromised: you either need to set up advanced queries or run very inefficient ones that used to take a couple of seconds and now take a minute or so, and are charged for. On top of the possible cloud server costs, the employee is getting frustrated: "I could do this before upgrading to the cloud".
As this is a trivial example, it is easy to find ways around it; for example, at 7am you could make a local copy of the customer data (sketched below). But then things go wrong and you have to work with yesterday's data, which means that report xyz gives a different total to report abc because they are using different customer databases. What do you do then?
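That workaround could be as simple as a scheduled job along these lines, again a sketch with assumed names:

```sql
-- Hedged sketch of a 7am refresh job: take a full local copy of
-- the cloud customer table once a day. LocalCustomerCopy and the
-- linked-server names are assumptions for illustration.
TRUNCATE TABLE LocalCustomerCopy;
INSERT INTO LocalCustomerCopy (CustomerId, Telephone)
SELECT CustomerId, Telephone
FROM CloudServer.CoreDb.dbo.CloudCustomerTable;
```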
In highly formalised companies this may be less of an issue: the financial controller and his department may not have the interest, or the necessary security permissions, to query the database, and all queries go via IT anyway.
The real point is: did you think about this when deciding to put your data into The Cloud?