Commercial Web Application - Scalable Database Architecture

I am developing a set of web applications for tracking scientific laboratory data. Each laboratory has several members, each of whom will have access to both their own data and the data of their laboratory as a whole. Many typical queries are therefore expected to return records belonging to multiple members (for example, mouse records owned by several different lab members).

I think the database is normalized pretty well. Now I'm wondering how to give users efficient access to their own data and their lab's data when it is mixed (hopefully) with a whole host of records from other labs.

So far, I've come to the conclusion that most tables will end with two fields: user_id and labgroup_id. The WHERE clause of any SELECT statement will contain a reference to one of these id fields ("... WHERE labgroup_id = n ..." or "... WHERE user_id = n ...").

My questions:

  • Is this an approach that will scale to 10^6 or more records?

  • If so, what is the best way to use these fields in a query so that it searches the relevant subset of the database most efficiently? For example, should the first step in the query create a temporary table containing only the lab group's data? Or would indexing on some combination of the id, user_id, and labgroup_id fields be sufficient at this scale?

I thank all the respondents in advance.



1 answer


You should be more than fine using this approach at 10^6 rows. We currently use something very similar, with mixed client data differentiated by an account ID, at 10^8 rows and have no performance issues at all on modest hardware.

Make sure you have specific indexes that cover user_id and labgroup_id.
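As a rough sketch of what that looks like, the snippet below builds one table with the two trailing id columns and an index on each, then runs the two kinds of scoped query from the question. It uses Python's built-in sqlite3 module purely so it can be run anywhere; the CREATE INDEX statements have the same form in MySQL. The table name `mice`, the `name` column, the index names, and the sample rows are all made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here. Only user_id and labgroup_id come from
# the question; the table, the name column and the index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mice (
        id          INTEGER PRIMARY KEY,
        name        TEXT    NOT NULL,
        user_id     INTEGER NOT NULL,
        labgroup_id INTEGER NOT NULL
    );
    CREATE INDEX idx_mice_user     ON mice (user_id);
    CREATE INDEX idx_mice_labgroup ON mice (labgroup_id);
""")

# Three members of lab 10 and one member of lab 20.
conn.executemany(
    "INSERT INTO mice (name, user_id, labgroup_id) VALUES (?, ?, ?)",
    [("m1", 1, 10), ("m2", 2, 10), ("m3", 3, 10), ("m4", 4, 20)],
)


def records_for_user(user_id):
    """Return the names of the records owned by a single member."""
    cur = conn.execute(
        "SELECT name FROM mice WHERE user_id = ? ORDER BY name", (user_id,)
    )
    return [row[0] for row in cur]


def records_for_lab(labgroup_id):
    """Return the names of the records owned by anyone in one lab."""
    cur = conn.execute(
        "SELECT name FROM mice WHERE labgroup_id = ? ORDER BY name",
        (labgroup_id,),
    )
    return [row[0] for row in cur]


own_records = records_for_user(1)
lab_records = records_for_lab(10)
```

Records from other labs sit in the same table but never appear in a lab-scoped result, and no temporary table is needed to get that separation.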

Remember that MySQL generally uses only one index per table in a query (the index merge optimization is an exception). Take a look at your typical query patterns. If people will be filtering on multiple columns in WHERE clauses, create composite keys that contain heavily used columns that also provide good differentiation, meaning they narrow down the rows: a yes/no column is a bad key, but a column with many different values that is often used in a WHERE clause might be a good candidate.
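One hedged way to judge "good differentiation" is to count the distinct values in each candidate column before choosing what goes into a composite key. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL; the `is_alive` and `strain` columns, the index name, and the generated data are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; is_alive and strain are hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mice (
        id          INTEGER PRIMARY KEY,
        labgroup_id INTEGER NOT NULL,
        is_alive    INTEGER NOT NULL,   -- yes/no flag: only two values
        strain      TEXT    NOT NULL    -- many different values
    )
""")
conn.executemany(
    "INSERT INTO mice (labgroup_id, is_alive, strain) VALUES (?, ?, ?)",
    [(i % 50, i % 2, f"strain-{i % 200}") for i in range(1000)],
)

CANDIDATES = ("labgroup_id", "is_alive", "strain")


def distinct_values(column):
    """Count distinct values in a column; more values means better narrowing."""
    if column not in CANDIDATES:  # the name is interpolated, so whitelist it
        raise ValueError(f"unexpected column: {column}")
    return conn.execute(f"SELECT COUNT(DISTINCT {column}) FROM mice").fetchone()[0]


cardinality = {column: distinct_values(column) for column in CANDIDATES}

# is_alive barely narrows anything, so the composite key pairs the tenant
# column with the high-cardinality column instead.
conn.execute("CREATE INDEX idx_mice_lab_strain ON mice (labgroup_id, strain)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM mice WHERE labgroup_id = ? AND strain = ?",
    (7, "strain-7"),
).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)
```

Putting labgroup_id first means the same index can also serve queries that filter on the lab alone.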



Turn on the MySQL slow query log (or get a commercial query analyzer, or a 30-day trial of one) and see which queries are taking a long time. Use the EXPLAIN command to find out which index is being used and how. If a particular query shows up in the slow query log frequently and/or with very long execution times, consider changing your indexes or adding a new one.
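The check-the-plan, add-an-index, check-again loop might look something like the following. This is only an analogy: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, whereas MySQL's EXPLAIN returns a differently shaped result (the chosen index appears in its `key` column). The `samples` table, the `note` column, and the index name are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the samples table and its index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE samples (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        labgroup_id INTEGER NOT NULL,
        note        TEXT
    )
""")

# Pretend this statement keeps turning up in the slow query log.
SLOW_QUERY = "SELECT note FROM samples WHERE labgroup_id = ?"


def query_plan(sql, params):
    """Return the planner's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of every row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


# Without an index the planner has to scan the whole table.
plan_before = query_plan(SLOW_QUERY, (10,))

# Add an index on the filtered column and ask again.
conn.execute("CREATE INDEX idx_samples_labgroup ON samples (labgroup_id)")
plan_after = query_plan(SLOW_QUERY, (10,))

print("before:", plan_before)
print("after: ", plan_after)
```

If the second plan still shows a full scan, the index does not match the query's WHERE clause and a different column order or a different index is worth trying.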

Make sure your my.cnf is configured correctly for your environment. The out-of-the-box configuration is almost always very poor. Here's a good guide to doing this.
