(A) GC content variation around CO breakpoints (blue dots and line). Window 0 on the x-axis gives the GC content at the breakpoints; negative and positive values represent the distance from the breakpoints. Each window is 2 kb of sequence, and GC content is calculated separately for each window. The red dots and line show one of the random GC content samples, each simulated with the same number of positions as there are CO breakpoints. After 10,000 repeats, not one of the random samples is as extreme as the observed profile (blue line) (P < 0.0001). (B) Relationship between recombination and GC content. The chromosomes are divided into 10 kb non-overlapping windows, and the recombination rate (cM/Mb) and GC content are obtained for each. After sorting by GC content, the windows are divided into 31 groups (approximately 20% to 51% GC, at 1% intervals), and the average recombination rate (with s.e.m.) is reported for each group.
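The resampling comparison described for panel (A) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the test statistic (mean GC content of the sampled windows), the one-sided direction (random sample at least as GC-rich as observed), and the names `empirical_p_value`, `window_gc`, and `observed_mean_gc` are all assumptions made for the example.

```python
import random


def empirical_p_value(observed_mean_gc, window_gc, n_breakpoints,
                      n_repeats=10_000, seed=0):
    """Count how often a random set of windows is as extreme as the observed one.

    window_gc        : list of GC percentages, one per candidate 2 kb window
    n_breakpoints    : number of windows drawn per repeat (the number of COs)
    observed_mean_gc : mean GC content of the windows at the real breakpoints

    Assumption: "as extreme" is taken to mean a mean GC content greater than
    or equal to the observed value (one-sided).
    """
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(n_repeats):
        # Draw the same number of windows as there are breakpoints,
        # without replacement.
        sample = rng.sample(window_gc, n_breakpoints)
        if sum(sample) / n_breakpoints >= observed_mean_gc:
            as_extreme += 1
    # Return the raw count as well: a count of zero is conventionally
    # reported as P < 1 / n_repeats rather than P = 0.
    return as_extreme, as_extreme / n_repeats
```

With 10,000 repeats, a count of zero corresponds to the reported P < 0.0001.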
In both we dissect the genome into 10 kb non-overlapping windows, of which there are 19,297. First, we ask about the raw correlation between GC% and cM/Mb for these windows, which as expected is positive and significant (Spearman's rho = 0.192; P < 10⁻¹⁵). Second, we wish to know the average effect of a one-unit increase in either parameter on the other. Given the noise in the data (and given that the current recombination rate need not reflect the ancestral recombination rate), we approach this issue using a smoothing approach. We start by rank ordering all windows by GC content and then dividing them into bins of 1% GC range, after excluding windows with more than 10% 'N'. The resulting plot is highly skewed by bins with very high GC (55% to 58%), as these have very few data points (Additional file 1: Figure S10E) (the same outliers likely affect the raw correlation too). Removing these three bins results in a more consistent trend (Additional file 1: Figure S10F). This also suggests that below circa 20% GC the recombination rate is zero (Additional file 1: Figure S10F). Removing bins with GC < 20% and, more generally, any bins with fewer than 100 windows (all bins with GC < 20% have fewer than 100 windows) leaves 18,680 (96.8%) of the windows, these having a GC content between approximately 20% and 51%.
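The smoothing steps above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the input layout (one `(gc_percent, cm_per_mb, n_fraction)` tuple per 10 kb window), the function name `bin_recombination_by_gc`, and the choice to define each 1% bin by the integer part of the GC percentage are assumptions.

```python
import math
from collections import defaultdict


def bin_recombination_by_gc(windows, max_n_frac=0.10, min_windows=100):
    """Group windows into 1% GC bins and summarise recombination rate per bin.

    windows : iterable of (gc_percent, cm_per_mb, n_fraction) tuples
    Returns {bin_lower_edge: (mean_rate, sem, window_count)}.
    """
    bins = defaultdict(list)
    for gc, rate, n_frac in windows:
        # Exclude windows with more than 10% 'N'.
        if n_frac > max_n_frac:
            continue
        # Assumption: a window with 30.7% GC falls in the 30% bin.
        bins[math.floor(gc)].append(rate)

    summary = {}
    for gc_bin, rates in sorted(bins.items()):
        n = len(rates)
        # Drop sparsely populated bins (fewer than 100 windows).
        if n < min_windows:
            continue
        mean = sum(rates) / n
        # Sample variance (n - 1 denominator), then standard error of the mean.
        var = sum((r - mean) ** 2 for r in rates) / (n - 1) if n > 1 else 0.0
        summary[gc_bin] = (mean, math.sqrt(var / n), n)
    return summary
```

Applying the minimum-count filter after the 'N' filter means a bin is judged on the windows that actually contribute to its mean, which is one reasonable reading of the description.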
Relationships between recombination and GC content
From this observation, we estimate that on average a 1 cM/Mb increase in recombination rate is associated with an increase in GC content of approximately 0.5%. Conversely, a 1% increase in GC content corresponds to an approximately 2 cM/Mb increase in recombination rate. We conclude that, owing to the apparent rarity of NCO gene conversion, at least in the bee genome, extrapolation from GC content to average crossing-over rate appears to be justifiable, at least for GC content over 20%. We note too that at the extreme GC contents the recombination rate may be over- or underestimated. This might reflect a discordance between current and past recombination rates.
The retained windows are used to construct Figure 4B, which presents a relatively noise-free (after smoothing) monotonic relationship between the two variables.
Crossing-over rate is also associated with nucleotide diversity, gene density, and copy number variation regions (Figures S11 to S13 in Additional file 1). Given the removal of hetSNPs from the analysis, the latter result is not trivially a CNV-associated artifact. The fine-scale analyses show a positive correlation between nucleotide diversity and recombination rate at all scales examined: 10, 100, 200, and 500 kb sequence windows (Figure S11 in Additional file 1). This bolsters prior analyses, one of which reported the trend but found it to be non-significant, while another reported a trend between population genetic estimates of recombination and genetic diversity. The trend accords with the notion that recombination reduces Hill-Robertson interference, thus lowering rates of hitchhiking and background selection and so permitting greater diversity. We also find a strong negative correlation between recombination and gene density (Figure S12 in Additional file 1) and a strong positive correlation between recombination and the length of multi-copy regions at various window sizes (Figure S13 in Additional file 1). The correlation with CNVs is consistent with a role for non-allelic recombination generating duplications and deletions via unequal crossing over.
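Checking a correlation at several window sizes, as described above for nucleotide diversity, can be sketched as follows. This is an illustrative example under stated assumptions: the text does not say which correlation statistic was used at these scales, so Spearman's rho is assumed here. The helper names `average_ranks`, `spearman_rho`, and `merge_windows` are hypothetical, and larger windows are approximated by averaging consecutive 10 kb windows from a single chromosome instead of recomputing each quantity from sequence.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def merge_windows(values, factor):
    """Average consecutive groups of `factor` windows; drop an incomplete tail."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]
```

For example, with per-10 kb lists `rate` and `diversity` for one chromosome, `spearman_rho(merge_windows(rate, 10), merge_windows(diversity, 10))` would give the correlation at the 100 kb scale under these assumptions.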