Alexander Bolotin, Stéphane Mauger, Karine Malarme, Alexei Sorokin and S. Dusko Ehrlich
Génétique Microbienne, INRA, Domaine de Vilvert, 78352 Jouy en Josas cedex, FRANCE
Lactococcus lactis is an AT-rich Gram-positive microorganism phylogenetically close to the genus Streptococcus. Different strains of L. lactis are used in the dairy industry, mostly for the production of cheese starters. L. lactis is also one of the most popular laboratory microorganisms for studies of lactic acid bacteria physiology. To promote these studies we decided to determine the genome sequence of L. lactis strain IL1403. This genome has a size of 2.42 Mb, as estimated by low-resolution physical mapping (Le Bourgeois et al., J. Bacteriol., 174:6752-6762, 1992). Our strategy is based on generating a limited number of plasmid clones carrying inserts randomly distributed over the genome. The inserts were sequenced, and the sequences were used to produce longer sequencing templates by applying the Multiplex Long Accurate PCR mapping protocol (Sorokin et al., Genome Research, 6:448-453, 1996). Alternatively, they were used as starting points for sequencing by primer walking, either on longer segments of chromosomal DNA carried on phage lambda or on PCR products. Some 10,000 sequencing reactions were performed on 2879 plasmid clones (carrying inserts of 1-2 kb), 289 lambda clones (carrying inserts of approx. 15 kb) and approx. 600 PCR fragments (ranging from 1 to 20 kb). This yielded approx. 2.36 Mb of nonredundant L. lactis genome sequence, organised in 1 contig. The total redundancy of this sequence is 2 and we estimate the error rate at 1%. These data can be used for three purposes: (i) to generate a detailed physical map of the L. lactis genome in terms of restriction enzyme sites, repeated elements and ORF organisation; (ii) to derive a comprehensive gene complement of L. lactis IL1403 by using BLASTx or FASTx gene prediction tools; (iii) to provide basic information for high-quality sequencing of the genome of IL1403 or other strains of L. lactis with a relatively low redundancy.
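The figures quoted above can be put side by side with a short calculation. The sketch below is illustrative only and not part of the original work: it computes the fraction of the estimated genome that was assembled, the number of erroneous bases implied by a 1% error rate, and, for comparison, the coverage that the Lander-Waterman model would predict for purely random reads at the same redundancy. The function names are hypothetical, and the Lander-Waterman comparison is an assumption introduced here, since the strategy described combines random clones with directed PCR mapping and primer walking.

```python
import math

# Values taken from the abstract.
GENOME_SIZE_BP = 2_420_000   # physical-map estimate of the genome size
ASSEMBLED_BP = 2_360_000     # nonredundant sequence obtained
REDUNDANCY = 2.0             # total redundancy of the sequence
ERROR_RATE = 0.01            # estimated error rate


def fraction_assembled(assembled_bp: int, genome_bp: int) -> float:
    """Fraction of the estimated genome represented in the assembly."""
    return assembled_bp / genome_bp


def expected_errors(assembled_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases at the given error rate."""
    return assembled_bp * error_rate


def random_shotgun_coverage(redundancy: float) -> float:
    """Lander-Waterman expectation for purely random reads (assumed model)."""
    return 1.0 - math.exp(-redundancy)


if __name__ == "__main__":
    print(f"assembled fraction: {fraction_assembled(ASSEMBLED_BP, GENOME_SIZE_BP):.3f}")
    print(f"expected errors:    {expected_errors(ASSEMBLED_BP, ERROR_RATE):.0f} bases")
    print(f"random-read model:  {random_shotgun_coverage(REDUNDANCY):.3f}")
```

Under these assumptions, about 97.5% of the estimated genome is represented, whereas purely random reads at twofold redundancy would be expected to cover only about 86.5%. This is consistent with the directed steps of the strategy closing most of the gaps that random sampling would leave.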
Several automatic tools developed during this project were applied to annotate the genome, and the annotation was updated concurrently with the sequencing progress. One such program, based on the BLASTx algorithm, detected 1500 ORFs having best hits with proteins in databases when the Smallest Sum Probability limit was set to 10^-4. A tFASTx-based search gave about 1600 such ORFs. These were classified into biochemical or biological functional categories. Some of the groups of classified genes will be presented. Among the interesting findings are the following: confirmation of the presence of at least four prophages detected earlier in the chromosome of IL1403 (Chopin et al., Appl. Env. Microbiol., 55:1769-1774, 1989); detection of about 40 copies of insertion elements, 15 of which correspond to an IS element similar to IS1070 from Leuconostoc lactis and previously unknown in L. lactis; and the presence of homologs of all late competence genes characterised in B. subtilis. Together with the many other repeated sequences and plasmids detected in L. lactis, these findings may provide a key to understanding the mechanisms of genetic exchange in cheese starter cultures, which may influence the reproducibility of cheese production protocols. Functional exploration of several new genes of interest is under way in our laboratory.
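The thresholding step described above, keeping for each ORF only its best database hit and only if that hit passes the probability limit, might be sketched as follows. This is a minimal illustration and not the program developed during the project: the `Hit` record, the `best_hits` function and the sample identifiers are all hypothetical, and the choice to accept hits whose probability is less than or equal to the limit is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

P_LIMIT = 1e-4  # Smallest Sum Probability limit quoted in the text


@dataclass(frozen=True)
class Hit:
    orf_id: str     # hypothetical ORF identifier
    protein: str    # hypothetical database protein name
    p_value: float  # smallest sum probability reported for the hit


def best_hits(hits: Iterable[Hit], p_limit: float = P_LIMIT) -> Dict[str, Hit]:
    """Return the lowest-probability hit per ORF among hits passing the limit."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.p_value > p_limit:
            continue  # hit is not significant at the chosen limit
        current = best.get(hit.orf_id)
        if current is None or hit.p_value < current.p_value:
            best[hit.orf_id] = hit
    return best


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        Hit("orf1", "proteinA", 1e-20),
        Hit("orf1", "proteinB", 1e-6),
        Hit("orf2", "proteinC", 0.5),
    ]
    for orf_id, hit in best_hits(sample).items():
        print(orf_id, hit.protein, hit.p_value)
```

Counting the keys of the returned dictionary gives the number of ORFs with a significant best hit, which is the kind of figure reported above for the BLASTx-based and tFASTx-based searches.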
ALSO PRESENTED IN:
Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S.D., Sorokin, A. Low-redundancy sequencing of the entire Lactococcus lactis IL1403 genome. Antonie Van Leeuwenhoek, 1999, Jul-Nov; 76(1-4):27-76
Bolotin, A., Wincker, P., Mauger, S., Jaillon, O., Malarme, K., Weissenbach, J., Ehrlich, S.D., and Sorokin A. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403. Genome Research, 2001, April, 1697-0
www.genomeweb.com