Machine-learning based interatomic potential for amorphous carbon


Volker L. Deringer (1, 2) and Gábor Csányi (1)
(1) Engineering Laboratory, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK
(2) Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
(Dated: January 31, 17)

We introduce a Gaussian approximation potential (GAP) for atomistic simulations of liquid and amorphous elemental carbon. Based on a machine-learning representation of the density-functional theory (DFT) potential-energy surface, such interatomic potentials enable materials simulations with close-to-DFT accuracy but at a much lower computational cost. We first determine the maximum accuracy that any finite-range potential can achieve in carbon structures; then, using a novel hierarchical set of two-, three-, and many-body structural descriptors, we construct a GAP model that can indeed reach the target accuracy. The potential yields accurate energetic and structural properties over a wide range of densities; it also correctly captures the structure of the liquid phases, at variance with state-of-the-art empirical potentials. Exemplary applications of the GAP model to surfaces of "diamond-like" tetrahedral amorphous carbon (ta-C) are presented, including an estimate of the amorphous material's surface energy and simulations of high-temperature surface reconstructions ("graphitization"). The new interatomic potential appears to be promising for realistic and accurate simulations of nanoscale amorphous carbon structures.

I. INTRODUCTION

Carbon is among the most intriguing elements due to its structural diversity, and its solid-state forms range from diamond and graphite via many more complex allotropes 1 3 onward to amorphous phases (a-C). The atomic structures of a-C samples depend strongly on density and are characterized by the coexistence of threefold ("sp²") and fourfold bonded ("sp³") carbon atoms. In this sense, low- and high-density forms of a-C are loosely reminiscent of graphite and diamond, respectively, but the actual situation is much more complex (Fig. 1).
Tetrahedral amorphous carbon (ta-C), the dense, sp³-rich form, is of particular technological interest due to its attractive mechanical properties. 4 6

Atomistic simulations have long been providing useful insight into a-C materials. Many empirical interatomic potentials exist for carbon, from the original Tersoff 11 and Brenner 12 formulations to more recent developments, including an environment-dependent interaction potential (EDIP), 13 improved reactive bond-order (REBO) potentials, 14,1 or a recently re-parametrized reactive force field (ReaxFF); 16 a comprehensive comparative study of such potentials was very recently carried out. 8 These fast potentials make large-scale molecular-dynamics (MD) simulations possible, and have been applied to engineering problems such as fracture 17 or friction and wear of a-C coatings; 18 they are efficient enough to perform thin-film deposition simulations, 19 thus directly mirroring the atomic-scale processes in experiments. Nonetheless, these potentials remain empirical in nature, and may have serious shortcomings: prominent examples are an underestimated concentration of sp³-bonded atoms in a-C (Ref. 1) and poor description of surfaces. A general problem of empirical potentials is the inevitable compromise in accuracy for predicting different material properties.

On the other hand, seminal studies based on tight-binding schemes 22 as well as density-functional theory (DFT) early on afforded atomistic structure models of a-C, and more recent DFT-MD studies deal with applications in photovoltaics 27 or coatings. 28 Furthermore, liquid carbon has been of interest, for example, in first-principles studies of the diamond melting line, which is difficult to evaluate experimentally. 29 Despite their usefulness, however, DFT-based methods are restricted to quite small system sizes, and even with the computational power available nowadays, they are limited in practice to a few hundred atoms. This makes many of the above scenarios simply inaccessible to predictive-quality simulations.
To bridge the long-standing gap between these two realms, a novel class of simulation methods has recently emerged which is based on machine learning (ML). The key idea is to map a set of atomic environments directly onto numerical values for energies and forces; these quantities are trained from a large and accurate quantum-mechanical reference database but subsequently interpolated using the ML algorithm. If training is successful, this makes atomistic simulations close to quantum-mechanical accuracy accessible but requires less computational effort by many orders of magnitude. Recent implementations use high-dimensional artificial neural networks, 3 32 compressed sensing, 33 or Gaussian process regression. 34 Interatomic ML-based potentials have been developed for several prototypical solids 3 39 and applied, e.g., in studies of phase transitions. 4 We mention in passing that ML schemes are currently being developed to estimate other fundamental properties of molecules and solids, including atomization energies, 41 multipolar polarization, 42 band gaps, 43 or NMR parameters. 44 A recent tutorial review of the field is in Ref. 4.

Previous ML potentials have been created for the crystalline carbon allotropes diamond and graphite, 34,36 but as those were trained on a small region of configuration space, they are not suitable for simulating a-C. Indeed, the only reported ML potential dealing with amorphous matter is a neural-network potential for the phase-change data-storage material GeTe 46 that enabled large-scale simulations of thermal transport 47 and atomistic processes during crystallization. 48 Amorphous materials are structurally much more diverse than their crystalline counterparts, and despite the lack of long-range translational symmetry, their properties depend crucially on structural order on the local and intermediate length scales. 49 The required large unit cells and the long relaxation times make it very difficult to use DFT simulations for amorphous materials of practical interest. 46, The latter are hence particularly promising targets for high-quality ML potentials.

FIG. 1. Exemplary a-C structures at various densities, obtained in 216-atom cells from melt-quench simulations (panel annotations: "open structures rich in sp/sp² carbon" and "dense structures rich in sp³ carbon"). Note the gradual transition from open to dense networks, and the coexistence of twofold ("sp"; yellow), threefold ("sp²"; green), and fourfold ("sp³"; blue) coordinated carbon atoms. The open, low-density structures are metastable and on much further annealing will form more sp²-rich networks; 7,8 here, on purpose, we focus on the as-quenched structures shown, to assess as diverse local environments as possible. Bonds are drawn up to a maximum interatomic distance of 1.8 Å, and coordination numbers are determined using the same cutoff. Structures were visualized using AtomEye. 9

In this work, we introduce an interatomic Gaussian approximation potential (GAP) for condensed-phase elemental carbon, with particular focus on liquid and amorphous phases of various densities. First, we systematically determine the maximum accuracy that any finite-range interatomic potential for carbon can achieve as a function of its neighbor cutoff, independently of how it is fitted.
Then, we show that our GAP does indeed reach this accuracy, and furthermore provides reliable structural and topological data that agree well with the computationally much more demanding DFT benchmarks. Finally, we show predictions for energies and structures of a-C surfaces, which play a key role in wear and fracture mechanisms.

II. THEORY

The Gaussian approximation potential (GAP) 34,1 is an ML approach to atomistic materials modeling, whereby an interatomic potential for the given material is trained from a database of reference quantum-mechanical data, and is then used to interpolate energies and forces for arbitrary structures. In order to make simulation of large systems feasible, the total energy is broken down into a sum of local contributions, given by a local energy function ε. This function is expanded in a basis set adapted to the input database; it is generated using a kernel function, or similarity measure of neighbor environments. The choice of this kernel (and the symmetries it obeys) is critical for the success of any ML potential. 2

Previous ML potentials for solids used a decomposition into atomic energies, and employed many-body descriptors to represent the atomic neighbor environment comprising all neighbors of an atom up to a given cutoff radius. 34,3 6 However, for a complete description of these atomic environments one must fit the atomic energy function in a high-dimensional space. This leads to poor extrapolation, that is, to a poor fit in regions of configuration space far away from any data points. A long simulation will likely find such regions, especially at high temperatures and/or when disorder is large. Indeed, in the present case of a-C, we encountered problems early during training when using a single many-body descriptor only: MD runs driven by such GAP models showed atoms aggregating at unreasonably small (sub-Å) distances. This is a very general challenge during the development of high-dimensional ML potentials, which carry the risk of erroneous extrapolation behavior unless carefully tested and used.
In this work, we generalize the many-body GAP approach for solids: we retain the many-body terms but augment them with two- and three-body descriptors (distances between atoms and angles in triplets). The latter terms hence represent two- and three-body interactions as in traditional (empirical) interatomic potentials, but now all descriptors and associated local-energy contributions are part of the same ML framework. Our starting point is thus the following expression for the total energy:

$$E = \left(\delta^{(2b)}\right)^2 \sum_{i \in \mathrm{pairs}} \varepsilon^{(2b)}\!\left(q_i^{(2b)}\right) + \left(\delta^{(3b)}\right)^2 \sum_{j \in \mathrm{triplets}} \varepsilon^{(3b)}\!\left(q_j^{(3b)}\right) + \left(\delta^{(MB)}\right)^2 \sum_{a \in \mathrm{atoms}} \varepsilon^{(MB)}\!\left(q_a^{(MB)}\right), \tag{1}$$

where 2b, 3b, and MB denote two-, three-, and many-body interactions, respectively. This is similar in spirit to the recently introduced Moment Tensor Potentials, 6 and also to another scheme that uses a parametric two-body term in combination with a neural network that describes the many-body interactions. 7 In the above expression, the δ are scaling parameters, and each corresponds to the distribution of energy contributions that a given interaction term has to represent. We choose the largest value for the 2b terms, which describe the largest share of the total energy; on top of that, we add a 3b term, and finally the many-body term with the smallest $\delta^{(d)}$.

The local energy corresponding to each descriptor $d \in \{2b, 3b, MB\}$ is given by a linear combination of kernel functions, 34

$$\varepsilon^{(d)}\!\left(q^{(d)}\right) = \sum_{t=1}^{N_t^{(d)}} \alpha_t^{(d)} K^{(d)}\!\left(q^{(d)}, q_t^{(d)}\right), \tag{2}$$

where $t$ denotes one of $N_t$ training configurations $q_t$, each of which attains a weighting coefficient $\alpha_t$ during fitting, and $K$ is a covariance kernel which quantifies how similar the input configuration $q$ is to the $t$-th training configuration, $q_t$. In practice, we sparsify the representation and only allow the sum to range over a number of representative points drawn from the full training database ($N_t \ll N_\mathrm{full}$). The number of representative points differs for each descriptor and must be carefully controlled during training.

Both for 2b and 3b contributions, we use a squared exponential kernel, 34

$$K^{(d)}\!\left(q_i^{(d)}, q_t^{(d)}\right) = \exp\!\left[-\frac{1}{2} \sum_{\xi} \frac{\left(q_{\xi,i}^{(d)} - q_{\xi,t}^{(d)}\right)^2}{\theta_\xi^2}\right], \tag{3}$$

where ξ is an index running over the components of the descriptor vector $q^{(d)}$.
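As an illustration of the kernel expansion and the squared exponential kernel just described, the following minimal Python sketch evaluates a local energy as a weighted sum of kernels. The function names, the representative points, the weights `alphas`, and the length scale `theta` are hypothetical values chosen only for illustration; they are not taken from the fitted GAP.

```python
import numpy as np

def squared_exponential_kernel(q, q_t, theta):
    """exp(-0.5 * sum_xi ((q_xi - q_t_xi) / theta_xi)^2) for descriptor vectors."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    q_t = np.atleast_1d(np.asarray(q_t, dtype=float))
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    return float(np.exp(-0.5 * np.sum(((q - q_t) / theta) ** 2)))

def local_energy(q, representative_points, alphas, theta):
    """Weighted sum of kernels between q and each representative point."""
    return sum(alpha * squared_exponential_kernel(q, q_t, theta)
               for alpha, q_t in zip(alphas, representative_points))

# Hypothetical two-body example: the descriptor is a single distance.
rep_points = [1.2, 1.5, 2.0]   # illustrative representative distances
alphas = [0.5, -1.0, 0.25]     # illustrative weights (not fitted values)
theta = 0.5                    # illustrative length scale
e = local_energy(1.5, rep_points, alphas, theta)
print(e)
```

Because the weights enter only as prefactors of fixed kernel values, the energy is linear in `alphas`, which is the property the fitting procedure relies on.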
In the case of pairs, the descriptor has one single scalar component (namely, the distance $r_{12}$ between the two atoms involved):

$$q^{(2b)} = |\mathbf{r}_2 - \mathbf{r}_1| \equiv r_{12}; \tag{4}$$

for triplets, we do not directly use the natural coordinates $r_{12}$, $r_{13}$, and $r_{23}$, but a different form to enforce symmetry over permutation of the neighbor atoms 2 and 3:

$$q^{(3b)} = \begin{pmatrix} r_{12} + r_{13} \\ (r_{12} - r_{13})^2 \\ r_{23} \end{pmatrix}. \tag{5}$$

Note that with this choice of descriptors, the first term in Eq. 1 is equivalent to a pair potential, and the second is a generic three-body potential, but in the GAP framework neither imposes constraints on the specific functional form.

For the many-body term, we use the recently introduced Smooth Overlap of Atomic Positions (SOAP) 2 descriptor, which has proven successful in generating GAP models for tungsten, 39 in classifying diverse molecular and solid-state structures, 8 and very recently in constraining structural refinements of amorphous Si. 9 We briefly review the most pertinent features; detailed formulae and derivations are in Ref. 2. SOAP starts from the neighborhood density of a given atom $a$, defined as

$$\rho_a(\mathbf{r}) = \sum_{b} \exp\!\left[-\frac{(\mathbf{r} - \mathbf{r}_{ab})^2}{2\sigma_\mathrm{at}^2}\right] f_\mathrm{cut}(r_{ab}), \tag{6}$$

where the sum is over neighboring atoms, and the cutoff function $f_\mathrm{cut}$, which ensures compact support, goes smoothly to zero at $r_\mathrm{cut}$ over a characteristic width $r_\Delta$. The parameter $\sigma_\mathrm{at}$ ultimately controls the smoothness of the potential. The neighbor density is expanded into a local basis of orthogonal radial basis functions $g_n$ and spherical harmonics $Y_{lm}$,

$$\rho_a(\mathbf{r}) = \sum_{nlm} c_{nlm}^{(a)}\, g_n(r)\, Y_{lm}(\hat{\mathbf{r}}), \tag{7}$$

and the expansion coefficients are used to form the spherical power spectrum,

$$p_{nn'l}^{(a)} = \sqrt{\frac{8\pi^2}{2l+1}} \sum_{m} \left(c_{nlm}^{(a)}\right)^{*} c_{n'lm}^{(a)}, \tag{8}$$

which is invariant both to permutations over neighbors and to 3D rotations of the neighbor environment. We use the elements of a finite truncation of the power spectrum (up to $n \le n_\mathrm{max}$ and $l \le l_\mathrm{max}$) as components of the many-body descriptor vector $q_a^{(MB)}$, which furthermore is normalized to have unit length.
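The pair and triplet descriptors defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the atomic positions are hypothetical, and the check at the end simply demonstrates that the triplet descriptor is unchanged when neighbor atoms 2 and 3 are swapped, whereas the natural coordinates are not.

```python
import numpy as np

def two_body_descriptor(r1, r2):
    """Scalar pair descriptor: the interatomic distance r_12."""
    return float(np.linalg.norm(np.asarray(r2, float) - np.asarray(r1, float)))

def three_body_descriptor(r1, r2, r3):
    """Triplet descriptor (r_12 + r_13, (r_12 - r_13)^2, r_23) centered on atom 1."""
    r12 = two_body_descriptor(r1, r2)
    r13 = two_body_descriptor(r1, r3)
    r23 = two_body_descriptor(r2, r3)
    return np.array([r12 + r13, (r12 - r13) ** 2, r23])

# Hypothetical positions (in Å) of a central atom and two neighbors.
atom1 = [0.0, 0.0, 0.0]
atom2 = [1.4, 0.0, 0.0]
atom3 = [0.0, 1.6, 0.0]

q_123 = three_body_descriptor(atom1, atom2, atom3)
q_132 = three_body_descriptor(atom1, atom3, atom2)   # neighbors swapped
print(q_123, q_132)
```

The sum and the squared difference of the two central distances are both symmetric under the swap, so no explicit symmetrization over neighbor order is needed at fitting time.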
The kernel function for the SOAP term is the simple dot product,

$$k\!\left(q_a^{(MB)}, q_t^{(MB)}\right) = \sum_{nn'l} p_{nn'l}^{(a)}\, p_{nn'l}^{(t)} = q_a^{(MB)} \cdot q_t^{(MB)}, \tag{9}$$

and we find it advantageous to raise it to a small integer power for a sharper distinction between different environments. This gives the final kernel

$$K^{(MB)}\!\left(q_a^{(MB)}, q_t^{(MB)}\right) = \left| q_a^{(MB)} \cdot q_t^{(MB)} \right|^{\zeta}. \tag{10}$$
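To make the effect of the exponent concrete, here is a small Python sketch of a normalized dot-product kernel raised to an integer power. The input vectors are hypothetical stand-ins for truncated power spectra (computing real SOAP vectors is outside the scope of this sketch), and the function names are assumptions made for illustration.

```python
import numpy as np

def normalize(p):
    """Scale a descriptor vector to unit length."""
    p = np.asarray(p, dtype=float)
    return p / np.linalg.norm(p)

def soap_like_kernel(p_a, p_t, zeta=4):
    """|q_a . q_t|^zeta for unit-length descriptor vectors q = p / |p|."""
    return float(abs(np.dot(normalize(p_a), normalize(p_t))) ** zeta)

# Hypothetical stand-ins for two power-spectrum vectors.
p_a = [1.0, 0.5, 0.2]
p_t = [0.9, 0.6, 0.1]

k1 = soap_like_kernel(p_a, p_t, zeta=1)   # plain dot product
k4 = soap_like_kernel(p_a, p_t, zeta=4)   # sharpened kernel
print(k1, k4)
```

Identical environments always give a kernel value of one, while raising the dot product to a higher power pushes the value for merely similar environments further below one, which is the "sharper distinction" referred to in the text.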

This dot product kernel is a natural choice to use with the power spectrum descriptor, as it makes the kernel equivalent (up to normalization) to the integrated overlap of the original neighbor densities,

$$\int d\hat{R}\, \left| \int \rho_a(\mathbf{r})\, \rho_t(\hat{R}\mathbf{r})\, d\mathbf{r} \right|^2. \tag{11}$$

The expression for the total energy in our GAP model is therefore given by

$$E = \left(\delta^{(2b)}\right)^2 \sum_i \sum_t \alpha_t^{(2b)} K^{(2b)}\!\left(q_i^{(2b)}, q_t^{(2b)}\right) + \left(\delta^{(3b)}\right)^2 \sum_j \sum_t \alpha_t^{(3b)} K^{(3b)}\!\left(q_j^{(3b)}, q_t^{(3b)}\right) + \left(\delta^{(MB)}\right)^2 \sum_a \sum_t \alpha_t^{(MB)} K^{(MB)}\!\left(q_a^{(MB)}, q_t^{(MB)}\right), \tag{12}$$

where all fitting coefficients α enter linearly, and therefore we can obtain them simply using linear algebra. This is in contrast with the difficult nonlinear parameter optimization required both for traditional interatomic potentials and for some other ML schemes, e.g., artificial neural networks.

The above discussion does not include the prescription for obtaining the linear fitting coefficients. In practice, this is complicated by the fact that the quantum-mechanical data are only available in the form of total energies, atomic forces, and virial stresses. The full formalism simultaneously includes sparsification, multiple energy terms, and fitting to total energies and their derivatives; it is given elsewhere. 1

To illustrate the role of the combined descriptors, we use different (and increasingly complex) GAP models to compute the potential-energy curve for an isolated carbon dimer; these models have been fitted to the full bulk and surface training set described below that additionally incorporates DFT data points between .8 and 3.7 Å in small increments. The results are summarized in Fig. 2: GAP models using 2b descriptors only, or a combination of 2b+3b, reproduce the minimum and the repulsion at small C-C distances reasonably well, but the longer-range behavior is not yet correctly described. An interesting result is seen when using a many-body descriptor only: the fit is very good for the region where DFT data points (blue circles) are provided, but shows unphysical behavior at r < .8 Å; this can, and will, then lead to bad extrapolation in practical simulations. By contrast, a GAP model combining all three descriptors (Eq. 12) gives a highly satisfactory result (red line in Fig. 2).

FIG. 2. Potential-energy scans for an isolated carbon dimer (energy in eV versus C-C distance in Å; curves: 2b only, 2b+3b only, SOAP only, 2b+3b+SOAP). This plot, with DFT data as reference (blue), allows us to assess the use of different structural descriptors: all three combined are needed for a high-quality fit (see text).

III. COMPUTATIONAL METHODS

A. General protocol for melt-quench simulations

Initial simulations were performed in the DFT framework, subsequent ones with GAP, but both employed the same temperature protocol. For each simulation, an (unstable) simple-cubic lattice of carbon atoms was generated at the appropriate density and held at a constant temperature of 9 K for 3 ps. The simulation cell was then held in the liquid state at K (3 ps), quenched with an exponentially decaying temperature profile (. ps), and finally annealed at 3 K (3 ps). The timestep was 1 fs in all MD simulations.

B. DFT-based ("ab initio") molecular dynamics

Structures for initial training, as well as DFT benchmarks for a-C properties, were generated using DFT-based ab initio MD, using the Quickstep scheme and a stochastic Langevin thermostat 6 as implemented in cp2k. 61,62 Electronic wavefunctions were described at the Γ point using a mixed-basis scheme with Goedecker-Teter-Hutter pseudopotentials 63 and a cutoff energy of 2 Ry. Double-ζ quality basis functions were used for the carbon 2s and 2p levels. Exchange and correlation were treated in the local density approximation (LDA), 64 both during ab initio MD and training-data generation. This functional, despite its simplicity, has long been used for atomistic simulations of a-C and is still the de facto standard for many current applications. 1,27,28 Further work may be concerned with the application of higher-level methods, such as computationally much more expensive hybrid functionals, or the implementation of dispersion corrections; these will likely be interesting additions to the GAP framework, but are beyond the scope of the present study.

Structural data were obtained from melt-quench MD, following protocols that are well established for a-C. 23,2
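The four-stage temperature protocol of Sec. III A (randomize, melt, quench with an exponentially decaying profile, anneal) can be sketched as a simple target-temperature function. This is a minimal sketch under stated assumptions: the function name is hypothetical, the temperatures and durations below are placeholder values chosen only for illustration, and the specific exponential form (a geometric interpolation that reaches the annealing temperature at the end of the quench) is an assumption rather than the form used by the authors.

```python
def melt_quench_temperature(t_ps, T_random, T_melt, T_anneal,
                            t_random, t_melt, t_quench, t_anneal):
    """Target thermostat temperature (K) at time t_ps (ps) of a four-stage protocol."""
    total = t_random + t_melt + t_quench + t_anneal
    if not 0.0 <= t_ps <= total:
        raise ValueError("time outside the protocol")
    if t_ps < t_random:            # stage 1: randomize the initial lattice
        return T_random
    t_ps -= t_random
    if t_ps < t_melt:              # stage 2: hold in the liquid state
        return T_melt
    t_ps -= t_melt
    if t_ps < t_quench:            # stage 3: exponential decay (assumed form)
        return T_melt * (T_anneal / T_melt) ** (t_ps / t_quench)
    return T_anneal                # stage 4: anneal

# Placeholder settings for illustration only.
settings = dict(T_random=8000.0, T_melt=4000.0, T_anneal=400.0,
                t_random=2.0, t_melt=2.0, t_quench=1.0, t_anneal=2.0)
profile = [melt_quench_temperature(0.25 * k, **settings) for k in range(29)]
print(profile)
```

In an actual MD run, such a function would be queried once per step to update the thermostat target; the same schedule can then be shared between the DFT-driven and the GAP-driven simulations.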

C. Construction of the training database

Our training database contains structural snapshots from ab initio MD and also, as it is iteratively extended, from GAP-driven simulations. No matter how generated, all structures are then subjected to single-point DFT-LDA computations to yield well-converged energies and forces for training. This was done using CASTEP, 6 with dense reciprocal-space meshes (maximum spacing .3 Å⁻¹), 66 a 6 eV cutoff for plane-wave expansions, and an extrapolation scheme to counteract finite-basis errors. 67 Gaussian smearing of .1 eV width was applied to electronic levels. The halting criterion for SCF iterations was E < 8 eV.

Initial training data were computed for snapshots from ab initio MD melt-quench trajectories, and a preliminary GAP was fitted to those data. The resulting potential reproduced the structure of the 9 K liquid well, that of the K liquid satisfactorily, but not yet that of the amorphous phase. In retrospect, this is easily understood: the 9 K liquid is highly diffusive, and so one single 3 ps trajectory apparently contains sufficiently different atomic environments to sample configuration space during training. A quenched amorphous structure, by contrast, is essentially one single snapshot with thermal fluctuations but no major changes in connectivity. Training from DFT data alone would thus incur significant expense, as each uncorrelated a-C sample would require a full melt-quench trajectory (9 steps) of which only the last snapshot would be of use. Instead, an initial GAP was used to generate liquid structures at K, which were then briefly reequilibrated ( steps) and quenched ( steps) using ab initio MD. This was done for ten uncorrelated structures each at 2., 3., 3.2, and 3. g cm⁻³, thus placing more emphasis on high-density amorphous phases which are richer in tetrahedral ("sp³") motifs and thus structurally most different from the liquids.

The resulting, amended database was used to train a new GAP, which was further extended iteratively by performing melt-quench simulations fully driven by the previous GAP version, as is common practice in the development of ML potentials. 39,46 Thereby, all GAP-MD simulations were carried out using a Langevin thermostat as implemented in quippy and the same temperature profiles as in the cp2k simulations. A typical protocol included the generation of independent structures at densities of cm⁻³, with system sizes of atoms. For one or more snapshots from each trajectory, a single-point DFT computation was performed and the results were included in the next round of training.

To add amorphous surfaces to the training set, we generated a-C structures using GAP, and from these created slabs by adding vacuum regions. In parallel ab initio MD runs, amorphous slabs were briefly heated at up to K, and structures from both procedures were added to the database. We reiterate that it is not problematic to generate the training structures with different techniques, 68 as their energies and forces are recomputed using the same reference method (tightly converged plane-wave DFT).

Once the training database of liquid and amorphous structures had been generated, it was further extended by including randomly distorted unit cells of the crystalline allotropes, diamond and graphite. This combined database was then split into a training and a test set (9:); the latter was not included in the fit but used for validation. Finally, DFT data for an isolated dimer were added (cf. Fig. 2). A full description of the training database is provided as Supplementary Information.

TABLE I. Key parameters for the GAP model created in this work (see Sec. II for definitions). Columns: 2-body, 3-body, SOAP. Rows and entries in the order given: δ (eV): . a, .3 a, .1; r_cut (Å); r_Δ (Å): .; σ_at (Å): .; n_max, l_max: 8; ζ: 4; Sparsification: Uniform, Uniform, CUR; N_t (a-C bulk): 12 2; N_t (a-C surfaces); N_t (crystalline): 2; N_t (dimer): 3; N_t (total): 1 43.
a For the 2b and 3b descriptors, when specifying training input, the δ given here is divided by the expected number of pairs or triplets an atom is involved in.

D. GAP model fitting

Values for the above GAP parameters as used in the present work are given in Table I. Furthermore, the regularization parameters of the Gaussian process corresponding to the expected errors were as follows. For liquid and amorphous structures we set .2 eV (energies) and .2 eV Å⁻¹ (forces); for the crystalline forms, we multiplied both values by .1, and additionally included virials in the training with an expected error of .2 eV. Sparsification was done with the CUR method 69 for the SOAP kernel, whereas a simple uniform grid of basis function locations was used for the 2b and 3b terms. In the following, unless specified otherwise, "GAP model" refers to one with all three terms (2b, 3b, and SOAP). The potential files are freely available at http://
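The statement in Sec. II that the coefficients α enter linearly, together with the regularization by expected errors described here, can be illustrated with a deliberately simplified Python sketch. It is not the full formalism referred to in the text: it uses a single scalar descriptor, fits energies only (no forces or virials), and omits sparsification. The toy Morse-like target curve, the function names, and all numerical settings are assumptions made for illustration.

```python
import numpy as np

def kernel_matrix(x, y, theta):
    """Squared exponential kernel between two sets of scalar descriptors."""
    d = np.asarray(x, float)[:, None] - np.asarray(y, float)[None, :]
    return np.exp(-0.5 * (d / theta) ** 2)

def fit_alpha(x_train, e_train, theta, sigma):
    """Solve (K + sigma^2 I) alpha = e; sigma plays the role of an expected error."""
    K = kernel_matrix(x_train, x_train, theta)
    return np.linalg.solve(K + sigma ** 2 * np.eye(len(x_train)), e_train)

def predict(x_new, x_train, alpha, theta):
    """Kernel expansion evaluated at new descriptor values."""
    return kernel_matrix(x_new, x_train, theta) @ alpha

def toy_dimer_energy(r):
    """Hypothetical Morse-like curve standing in for reference dimer data."""
    return (1.0 - np.exp(-1.5 * (r - 1.3))) ** 2 - 1.0

theta, sigma = 0.3, 1e-3                      # illustrative hyperparameters
r_train = np.linspace(1.0, 3.5, 26)           # toy training distances
e_train = toy_dimer_energy(r_train)
alpha = fit_alpha(r_train, e_train, theta, sigma)

r_test = np.linspace(1.1, 3.4, 50)            # held-out points inside the range
max_err = np.max(np.abs(predict(r_test, r_train, alpha, theta)
                        - toy_dimer_energy(r_test)))
print(max_err)
```

The only numerical work is one linear solve, in contrast to the iterative nonlinear optimization needed for neural-network weights. Evaluating `predict` well outside the training range is also an easy way to see how such a model behaves without data, which is the extrapolation issue discussed in Sec. II.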

IV. RESULTS AND DISCUSSION

A. Locality and target accuracy

A central assumption of all interatomic potentials is that of locality: the energy associated with a given atom or bond depends on its immediate environment ($\varepsilon_i \equiv \varepsilon(q_i)$), but not on atoms further away than a given cutoff radius (ignoring electrostatic terms and van der Waals corrections for the moment). A similar assumption follows directly for the forces on atoms. While this approximation is often made implicitly, and tacitly, in the development of empirical potentials, their ML-based counterparts aim at quantitative energy and force accuracy with respect to the reference potential-energy surface, and so at the outset we must numerically determine how well the above assumption holds. This question is very general, and likely relevant beyond the present study.

Quantum-mechanical models such as DFT are inherently nonlocal: they do not allow for a unique partitioning of the total energy into a sum of local terms. Nonetheless, quantum models of electronic structure are "nearsighted", 7 which means that the reduced one-particle density matrix decays strongly (at least under the assumption of screening, for insulators, and in general at finite temperature). 71 This implies locality in the atomic forces, which we quantify as the decay of the dependence of an atomic force on a neighboring atom's position as the distance between the two atoms grows. A direct manifestation of this is the decay of the dynamical matrix or Hessian. Using our reference quantum-mechanical method, we can calculate the above decay of the dependence of the atomic forces, and thus determine a bound on force locality. Conversely, this gives a bound on the force accuracy of any interatomic potential model based on a local energy decomposition. We stress again that all this applies only for materials without strong polar interactions or for models from which such polar interactions have been substantially subtracted.

The procedure, as previously employed by Bartók et al., 34 is sketched in Fig. 3a: we select one atom in a given simulation cell and define a sphere of radius $r_\mathrm{fix}$ around this center, inside which all atoms are held fixed. We then create many structures which differ in the positions of the atoms outside the fixed sphere, and for each calculate the force on the central atom using DFT. 72 Locality is then characterized by plotting the standard deviation of this force as a function of $r_\mathrm{fix}$.

FIG. 3. (a) Schematic overview of the procedure used here for locality tests: define a fixed sphere of radius r_fix around one atom; distort the atoms outside r_fix (several copies); estimate locality from the forces on the central atom. (b,c) Results for diamond and graphite, respectively, obtained by displacing all atoms outside r_fix randomly and inspecting the standard deviation of the force on the central atom (plotted in eV Å⁻¹ versus r_fix in Å).

We first consider the crystalline allotropes and begin by introducing rather modest distortions, moving all atoms outside $r_\mathrm{fix}$ randomly with a standard deviation of .1 Å. Diamond exhibits strong locality (Fig. 3b): the overall force deviations due to displacements outside the spheres are small, and they gradually vanish and are practically zero at r_fix = . Å. Graphite, by stark contrast, is highly nonlocal (Fig. 3c): the force deviations are much larger than in diamond, and they do not decay as rapidly.

Turning now to the locality in amorphous carbon, we focus on two representative densities: a low-density form (2. g cm⁻³) and an approximant of dense ta-C (3. g cm⁻³), and again we start by randomly displacing atoms (Fig. 4a). Qualitatively, the results are in line with those for the crystalline phases: the more sp²-rich form (2. g cm⁻³) clearly shows lower locality. 73 Due to the coexistence of sp²/sp³ motifs in the amorphous forms, however, there is no clear-cut distinction between the two system sizes, and ta-C retains a notable degree of nonlocality.

The displacements so far have perturbed the atoms outside $r_\mathrm{fix}$, but the models still retain a memory of the initial structure even outside the fixed sphere. We therefore next perform Tersoff MD, 11 starting with velocities that correspond to a very high temperature, and let the system evolve for 1 ps, again keeping the central sphere fixed. This leads to a more local picture (Fig. 4b), especially for the dense diamond-like form; nonetheless, the overall σ(3.7 Å) values in the latter are much larger than in the crystalline form.

FIG. 4. Locality tests for a-C structures. (a) Force locality in low- and high-density forms, evaluated in the presence of a small distortion that preserves the major topological features of the amorphous network. In both panels, data have been collected over three structural models; for each, ten atoms were randomly sampled as sphere centers, and five independent distortions were created for each central atom. (b) Same but for a large distortion induced by MD at very high temperature such as to erase any structural memory of the initial cell outside the fixed sphere.

Summarizing, diamond shows the strong force locality expected for a covalent semiconductor; graphite, by contrast, is highly nonlocal. The latter holds for the amorphous phases as well, more pronouncedly so at low density. As a ballpark measure, for an a-C potential with a cutoff radius of 3.7 Å, we estimate the lowest achievable standard deviation of force components to be 1 eV Å⁻¹ (Fig. 4). One might increase $r_\mathrm{cut}$ up to 7. Å, which is expected to lower the standard deviation to .7 eV Å⁻¹, but the tradeoff in terms of much greater computational expense (both during training and application of the GAP) does not seem to justify this. Hence, all that follows will be done in the framework of modest $r_\mathrm{cut}$ values as given in Table I.

B. Energies and forces

With the target errors for a finite-range potential established, we can now analyze the quality of our GAP. We therefore test how much the predicted energies and forces deviate from DFT reference values. Again, we assess different combinations of structural descriptors, and thereby illustrate how hierarchical GAP models can achieve increasing accuracy. Correlation plots of energies and force components already make this clear (Fig. 5a): using the 2b descriptor only, there is a certain degree of correlation between the DFT and GAP energies, but with much scatter, and there is essentially no correlation between DFT and GAP force components (light gray). A 2b+3b model is clearly better (dark gray), but ultimately SOAP must be added (red) to achieve the accuracy limit imposed by nonlocality (Fig. 5b). Figure 5c shows the errors as cumulative distributions: the curves move left (toward lower errors) and up (to a higher degree of confidence) as successively more complex descriptors are added to the GAP model.

FIG. 5. (a) Scatterplots of DFT-computed and GAP-predicted total energies (left; relative to a free single atom) and force components (right) on a test set of 4 configurations. Results are shown for hierarchical GAP models with different combinations of descriptors (Sec. II). (b) Absolute errors of the respective quantities, similarly resolved according to different sets of descriptors. For the force components (right), akin to Ref. 34, an estimate of the maximum achievable standard deviation as judged from locality tests (Sec. IV A) is indicated by a blue line. (c) Cumulative distributions: a given point (x, y) on the curve indicates that y percent of all structures have an error equal to or below x. The standard deviation estimated from locality tests, σ_loc, which should enclose 68.3% of the GAP force component errors, is indicated in blue: indeed, the GAP model with combined 2b, 3b, and MB descriptors (red line) does reach this accuracy.
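The two statistics used in this analysis can be sketched in Python: the spread of the force on a fixed central atom over several distorted copies, and the fraction of force errors that fall within a given bound (the quantity read off a cumulative distribution). This is a minimal sketch under stated assumptions: pooling the three Cartesian components into one standard deviation is an assumption about the definition, the function names are hypothetical, and the synthetic Gaussian errors and the value of `sigma_loc` are illustrative, not results from the paper.

```python
import numpy as np

def force_std(forces):
    """Standard deviation of central-atom forces over copies, pooled over x, y, z.

    forces: array of shape (n_copies, 3), one force vector per distorted copy.
    """
    forces = np.asarray(forces, dtype=float)
    deviations = forces - forces.mean(axis=0)
    return float(np.sqrt(np.mean(deviations ** 2)))

def fraction_within(errors, bound):
    """Fraction of errors with absolute value at or below `bound`."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(np.abs(errors) <= bound))

# Synthetic check: for Gaussian errors, one standard deviation encloses ~68.3%.
rng = np.random.default_rng(0)
sigma_loc = 0.5                                   # hypothetical value, eV/Å
errors = rng.normal(0.0, sigma_loc, size=100_000)
frac = fraction_within(errors, sigma_loc)
print(frac)
```

Applied to real data, `force_std` would be evaluated once per value of r_fix to trace out a locality curve, and `fraction_within` would be compared against 68.3% to judge whether a fitted model has reached the locality-imposed limit.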
For such a heerogenous raining daabase, i is ineresing o furher break down he GAP s performance according o configuraion ypes: slighly disored diamond configraions will be easier for an ML poenial o fi han disordered liquid carbon. Indeed, looking back a Fig. shows ha he raining poins wih lowes overall energy show he lowes fiing errors; hese are precisely he crysalline srucures. In Fig. 6, we show he disribuion of errors for configuraions coming from differen sages of he mel quench

8 8 GAP force componen (ev Å 1 ) 3 3 Liquid (9 K) Liquid ( K) Quench ( 3 K) Amorphous (3 K) TABLE II. Energy and force RMS errors of our GAP, compued for a se of 12-aom srucures (cf. Fig. 6), and also for he crysalline srucures from he es se. Percenile values P 9 for he absolue values are given; hese measure he range of daa he GAP has o learn. Energy Force componens (ev) (ev Å 1 ) RMS RMS P 9 (GAP) (GAP) () Raio Liquid (9 K) Liquid ( K) Quench Amorphous Crysalline force componen (ev Å 1 ) FIG. 6. -compued versus GAP-prediced force componens in a se of 12-aom snapshos of liquid and amorphous carbon, emphasizing he overall magniude of forces he GAP has o learn a various pars of he mel quench rajecories. cycles. We invesigae a se of uncorrelaed a-c srucures, wih 12 aoms each and randomized densiies over he range g cm 3, creaed using GAP-MD and subsequenly analyzed using. From each melquench rajecory, we ake one configuraion a each key sep ha is, one from he 9 K and one from he K liquid, one during quenching, and he final one from he quenched amorphous sample. The force errors are very similar for all pars of he rajecory, bu he absolue magniude of forces is much differen; hence, in relaive erms, he GAP performs much beer for forces in he liquid han in he amorphous phases. A deailed numerical analysis is in Table II. We esimae how widely he absolue force componens are disribued by giving heir 9-h percenile value P 9. We hen divide he GAP force componen error by P 9 ; he lower his raio, he beer. For mel quench simulaions, he siuaion appears favorable: as he srucure is frozen in during quenching, he opology (say, he sp 3 coun) of he amorphous phase is largely deermined by a correc descripion of he liquid. C. Srucural properies From energy and force evaluaions, we now move on o probe physical properies as prediced by our GAP. Table III compares is performance o reference daa for he diamond srucure. 
Here and in the following, we will also make comparison to a state-of-the-art empirical potential, namely, a screened variant of the Tersoff potential developed by Pastewka and coworkers. 1,74 Similar potentials have been successfully applied in recent studies both to graphene 7 and to a-C, 76 and are faster than GAP by about a factor of . The lattice parameter a of diamond is accurately reproduced by the GAP; the bulk modulus and elastic constants are reasonable but deviate somewhat from the DFT reference (Table III). It was shown previously that a GAP model trained for the crystalline phases exclusively can reproduce the DFT benchmark even better; 34 here, the gain in transferability (being able to model amorphous as well as crystalline phases) comes at a small price in terms of accuracy. Similar tests for the graphite c parameter gave 6.62 Å (DFT) and 6.18 Å (GAP). Despite this slight overbinding ( 1.6%), the agreement is appreciable, especially given that the Tersoff and Brenner potentials are short-ranged and cannot describe the interlayer spacing in graphite at all (r_cut < c/2).

TABLE III. Structural and elastic properties of diamond, computed using DFT-LDA and our GAP as well as the screened Tersoff potential from Ref. 74 ("scrt").

[Table III columns: DFT, GAP, scrt. Rows: a (Å), B_Voigt (GPa), C_11 (GPa), C_12 (GPa), C_44 (GPa)]

We now turn to the main subject of the present work: the liquid and amorphous phases of carbon. We begin by inspecting the concentration of sp³ atoms during melt-quench simulations (Fig. 7), and use this to once more assess the performance of different combined structural descriptors. The DFT reference (blue) shows that in liquid carbon at 3. g cm⁻³, approximately one-third of the atoms are in fourfold coordination, and this number increases strongly when quenching (6. 6. ps in simulation time). During annealing at 3 K, the average then levels off at 7 %; as only three structures were created with DFT, fluctuations and standard deviation (light blue shading) are sizeable. The GAP results, by comparison, clearly identify the need for combined structural descriptors when aiming to make physically meaningful predictions: using the two- and three-body descriptors only yields systematically too low sp³ concentrations (gray), whereas both combined with SOAP essentially reproduce the DFT data for the liquid forms (red); the sp³ count is still underestimated in the quenched amorphous phase. We performed additional GAP simulations in which we increased the quenching time from . to 2. ps, but this did not further improve the result. For completeness, we include in Fig. 7 results for a GAP model that employs two-body descriptors only; in this case, not unexpectedly, the atoms clump into unphysical structures within the first few steps (black line).

[Figure 7: top panel, T (K) versus simulation time (ps), with stages Randomize (3 ps), Melt (3 ps), Quench, Anneal (3 ps); bottom panel, sp³ atoms in a-C (%) versus simulation time (ps), with curves 2b+3b+MB, 2b+3b, 2b only]

FIG. 7. Top: Exemplary temperature profile during a DFT-MD melt-quench simulation to yield a 216-atom structure of a-C. Bottom: Concentration of sp³ atoms during these cycles, measured by counting atomic neighbors up to a cutoff distance of 1.8 Å. DFT benchmarks are compared to GAP results using different combinations of descriptors; areas of light shading indicate standard deviations.

The simplest measure of short-range order in a liquid or amorphous structure is given by the radial distribution function (RDF). In Fig. 8, we compare GAP results to those of DFT, and start by noting that both are very close. The liquid structures are more diffuse and less strongly ordered, and the RDFs show a nonzero first minimum at 1.9 Å, whereas the amorphous structures exhibit a gap between their first and second RDF peak. A small but visible asymmetry of the second RDF peak at 2 Å for all amorphous structures indicates the presence of fourfold rings. The screened Tersoff potential ("scrt") does not predict the existence of such fourfold rings in a-C (we will return to this below), and unlike DFT and GAP it lowers the first RDF minimum to almost zero in all liquid structures.

[Figure 8: RDF versus distance (Å) at 1., 2., 2., 3., and 3. g cm⁻³; left panel, Liquid ( K); right panel, Amorphous (3 K); curves DFT, GAP, scrt]

FIG. 8. Radial distribution functions for liquid (left) and subsequently quenched amorphous (right) carbon structures (ten independent 216-atom structures were created at each density). Results for five densities are given, spanning the entire range visualized in Fig. 1. "scrt" denotes the screened Tersoff potential as introduced in Ref. 74.

Figure 9 shows angular distribution functions (ADFs). The ADF maxima at low (high) density are centered around 1 (9 ), respectively, loosely mirroring the defining structural features of the crystalline allotropes (graphite honeycombs and diamond tetrahedra); naturally, this distribution is broader in the highly diffuse liquids than in the quenched amorphous structures. At low densities, a contribution close to 18 is seen in the DFT reference data, due to nearly linearly coordinated sp carbon atoms (yellow in Fig. 1); this is a minor feature at 2. and 2. g cm⁻³, but becomes prominent at 1. g cm⁻³, especially so in the quenched amorphous structures (top right panel in Fig. 9). The GAP reproduces these features very well, both the location and the extent of the maxima, as well as the overall shape of the ADF curves. The screened Tersoff potential deviates significantly from the DFT and GAP results, and the ADFs derived from it are zero both at 6 (absence of threefold rings) and at 18 (absence of linear sp-bonded chains).

[Figure 9: ADF versus angle (deg) at 1., 2., 2., 3., and 3. g cm⁻³; left panel, Liquid ( K); right panel, Amorphous (3 K); curves DFT, GAP, scrt]

FIG. 9. As Fig. 8, but for angular distribution functions. Thin vertical lines mark angles of 9. and 1, corresponding to ideal sp³ (diamond-like) and sp² (graphite-like) motifs, respectively. The scaling of the vertical axes is arbitrary.

[Figure: sp³ count (%) versus density (g cm⁻³); series: Exp., Brenner, Tersoff, scrt, GAP]

FIG. . Count of sp³ (fourfold coordinated) carbon atoms in quenched a-C structures as a function of density. Ten independent melt-quench cycles were performed at each density for the empirical and GAP models; three independent ones were done for DFT. Data for the Brenner potential are taken from Ref. 1. Experimental data have been collected from Refs . Error bars represent standard deviations. Lines between data points are only guides to the eye.

D. Coordination statistics and medium-range order

Among the key structural characteristics of a-C is the concentration of fourfold coordinated (sp³) atoms as a function of the sample density. We assess this in Fig. , comparing GAP results to DFT but also to previous modeling and experimental studies. The empirical Brenner and Tersoff potentials, as is known, 1 underestimate the sp³ count at high density; indeed, one of the breakthrough successes of the screened Brenner and Tersoff potentials has been their much improved description of a-C in this respect. 1 In comparison, the GAP data (red in Fig. ) are even closer to the DFT reference (blue), particularly at lower densities. The residual error of the GAP results is most pronounced at 3. g cm⁻³, so the example in Fig. 7, which uses this density, shows the worst of all cases. Looking beyond the first nearest-neighbor shell, the medium-range order in amorphous materials is conventionally characterized by means of ring statistics, which we evaluate using Franzblau's shortest-path algorithm. 8 Again, we compare liquid and quenched amorphous structures side-by-side, and inspect the entire range from low to high densities (Fig. 11).
The DFT reference (blue) shows that the distribution is quite complex: at high densities, the ring sizes center around sixfold (similar to diamond, where n = 6 exclusively), and the distribution decays quickly beyond that; no large-membered rings are found in ta-C. By contrast, the distribution in the low-density structures is less clearly defined and does involve higher-order rings, indicative of structural voids. The results for ta-C at 3 K are very similar with all three methods. In addition, the GAP model also recovers the three- and four-membered rings that are key features of the liquid and also prominent in low-density amorphous structures. 23 The screened Tersoff potential, by contrast, overestimates the average ring size in low-density a-C, and does not predict the occurrence of any three- or fourfold rings, either in the liquid or in the amorphous phases.

[Figure 11: ring counts versus ring size (n-membered rings) at 1., 2., 2., 3., and 3. g cm⁻³; left column, Liquid ( K); right column, Amorphous (3 K); series: DFT, GAP, scrt]

FIG. 11. Medium-range order in a-C as evaluated through ring statistics. Top: Structural fragment from one of the generated a-C structures at 2. g cm⁻³, chosen to visualize the diversity of ring sizes observed. Rings are indicated by shading, and their size n is given. Bottom: Ring statistics for liquid (left) and quenched amorphous (right) carbon structures obtained from DFT (blue), GAP (red), and screened Tersoff potential ("scrt"; black) simulations. Data for the liquid structures have been collected over the last 1 ps of the respective trajectory; data for the amorphous structures correspond to the last snapshot only, as the structures are strongly correlated in this case. For GAP-derived structures, the standard deviation for the count of each ring size is indicated by error bars.

So far, all validation of the new potential has been done against DFT, and has therefore necessarily been limited to rather modest system sizes of 216 atoms. The true strength of ML potentials, however, is in the application to larger structures. Figure 12 shows results for an 8,-atom a-C structure at 1. g cm⁻³, which would presently be impossible to generate with DFT-based MD even on state-of-the-art supercomputers. RDFs and ADFs obtained with 216-atom and 8,-atom structures are practically the same and are hence not shown. The situation for the ring statistics (Fig. 12b) is more complex. For small- and medium-sized rings, results for the large system (purple) come very close to the average from the 216-atom structures (red). Hence, while a single 216-atom snapshot will not be sufficient to investigate ring statistics of a-C models, one may instead collect averages over sufficiently many smaller structures, and thereby reproduce the short- and medium-range structural features without requiring larger simulation cells. Nonetheless, there is an inherent deviation between the 216- and 8-atom structures: namely, at very large ring sizes (n 18; shaded in Fig. 12b), which the 216-atom cells cannot reproduce as they are simply too small. This emphasizes that realistic studies of voids and porosity in a-C will require large structures on the order of at least several thousand atoms. We also created an 8,-atom ta-C structure (Fig. 12c): in this case, no voids are found but a dense, diamond-like network. Consequently, no large rings (n > 1) are observed, and the 216-atom simulations already provide a very good estimate of the medium-range structural order. Likewise, the screened Tersoff potential here correctly reproduces the maximum at n = 6 as well as the abundance of larger-membered rings. The latter drops to zero between n = 12 and n = 1 for all potentials and system sizes investigated.

[Figure 12: (a) structure image, cell edge 47.4 Å; (b) ring count at 1. g cm⁻³ and (c) ring count at 3. g cm⁻³ versus ring size (n-membered rings); series: scrt, 8 atoms; GAP, 8 atoms; scrt, 216 atoms; GAP, 216 atoms]

FIG. 12. Application of the GAP to larger-scale simulations. (a) Melt-quenched 8,-atom structure of a-C at 1. g cm⁻³, shown as stick drawing. (b) Ring statistics for this structure (purple) and as averaged over different 216-atom structures (red; as in Fig. 11). Purple shading emphasizes ring sizes of n 18 that the smaller systems cannot reproduce. (c) Same analysis but for ta-C (3. g cm⁻³).
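The neighbor-counting criterion behind the sp³ count (atoms are classified by how many neighbors lie within a fixed cutoff distance, cf. the caption of Fig. 7) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: it assumes an orthorhombic periodic cell and the minimum-image convention, and it builds a dense N × N distance matrix, which is fine for cells of a few hundred atoms but would need a neighbor list for multi-thousand-atom structures. The function names and the dictionary labels are hypothetical.

```python
import numpy as np


def coordination_numbers(positions, box, cutoff):
    """Count neighbors within `cutoff` for each atom in an orthorhombic cell.

    positions : (n_atoms, 3) Cartesian coordinates
    box       : (3,) cell edge lengths (orthorhombic cell is an assumption)
    cutoff    : neighbor cutoff distance, same length unit as positions
    """
    pos = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    if cutoff > 0.5 * box.min():
        raise ValueError("minimum-image convention needs cutoff <= box / 2")

    # All pairwise displacement vectors, wrapped to the nearest periodic image.
    delta = pos[:, None, :] - pos[None, :, :]
    delta -= box * np.round(delta / box)
    dist = np.linalg.norm(delta, axis=-1)

    # An atom is not its own neighbor.
    np.fill_diagonal(dist, np.inf)
    return np.sum(dist < cutoff, axis=1)


def coordination_fractions(coord):
    """Fractions of atoms with N = 2 (sp), 3 (sp2), and 4 (sp3) neighbors."""
    coord = np.asarray(coord)
    labels = (("sp", 2), ("sp2", 3), ("sp3", 4))
    return {name: float(np.mean(coord == n)) for name, n in labels}
```

Applied to an ideal diamond cell, every atom is fourfold coordinated, which gives a quick sanity check. For amorphous structures the result depends on the chosen cutoff, so the same value should be used whenever counts from different methods are compared.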

[Figure 13: Young's modulus (GPa) versus density (g cm⁻³); series: Exp. (Ref. 81), Exp. (Ref. 82), Exp. (Ref. 83), scrt (on DFT structures), GAP (on DFT structures), scrt (on scrt structures), GAP (on GAP structures)]

FIG. 13. Young's modulus of a-C as a function of density. Experimental values are taken from Ref. 81 (green), Ref. 82 (blue), and Ref. 83 (yellow), respectively. Lines between data points are guides to the eye.

E. Elastic properties

We next evaluated the Young's modulus of a-C which, like the sp³ count, depends strongly on density. We compare results using scrt and GAP, but not DFT, due to the high expense of fully relaxing the internal degrees of freedom for several uncorrelated models. In addition, we disentangle the effect of input structure versus potential performance for the prediction of elastic properties, and therefore also use both scrt and GAP to evaluate the Young's moduli of DFT-generated structures. To compute the Young's modulus of a-C, we take previously generated 216-atom structures, perform further short MD quenches from 3 K to very low temperature, and finally a conjugate-gradient relaxation to minimize the forces on atoms; the cell vectors remain fixed to keep the density unchanged. For each optimized structure, we compute the full 6 × 6 matrix of elastic constants C without imposing symmetry operations, and invert this matrix to obtain the compliance matrix S. 84 From this, we calculate the Young's modulus E (see, e.g., Ref. 8) by averaging over the three spatial directions:

$$E = \frac{1}{3}\left[\frac{1}{S_{11}} + \frac{1}{S_{22}} + \frac{1}{S_{33}}\right] \qquad (13)$$

and subsequently over independent structures (ten from scrt and GAP melt-quench runs, three for the DFT case; see above). The GAP results agree very well with experiments at all relevant densities (Fig. 13), and as expected they predict increased stiffness as density and sp³ concentration ("diamond-likeness") increase. The screened Tersoff potential correctly captures the same trend, although the absolute values are significantly overestimated; this is most pronounced at higher densities.

[Figure 14: (a) slab image, 18.8 Å; (b) γ_unrel (J m⁻²) versus slab structure, series: DFT, GAP, screened Tersoff; (c) T (K) versus simulation time (ps), stages heat, anneal, quench; (d) sp² atoms (%) versus simulation time (ps)]

FIG. 14. (a) Exemplary surface slab of ta-C, freshly cleaved from a -atom bulk structure. (b) Unrelaxed surface energies (Eq. 14) for five slabs cleaved from the same bulk structure. Lines between data points are guides to the eye. (c) Course of temperatures in the protocol we use to generate reconstructed surfaces: the systems are heated over ps to (green), (yellow), or 3 K (red), respectively, and annealed at this temperature for another ps. The final ps then constitute a slower cooling back to 3 K. (d) Concentration of sp² carbon atoms in -atom slabs versus simulation time. Averages over ten independent structures are given, and areas of light shading indicate standard deviations.

[Figure: panels (a) Unrelaxed surface, (b) Annealed at K, (c) Annealed at K, (d) Annealed at 3 K; each with a slab image and sp²/sp³ count versus distance from center (Å)]

FIG. 1. Surfaces of ta-C before (a) and after annealing at different temperatures (b-d) as predicted from GAP simulations. Each panel shows one exemplary slab structure on the left, with sp² ("graphite-like") atoms drawn as small dark spheres, and sp³ ("diamond-like") atoms larger and light gray. Note how annealing at 3 K leads to beginning detachment of graphitic-like sheets, and disintegration of the ta-C regions even in the center of the slab. Histograms for the sp²/sp³ distribution in direction normal to the surface have been collected over ten independent structures; the count is normalized per simulation cell.

F. From the bulk to surfaces

Realistic materials modeling, especially at the nanoscale, must extend from the bulk to a description of crystal surfaces and their reactivity. 86 Likewise, the surfaces of amorphous matter are of broad interest but pose particular and significant challenges for modeling. We here present initial applications of our GAP to amorphous carbon surfaces of the 3. g cm⁻³ phase (ta-C). This is because dense, diamond-like carbon is used in coatings 6 and it is this form for which surface phenomena are most relevant. Early studies of ta-C surfaces have been reported at the DFT level but have necessarily been restricted to very small system sizes. 87 Larger-scale simulations were made possible by tight-binding schemes 88 and EDIP, 89 but even high-quality empirical potentials may face problems when it comes to the prediction of surface energies; this has already been reported for diamond. 74 Conventionally, the surface energy, γ, is calculated as

$$\gamma = \frac{1}{2A}\left[E_{\mathrm{slab}} - N\,E_{\mathrm{bulk}}\right] \qquad (14)$$

for an elemental (or stoichiometrically precise) surface slab that contains N atoms and exposes equivalent surface areas A at top and bottom; in this expression, E_slab denotes the computed total energy of a slab model per unit cell, and E_bulk refers to the energy of the underlying bulk structure per atom. For amorphous systems, the structure of the surface is not uniquely defined (there are no distinct cleavage planes as in crystals), and to calculate γ one must average over many large structures. We assess the suitability of the GAP model for such studies by computing surface energies of ta-C and comparing to DFT values. We used a GAP to generate a -atom bulk ta-C structure and cleaved five different surfaces normal to the [1] direction of the simulation cell (cf. Fig. 14a). For each surface, the unrelaxed surface energy was evaluated using the three methods (Fig. 14b). The GAP model fully reproduces the stability ordering; for the most stable surface (structure 1), GAP and DFT results differ by less than .1 J m⁻².
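The surface-energy definition of Eq. (14) is simple enough to be written out directly. The sketch below assumes energies in eV and the surface area in Å², and converts the result to J m⁻² using the exact SI value of the electronvolt; the function and argument names are hypothetical and chosen here for illustration only.

```python
# 1 eV / Å^2 expressed in J / m^2: 1.602176634e-19 J divided by 1e-20 m^2.
EV_PER_A2_TO_J_PER_M2 = 16.02176634


def surface_energy(e_slab, n_atoms, e_bulk_per_atom, area):
    """Surface energy from Eq. (14), returned in J m^-2.

    e_slab          : total energy of the slab cell (eV)
    n_atoms         : number of atoms N in the slab
    e_bulk_per_atom : bulk reference energy per atom (eV)
    area            : area A of ONE of the two equivalent surfaces (Å^2)
    """
    if area <= 0:
        raise ValueError("surface area must be positive")
    # Excess energy of the slab relative to the same number of bulk atoms,
    # shared between the top and bottom surfaces (hence the factor 2).
    gamma_ev_per_a2 = (e_slab - n_atoms * e_bulk_per_atom) / (2.0 * area)
    return gamma_ev_per_a2 * EV_PER_A2_TO_J_PER_M2
```

For an amorphous slab, a single value obtained this way reflects one particular cleavage, so in practice one would evaluate the function for several slabs and report the average and spread.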
For the two least stable candidates, 4 and , this difference increases slightly but remains small (below .1 J m⁻², or 2%). The screened Tersoff potential yields much lower surface energies, similar to what has been reported for diamond. 74

We finally perform high-temperature annealing simulations with our GAP, to assess structural relaxations and reconstructions at ta-C surfaces. These are associated with an increased formation of sp² atoms ("graphitization") that has been observed in several ex-situ experiments 9 and also in situ during film growth. 91 A discussion of the relevant differences between experiment and theory has been given by Marks. 89 Higher temperatures than in experiment must be used to overcome kinetic barriers during simulation, as experiments typically involve up to one hour of annealing. 9 In that sense, the absolute annealing temperature used for simulation is fictitious; 89 its choice depends on the computational method, 8 and a suitable annealing temperature must therefore be found by trial and error. In Fig. 14c-d, we explore the use of different such temperatures, and in particular we analyze the structures obtained before and after each of the different annealing runs (Fig. 1). Each slab is gradually heated to the target temperature over ps, annealed for ps, and then cooled back to 3 K over another ps; each structure contains atoms, and ten independent ones are studied in parallel to improve statistics. Monitoring the concentration of sp² atoms during these simulations provides the most direct insight: heating to K induces no significant changes overall but heals the dangling bonds directly at the surface; therefore, the K annealed structure may be a useful representative of the non-graphitized surface. At the intermediate setting of K, the sp² concentration in the system rises slightly during annealing and is then lowered again during cooling; the interior of the slab and its density remain close to that of bulk ta-C, whereas reconstructions are observed at the surface.
Finally, heating to 3 K graphitizes the entire system; this is reminiscent of what was seen earlier by Powles and co-workers using the EDIP model. 7 It also leads to a strong expansion of the slab interior (Fig. 1d). A top view best visualizes the atomic-scale processes at the surface (Fig. 16). The freshly cleaved, unrelaxed structure shows a number of dangling bonds and low-coordinated atoms, trivially so as the tetrahedra in ta-C have been cut apart. These defects largely disappear during annealing at K already, but at this temperature the surface stays strongly disordered (Fig. 16b). By contrast, increasing the annealing temperature to K leads to graphitized layers of several Å thickness at the surfaces (Fig. 1c): sixfold rings are seen, as well as pairs of five- and sevenfold ones that are likely metastable (Fig. 16c); still, the surface atoms are connected to lower-lying sp³ atoms even within the topmost 3 Å, and the graphitization therefore remains a genuine surface phenomenon. By contrast, during annealing at 3 K, the entire slab graphitizes as seen above, and a strongly defective graphene sheet begins to detach from the surface; no near-surface sp³ atoms are seen any more (Fig. 16d).

[Figure 16: panels (a) Unrelaxed surface; (b) K: Removal of dangling bonds; (c) K: Surface reconstruction; (d) 3 K: All-slab graphitization; color legend: N = 1 (defect), N = 2 (sp), N = 3 (sp²), N = 4 (sp³)]

FIG. 16. Top views of the same surface structure before (a) and after (b-d) different degrees of annealing. Only atoms in the outermost 3 Å are shown, and coloring indicates the coordination number.

While the present simulations deal with pure a-C, it would be an interesting next step to extend the GAP model to hydrogenated (a-C:H) surfaces, which are likewise important for applications 6,18 and have been studied using empirical potentials (see, e.g., Ref. 92). For ML models, such multicomponent extensions require significant effort, as the underlying quantum-mechanical reference databases have to be extended and adapted, and the complexity of this rises steeply with the number of species involved. Nonetheless, feasibility studies have been reported for several binary 34,37,46 and even ternary systems, such as in a very recent study on mixed Cu-Ce-O nanoparticles using an artificial neural-network potential. 93 In terms of multicomponent systems, it would likewise be interesting to move from amorphous carbon to the binary Si-C system, and to compare again with the performance of an established screened empirical potential. 18

V. CONCLUSIONS

We have developed a machine-learning based GAP model for atomistic simulations of liquid and amorphous elemental carbon.
The structural complexity that the potential has to encompass, as well as the nonlocality in forces, are notable, and larger than in any previous ML-based interatomic potential model. Nonetheless, our GAP predicts energies that are largely in the range of tens of meV/atom; characteristic structural properties, such as the sp³ count and the medium-range order as expressed through ring statistics, are faithfully recovered, and surface energies and reconstructions are well described by the GAP. The central issue in the development of atomistic materials modeling remains the tradeoff between accuracy and cost. The GAP model presented here is many orders of magnitude faster than DFT, but slower than state-of-the-art empirical potentials (while similarly linear scaling). Being thus intermediate between both realms, GAP models appear to be promising tools for accurate large-scale atomistic simulations, including amorphous materials and their surfaces.

ACKNOWLEDGMENTS

We thank L. Pastewka, A. P. Bartók, J. R. Kermode, D. M. Proserpio, and S. R. Elliott for ongoing valuable discussions and for helpful remarks on this work. V.L.D. gratefully acknowledges a postdoctoral fellowship from the Alexander von Humboldt Foundation and support from the Isaac Newton Trust (Trinity College Cambridge). This work used the ARCHER UK National Supercomputing Service (http://) via EPSRC Grant EP/K146/1.

Data Access Statement: The interatomic potential and DFT reference data are freely available at http://; additional raw data are deposited in an online repository at https://doi.org/.17863/cam.743.

vld24@cam.ac.uk

1 A. Hirsch, Nat. Mater. 9, 868 ().
2 V. Georgakilas, J. A. Perman, J. Tucek, and R. Zboril, Chem. Rev. 11, 4744 (1).
3 R. Hoffmann, A. A. Kabanov, A. A. Golov, and D. M. Proserpio, Angew. Chem. Int. Ed. , 962 (16).
4 D. R. McKenzie, D. Muller, and B. A. Pailthorpe, Phys. Rev. Lett. 67, 773 (1991).
D. R. McKenzie, Rep. Prog. Phys. 9, 1611 (1999).
6 J. Robertson, Mater. Sci. Eng. R Reports 37, 129 (2).