<div dir="ltr"><div dir="ltr">On Wed, 6 May 2020 at 01:53, Wookey <<a href="mailto:wookey@wookware.org">wookey@wookware.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 2020-05-03 17:28 +0100, Mark Shinwell wrote:<br>
<br>
> In fact if you want to look at something else, there are some problems later<br>
> on, where some ARGE data was brought in containing multiple years' surveys in<br>
> the same hg commits. I fear this may be quite extensive. <br>
<br>
OK, so I looked through all the tags to see how many 'wrong year' files<br>
were added in each year. Details are below. Then I started to compare<br>
with the hg datasets to see how different it was from before.<br></blockquote><div><br></div><div>Good work, thanks.</div><div><br></div><div><div>I didn't specify any particular conversion options to get a linear history, but I think it's probably the right thing to do, at least for the historical data. I'm less sure about going forward, but that needs separate thought anyway.</div><div></div></div><div><br></div><div>I'm not sure we can reasonably do the yearly checkpoint thing unless the history is linear across those points.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Marking the post-<year> tags in hg to match the git ones reveals one<br>
very odd thing: post-2012 is the commit _after_ post-2013 in hg. (on<br>
different branches). So it looks like both years were sorted at the<br>
same time on parallel branches then merged.<br></blockquote><div><br></div><div>Yeah, I spotted this too, and it seemed confusing. I deliberately re-ordered some of the 2012/2013 changesets, where the long-standing separate branch existed, so the 2012 changesets ended up earlier.</div><div><br></div><div>I think the current situation is overall an improvement. There is a potential problem with regard to synchronisation with the web site, but I'm not really sure that's worth worrying about -- it seems like a rather infrequent event for someone to try to check out an old version. It seems much more likely to happen with the dataset.</div><div><br></div><div>I'll start looking at your list when I've finished with the "gap", which shouldn't be too long. I think it is worth spending the time now to get matters sorted.</div><div><br></div><div>Mark</div></div></div>