User:Rlward/2006 2Q
Replies to your Questions
Importing Status
Congratulations on the completion of the projects you have been working on - we're making great progress! Phoenix is next for me, I just need to find an afternoon or evening that I can devote to writing and testing the conversion code for it.
As far as the photos go, they all need to be imported, but I noticed that several of them were very very large and need to be resized, and some need to be trimmed. Do you know how to do that?
Unattached Families
Robert - How about adding categories with names such as "Unattached Families" or "Uncertain Identifications"? I'm not sure about the category names here, but I think categories would be a good way to go. I can then use the recentpages extension to automatically / dynamically display all pages with this category identification, so we'd have the best of both worlds.
It Finally Happened!
Robert - regarding the case where two individuals of the same name also have the same birth and death information, you're right - we need a way to identify them further. My suggestion would be to either add "(of Gorham, ME)" and "(of Westminster and Fitchburg, MA)" or perhaps their spouse names. If additional information is obtained in the future to differentiate them, we can rename the pages to a more standardized name.
Sources in Footnotes
Robert - I agree - I don't think that we should cite each and every source for an event on the family pages. Let's just cite the primary / best source(s). We should still link the other items to the family page, but we don't need to have reciprocal links for every case. We can always use the 'What links here' feature to find those other sources if needed.
- Tim Doyle | Talk to me 08:24, 11 April 2006 (CDT)
Moses Whitney
Robert:
I've noticed that you sometimes delete pages that have already been created for a given family. I didn't think much about it, but I see that you're currently updating the family of Moses Whitney, and I suspect that when you complete the links, you will delete the page that I originally created. While this results in all of the links being correct, it completely erases the editing history of the pages before you created the page with the new name. This history could be valuable at some point in the future. I'm not sure if you are doing this because it is easier than renaming the existing page and then updating, or if you are just unaware of the 'correct' way to update the name of a page. Let me know your thoughts.
Thanks!
- Tim Doyle | Talk to me 19:51, 11 April 2006 (CDT)
Renaming Pages
Robert:
Here's the scoop...
To effectively rename a page, simply click the move tab at the top, then type in the new name that you would like it to have. This moves (renames) the existing page, and all of the editing history is saved with the page. In addition, a new page is automatically created with the old page name, and this new page contains a redirect to the new page name. This means that all existing links to the old page name will get to the new page name without any problems.
Personally, I like to 'clean up' after I move/rename a page. To do this, I navigate to the redirect page that was created and then use the what links here feature to determine which pages are linking to the old name. After I update all of them to point to the new name, I simply delete the redirect page. Note that this step is optional - it is perfectly ok to just leave the redirect page in place.
I think it would be better to have redirect pages on the site rather than losing editing history.
- Tim Doyle | Talk to me 08:32, 12 April 2006 (CDT)
Real User Names
Robert:
There is no way to obtain a list of real user names via the Wiki interface, but I can obtain that by accessing the database directly. I've just emailed you the current list.
- Tim Doyle | Talk to me 08:23, 28 April 2006 (CDT)
What Links here / Photos
Hi Robert:
Here's what's going on here....
For each image uploaded, there is
- The picture itself
- A page which both describes and includes the picture
If you click on the image, you are taken to the page which describes that picture. Look at the section at the bottom of the page. It lists the pages which link to the image (which is a valid list). When you select 'What Links Here', you're asking for the list of pages which link not to the picture, but to the page that describes the picture, and there are none currently. This may not appear to be correct at first, but once you understand that the picture and the page that describes it are not one and the same, it makes sense.
If my explanation doesn't make sense, please let me know.
Thanks,
- Tim Doyle | Talk to me 09:43, 2 May 2006 (CDT)
New Family Listing
Robert:
Take a look at Whitney Family Groups and let me know what you think of the new family listing. Also, take a look at the bottom of Archives.
- Tim Doyle | Talk to me 09:28, 3 May 2006 (CDT)
Locality Pages
Robert:
Thanks for the comments - I've been meaning to get these listings working and finally had the time to get the new extensions loaded.
Take a look at User:Tdoyle/Sandbox to see what else we can do with this functionality. I have basically called the 'RecentPages' extension and asked it to display all pages in the Family namespace with a category of Vermont, and then did the same thing for the archives. Imagine the possibilities! We could have a page for each State and perhaps even each county or location, though I wouldn't recommend setting all of those up yet. There may be a way that we can have just one page and allow the user to select which state they'd like to view.
- Tim Doyle | Talk to me 12:37, 3 May 2006 (CDT)
Listing of Pages
Robert:
Yes, it's basically the same functionality, but presented in a much better and more customizable fashion. Take a look at the Category:Vermont page now. Note that you can link to a category page without actually adding the page you're placing the link on to that category by this syntax: [[:Category: Vermont]].
- Tim Doyle | Talk to me 13:35, 3 May 2006 (CDT)
Locality Subcategory
Actually, the subcategory is something that I added just today (at least to the Vermont category page). I still have plans to add that to the new page, but I haven't been able to get it to work just yet. Have you played with subcategories before, or did you just see it there after I added it today?
- Tim Doyle | Talk to me 13:44, 3 May 2006 (CDT)
Pierce Error?
Robert:
While importing the page Archive:Who were Calvin and Haynes Whitney, I noticed that the Pierce page does not mention the potential problem. Should a note be added in Pierce?
- Tim Doyle | Talk to me 17:56, 6 May 2006 (CDT)
Census Indexes
I am currently just bringing over the old pages, but my plans include going back and doing additional searches to see if we have missed any. I've also noticed that we have duplicates in our indices, as well as other things such as tax records, etc. These can be great alternatives to missing census records, but I think they need to be called out more prominently than they are currently. I think there's a whole project around updating the census pages, but right now I'm just working on getting what we have migrated over.
- Tim Doyle | Talk to me 09:03, 8 May 2006 (CDT)
WRG-Robot
Robert:
You may have seen in the Recent Changes logs that I have been experimenting with a 'bot' under the account WRG-Robot. This is simply a set of programs which can be run to do automated tasks on MediaWiki-based websites. For example, data placed into a text file with certain keywords to designate where each page starts, ends, and what the title should be for each can be quickly imported into multiple pages on the site. It can also add, change, or remove categories from pages, make edits to all or a set of pages, etc. In order to do these edits, we must identify a unique pattern for it to follow.
There's nothing that you have to do with this information, but if you find yourself doing repetitive work of any type that you think might be able to be automated, please let me know and we can discuss how and if we can accomplish it with much less effort.
This is half of the effort to get Phoenix imported. The next step will be to analyze the existing pages and convert them to the new format, and add the necessary start/stop/pagename tags.
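As a rough illustration of the kind of input file described above, here is a minimal Python sketch that splits one text file into (page name, page text) pairs. It is not the robot's actual code: the marker strings `@@START@@`, `@@STOP@@`, and `@@PAGENAME:` and the function name `split_pages` are made up for this example, and the real start/stop/pagename tags may look quite different.

```python
def split_pages(text, start="@@START@@", stop="@@STOP@@", name_prefix="@@PAGENAME:"):
    """Split a marked-up text file into a list of (page name, page text) pairs.

    The marker strings are hypothetical placeholders, not the robot's real tags.
    Text outside a start/stop block is ignored.
    """
    pages = []
    title = None
    body = None  # None means we are currently outside a page block
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == start:
            title, body = None, []
        elif stripped == stop:
            # Only keep blocks that were opened and given a page name.
            if body is not None and title:
                pages.append((title, "\n".join(body).strip()))
            title, body = None, None
        elif body is not None:
            if title is None and stripped.startswith(name_prefix):
                title = stripped[len(name_prefix):].strip()
            else:
                body.append(line)
    return pages


if __name__ == "__main__":
    sample = (
        "@@START@@\n"
        "@@PAGENAME: Family:Whitney, Example (1700-1780)\n"
        "First page text.\n"
        "@@STOP@@\n"
    )
    for name, page_text in split_pages(sample):
        print(name, "->", page_text)
```

Returning a list rather than a dictionary keeps pages with identical names visible instead of silently letting one replace the other.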
- Tim Doyle | Talk to me 10:04, 16 May 2006 (CDT)
Family History
I'm not sure where this should be added to the family history page. Can you add it for me?
John-1 WHITNEY, m. (1) Elinor -----
..Richard-2 WHITNEY, m. Martha COLDHAM
....Ebenezer-3 WHITNEY, m. Anna -----
......Enoch-4 WHITNEY, m. Thankful PARKE
........Hezekiah-5 WHITNEY, m. Olive KNIGHT
..........Elisha-6 WHITNEY, m. Rachel FROST
............Marshall Frost-7 WHITNEY, m. Huldah Roxanna WARNER
..............Jason Frost-8 WHITNEY, m. Ellen Lucena TYLER
................Andrew Clarence-9 WHITNEY, m. Amelia Newman PARRISH
..................Ernest Frost-10 WHITNEY, m. Elizabeth Cargill BRUCE
....................Marian Elizabeth-11 WHITNEY, m. Albert Edward DURRELL
......................Albert Ernest DURRELL, m. Martha Helen BREIT
Thanks, Al Durrell
Family History
Thanks! Al Durrell
New Discussion Forum
Robert:
I've started a new discussion forum at Discussion Forum - Website Discussion so we can try our back and forth communications there. I have just started a discussion thread for the medieval family group pages which we have just started on today.
Thanks!
- Tim Doyle | Talk to me 21:38, 18 June 2006 (CDT)
Whitneys of Whitney
Robert:
Did you see: Whitneys of Whitney?
- Tim Doyle - Talk to me 10:24, 23 June 2006 (CDT)
Photograph
Robert:
What would you think about uploading your personal photograph from your professional site and including it on your user page?
- Tim Doyle - Talk to me 11:34, 25 June 2006 (CDT)
Conversion
Getting closer! Take a look at this (long) page: User:Tdoyle/Sandbox2.
- Tim Doyle - Talk to me 19:53, 26 June 2006 (CDT)
Pierce Sample
Thank you for the comments.
The pagenames and title have been corrected to the correct format (Family:Last, First (dates)).
Titles will be a problem as that's how Pierce has them. I'll have to see if I can pick them out one by one.
The huge narrative is one long paragraph, surprisingly!
Regarding the counties: Yes, it'll be much harder to add in the counties. Also, what county should I add - the county at the time of the event, or the current-day county?
Some of the locations have issues, including the "St." instance that you cited. More work is needed here, but there will undoubtedly be some manual cleanup needed.
Here is my current list of issues to be dealt with:
- Place categories don't always work
- Add generation superscripts where needed
- Indent children of female children
- Hyphenation not always working
- SubjectName can sometimes be a link to an error page
- Sometimes periods after abbreviations can indicate the end of sentence too
- Wives name(s) need bolding
- Add children's surnames
- Done, but some errors - Pagename needs to be Family:Last, First (dates)
- Weird Family at the top of the output file
Two Columns of Children
Yes, I remember that from the original transcription project, and saw it on one of my passes through the current results. It'll have to be a hand-edit situation.
I have now perfected the pagenames - designations are gone as are several other defects in my original routines.
It's getting closer!
- Tim Doyle - Talk to me 10:11, 27 June 2006 (CDT)
Regarding your comments
Titles are done, but they weren't easy - took me several hours this am to get them right. I dealt with titles with periods and names with periods (Wm., middle initials, etc.); titles without periods; titles such as "Ph. D."; improper formatting; improper spelling (Witney, Whtney); and the links to the error page.
You said "I have been changing these links, adding the word "[NOTE]" ... I'll get started right away on finishing that job." I guess I should mention that I am using the files originally imported to the wiki, not the current versions. Obtaining the current versions is possible, but would add another potentially time-consuming project to my list. I had envisioned needing to manually go through the Error page so that adjustments could be made to these family pages (rather than just linking to an explanation). Thoughts?
You said "The surname is ALWAYS Whitney." - not true, see ALBERT APPLETON (WHITNEY) FAY as one of a couple of examples.
Regarding the email you just received - there have been many, many changes since then. Would you like a fresh one so you're not reporting things that have already been fixed?
Regarding the counties: I'll have to think on this one a bit.
Thanks!
- Tim Doyle - Talk to me 10:27, 27 June 2006 (CDT)
More Pierce
Robert:
I've just emailed a new names file with corrected date ranges. I found several defects in the way I was parsing them before.
Footnotes in the original - we'll probably have to correct these manually. It is likely that by the time I reach a footnote that I have already completed and saved the family they are referring to. There aren't too many of these, are there?
Child's name is a link - I'll look into this. Do you have an example?
De-hyphenation not working well - watch to see if you are only seeing this problem in the parent text, only in the children text, or both. Please send examples of problem areas.
Thanks!
- Tim Doyle - Talk to me 18:43, 27 June 2006 (CDT)
More Pierce
How about just deleting footnotes entirely? - Done.
Child's name is a link:
- The two examples you cited are having problems due to the link to the error page. If I remove the link, it should work fine, but the link (and all other links to errors in the children section) will be gone. If I let it go as-is, we have an obvious issue which we would need to deal with manually.
De-hyphenation:
- These problems are also caused by the links to the error page. In a 'normal' non-link situation, I am looking for a dash at the end of a line, and then I merge it with the following line. In cases such as these, I would need to recognize the links, find the dash inside the text of the link, try to merge it with the next line, but that has a link too, so I would then need to compare the two links to see if they're going to the same place and if so work to combine the text of the two links into one large link. As you can see, this would be difficult, subject to bugs, and probably more trouble than just manually fixing the issues post-import.
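The 'normal' non-link case described above (a dash at the end of a line, merged with the following line) can be sketched in a few lines of Python. This is only an illustration, not the conversion program itself; the function name `dehyphenate` and the sample text are made up, and it deliberately ignores the link case discussed above.

```python
def dehyphenate(lines):
    """Join each line that ends in a hyphen with the line that follows it.

    Simplified sketch: it cannot tell a word split across lines from a
    genuinely hyphenated word that happens to fall at a line end, and it
    knows nothing about wiki links.
    """
    merged = []
    pending = None  # text of a line that ended in a hyphen, minus the hyphen
    for raw in lines:
        line = raw.rstrip()
        if pending is not None:
            # Glue the held-back fragment onto the start of this line.
            line = pending + line.lstrip()
            pending = None
        if line.endswith("-"):
            pending = line[:-1]
        else:
            merged.append(line)
    if pending is not None:
        # The last line ended in a hyphen with nothing to merge into: keep it.
        merged.append(pending + "-")
    return merged


if __name__ == "__main__":
    # Made-up sample text, not taken from Pierce.
    print(dehyphenate(["He was a mer-", "chant in Boston."]))
```

When the dash sits inside link text, the line no longer ends with the dash itself, which is why a simple check like this one misses those cases.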
See Joseph3 Whitney (1651-1702) [John2, John1], end of children. There, two children are somehow conflated.
- This error is caused by the fact that a link to the errors page has been added to the roman numeral, causing my pattern matching to fail. Do you remember if you have added many such links on the roman numerals such as this, or can we safely fix this after the import?
Same Ruth4 Whitney as above, her narrative seems to be messed up somehow.
- I will look into this one.
Thank you for looking at these! Anything we find now will save us time later, though we will need to balance the effort now (in the code) vs. the effort later (manual fixes) as I have done here. Let me know if you feel that any of the above should be fixed pre-import.
- Tim Doyle - Talk to me 08:27, 28 June 2006 (CDT)
Ruth
Robert:
I've found the defect that caused Ruth's section to be broken into two. I thought I'd show you the power and complexity of one small line in the program I am writing.
Here is the offending line that caused the problem:
if ( /\s{2,5}\d{1,4}\.\s+(\w+).\s+(.*)/ ) {
It basically says, if we find two to five spaces, followed by one to four digits, followed by a period, followed by one or more spaces, followed by one or more alphanumeric characters (such as roman numerals), followed by a period, followed by one or more spaces, followed by anything else, then do something (the commands for which are not listed here).
This is basically looking for the children start lines in the form:
2371. x. PAUL, b. Mar. 28,...
There are two defects in this command, one which caused Ruth's problem, and another which I have just noticed while typing this explanation.
The line that caused the problem did so because it matched when it shouldn't have.
1690. He was dismissed from Watertown in the church at Wor-
Now there are certainly more than two to five spaces before we reach the one to four digits here, but that's the defect - I didn't tell it that the match had to start at the beginning of the line! Once I did that, all of the children stopped working. I then realized that 2-5 spaces was a guess, and now I was expecting accuracy, so I changed the command to:
if ( /^\s{4,7}\d{1,4}\.\s+(\w+).\s+(.*)/ ) {
What's the last defect? The period after the roman numerals isn't 'escaped', so instead of matching a period, it is matching any character at all. This defect also played into Ruth's problem. The final pattern matching scheme is therefore:
if ( /^\s{4,7}\d{1,4}\.\s+(\w+)\.\s+(.*)/ ) {
Powerful, but complex, isn't it!
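For anyone who wants to try the three versions of the pattern side by side, here is a small Python sketch (Python rather than the Perl of the program itself; these particular patterns are written the same way in both). The indentation widths on the sample lines are assumptions, since the leading spaces are not visible in the examples above, and the third sample line is a made-up variant used only to show what escaping the period changes.

```python
import re

# The three versions of the pattern discussed above.
ORIGINAL = re.compile(r"\s{2,5}\d{1,4}\.\s+(\w+).\s+(.*)")   # unanchored, period not escaped
ANCHORED = re.compile(r"^\s{4,7}\d{1,4}\.\s+(\w+).\s+(.*)")  # anchored, period not escaped
FINAL = re.compile(r"^\s{4,7}\d{1,4}\.\s+(\w+)\.\s+(.*)")    # anchored, period escaped

# Assumed indentation: 5 spaces for a child line, 10 for the narrative line.
child_line = " " * 5 + "2371. x. PAUL, b. Mar. 28,..."
narrative_line = " " * 10 + "1690. He was dismissed from Watertown in the church at Wor-"
# Hypothetical variant: the same narrative text indented like a child line.
shallow_narrative = " " * 5 + "1690. He was dismissed from Watertown in the church at Wor-"


def matches(pattern, line):
    """Return True if the pattern is found in the line (like Perl's /.../ test)."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    samples = [("child", child_line),
               ("narrative", narrative_line),
               ("shallow narrative", shallow_narrative)]
    for name, pattern in [("original", ORIGINAL), ("anchored", ANCHORED), ("final", FINAL)]:
        results = ", ".join(f"{label}: {matches(pattern, line)}" for label, line in samples)
        print(f"{name:9} {results}")
```

Under these assumptions the original pattern matches all three lines, the anchored one still matches the shallow narrative line (the unescaped period accepts the "e" in "He"), and only the final pattern matches the child line alone.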
- Tim Doyle - Talk to me 08:46, 28 June 2006 (CDT)
Indented Children
Robert:
Take a look at User:Tdoyle/Sandbox3. Was this what you were looking for for indenting the children of the daughters?
- Tim Doyle - Talk to me 09:38, 28 June 2006 (CDT)
Remaining Issues
Good morning Robert!
Thank you for the additional abbreviations - I'll add them right away.
Yes, Sandbox4 is discouraging. The number in front C1, etc. is the number in Pierce and is what my program is using as an index. If you or Andaleen could update the links with good, valid links to the correct person (while leaving the index), that would really really help. I started to try to do this in Excel by exporting both that list as well as all of the current family pages, but I'm getting confused as to who is who. I think the best way would just be a walk through the pages, grabbing each link. This list could be pulled off the site into a text file for easier changes perhaps and then put back when complete. Is this something that you or Andaleen could help with? If not, I'll get to it.
I was pulling my hair out with the wives last night. In order to include the wives on the 'Children of' line and in order to bold them, I have to identify them in the text. In order to do that, I have to narrow down my range using a starting point (after 'married') and an ending point. I'm having difficulties with the ending points, and all sorts of men that the Whitneys did business with, etc. are currently being seen as 'wives'. I then thought I could use a period as my end marker, which worked in 75% of the cases, but failed in others. When I investigated, I found that some dates had not been properly converted and still contained periods. This was because the date spanned from one line to the next. When I tried to move the date conversion from a point in the program when lines had not yet been joined together to a point after they had, the program failed as a certain combination of text in the combined string was breaking the parser. I'm going to give that issue another shot this morning, but if I can't fix it, I may just revert and give up on grabbing and bolding the wives.
The other remaining issue is the counties, which I have not yet looked into.
- Tim Doyle - Talk to me 07:23, 29 June 2006 (CDT)
Status
Robert:
Here's what I am currently thinking... We have a holiday weekend approaching. My birthday is on the 4th, and I have about 50 guests coming this Saturday as well as a few out-of-town guests. I would really like to get these pages imported today, or I may not get back to them for a while. With that in mind, I am seriously considering putting on hold the date issue, the wife bolding issue, and the locality (county) issues. I can deal with these later, using the robot approach, and these can be done without being intertwined with the current effort.
That means that the only thing left to do is to update the 5 generation links on sandbox4. I have started to organize these to make it easier to update the links and have started to go through each section. If you edit just a section instead of the whole page, we're less likely to step on each other.
- Tim Doyle - Talk to me 08:17, 29 June 2006 (CDT)
Ready?
Robert:
I finished going through sandbox4 and all items are blue or have been marked as having not yet been created. I've also accounted for a number of individuals who I've adjusted parentage for, though I did not do an exhaustive search. As far as I know, I am ready to go when you say you're ready.
I can send another sample output file if you'd like, or I can just go.
- Tim Doyle - Talk to me 20:05, 29 June 2006 (CDT)
It's Running
I've just started the program. As soon as it finishes, I'll trim out the first 5 generations (saving them in case we want to add in those few that are missing). Once that is done, I'll start up the TA Robot!
Oh - as far as the conglomeration goes, I think I'll have to just leave that for manual/robot cleaning.
- Tim Doyle - Talk to me 21:10, 29 June 2006 (CDT)
First Batch
Robert:
I've uploaded the first 36 families. If you look in the Recent Changes list, some of them won't show until you hit the 'show bots' link. I set the WRG-Robot account to 'bot' status halfway through. Start reviewing and if you see a major issue, contact me.
Thanks!
Tim
Dates
That happens because the dates are split between lines in the original. I attempted a fix, but that caused more problems than it was worth. I can always have the bot clean that up afterwards.
It's going fast, isn't it!
- Tim Doyle - Talk to me 21:37, 29 June 2006 (CDT)
Oooollllddd
Take a look at the end of the last paragraph - it says he died in that year, but apparently is referring to someone else.
The program stops when it hits a page which already exists. When that happens, I am making note of the page on Sandbox4 so we can manually merge later.
- Tim Doyle - Talk to me 21:50, 29 June 2006 (CDT)
Andaleen
Does Andaleen know about this project? We should probably let her know so she doesn't have to import any more of these, or at least knows to merge and not create.
- Tim Doyle - Talk to me 21:57, 29 June 2006 (CDT)
Duplicate Page Names overwriting
Good catch!
I'm going to keep going, but I can later generate a list of these problems and go back and correct them.
- Tim Doyle - Talk to me 22:16, 29 June 2006 (CDT)
Uploading
Well, the upload has now completed! I'm heading off to bed, and I assume you're no longer around either. I need to review where we're at and what I need to do to fix the page overwrite issue, so please do not move any pages yet (that will complicate any fix I might come up with). If you want to jump in to something, all of the 5 generation pages need to be linked to the newly created 6th generation pages.
I'll also figure out which 5 gen pages I should upload and do that in the morning as well.
Thanks Robert! Great teamwork!
- Tim Doyle - Talk to me 23:33, 29 June 2006 (CDT)
Pierce Import Issues
Robert:
Please see Pierce Conversion Project. I have programmatically scanned the file created by my program and have identified all of the duplicate page names. I've also created sections for the issues which you have identified. Let's add the issues that we find here. Once we identify issues, I will investigate to find a cause, which will hopefully lead to a solution. Then I can determine if a robot or manual fix is appropriate.
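A scan for duplicate page names can be as simple as counting how often each name occurs. The Python sketch below shows one way to do it; it is an illustration only, not the scan that was actually run, and the function name `duplicate_titles` and the sample names are made up.

```python
from collections import Counter


def duplicate_titles(titles):
    """Return, in sorted order, every page name that occurs more than once."""
    counts = Counter(titles)
    return sorted(title for title, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Made-up page names for illustration.
    names = [
        "Family:Whitney, Example (1700-1780)",
        "Family:Whitney, Sample (1720-1790)",
        "Family:Whitney, Example (1700-1780)",
    ]
    print(duplicate_titles(names))
```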
Thanks!
- Tim Doyle - Talk to me 08:34, 30 June 2006 (CDT)