<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &amp; Outside-In View</title>
	<atom:link href="http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/feed/" rel="self" type="application/rss+xml" />
	<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/</link>
	<description>Reflections on the Web and Web 2.0</description>
	<lastBuildDate>Wed, 22 May 2013 18:57:51 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Why I&#8217;m Evaluating ResearchGate &#171; UK Web Focus</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-131374</link>
		<dc:creator><![CDATA[Why I&#8217;m Evaluating ResearchGate &#171; UK Web Focus]]></dc:creator>
		<pubDate>Wed, 06 Feb 2013 10:00:36 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-131374</guid>
		<description><![CDATA[[...] As described previously workflow processes used in the creation of cover sheets for items hosted in our repository means that metadata embedded in PDFs is lost. Although we&#8217;re having discussions with repository staff about this, it occurred to me that I now have an ideal opportunity to make use of a third-party repository service. [...]]]></description>
		<content:encoded><![CDATA[<p>[...] As described previously workflow processes used in the creation of cover sheets for items hosted in our repository means that metadata embedded in PDFs is lost. Although we&#8217;re having discussions with repository staff about this, it occurred to me that I now have an ideal opportunity to make use of a third-party repository service. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Reflections on the Discussion on the Quality of Embedded Metadata in PDFs &#171; UK Web Focus</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130353</link>
		<dc:creator><![CDATA[Reflections on the Discussion on the Quality of Embedded Metadata in PDFs &#171; UK Web Focus]]></dc:creator>
		<pubDate>Fri, 11 Jan 2013 12:28:17 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130353</guid>
		<description><![CDATA[[...] Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &amp; Outside-In&#160;... [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &amp; Outside-In&nbsp;&#8230; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Embedded Metadata in PDFs Hosted in Institutional Repositories ... &#124; WindGatherer - weathering the data deluge &#124; Scoop.it</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130289</link>
		<dc:creator><![CDATA[Embedded Metadata in PDFs Hosted in Institutional Repositories ... &#124; WindGatherer - weathering the data deluge &#124; Scoop.it]]></dc:creator>
		<pubDate>Wed, 09 Jan 2013 16:46:23 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130289</guid>
		<description><![CDATA[[...] Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers&#039; work in ...&#160; [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers&#039; work in &#8230;&nbsp; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Embedded Metadata in PDFs Hosted in Institutional Repositories ... &#124; Open is mightier &#124; Scoop.it</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130280</link>
		<dc:creator><![CDATA[Embedded Metadata in PDFs Hosted in Institutional Repositories ... &#124; Open is mightier &#124; Scoop.it]]></dc:creator>
		<pubDate>Wed, 09 Jan 2013 09:51:49 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130280</guid>
		<description><![CDATA[[...] Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers&#039; work in ...&#160; [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Institutional repository managers, research support staff and librarians could be prompting their institutions to make the most of these externally provided services, to enhance the visibility of their researchers&#039; work in &#8230;&nbsp; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PDF metadata: different tool, same story - Ross Mounce</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130208</link>
		<dc:creator><![CDATA[PDF metadata: different tool, same story - Ross Mounce]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 11:31:55 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130208</guid>
		<description><![CDATA[[...] Brian Kelly, has taken this a slightly different direction and looked at the metadata of PDFs in institutional repositories. I hadn&#8217;t realised this but apparently some institutional repositories (IRs) universally add [...]]]></description>
		<content:encoded><![CDATA[<p>[...] Brian Kelly, has taken this a slightly different direction and looked at the metadata of PDFs in institutional repositories. I hadn&#8217;t realised this but apparently some institutional repositories (IRs) universally add [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130206</link>
		<dc:creator><![CDATA[Nick]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:57:38 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130206</guid>
		<description><![CDATA[Yes I added it myself and no it clearly isn&#039;t scalable! As I said in my original comment I&#039;m not sure how our workflow will evolve as we implement Symplectic and historically my workflows have tended to be somewhat labour intensive which is a combined result of my less than optimal (research repository) software and my somewhat pedantic nature! 

I wonder if some of these issues might be relevant within the context of the UK RepNet project which is holding a meeting in London on 21st Jan - http://www.rsp.ac.uk/events/supporting-and-enhancing-your-repository/]]></description>
		<content:encoded><![CDATA[<p>Yes I added it myself and no it clearly isn&#8217;t scalable! As I said in my original comment I&#8217;m not sure how our workflow will evolve as we implement Symplectic and historically my workflows have tended to be somewhat labour intensive which is a combined result of my less than optimal (research repository) software and my somewhat pedantic nature! </p>
<p>I wonder if some of these issues might be relevant within the context of the UK RepNet project which is holding a meeting in London on 21st Jan &#8211; <a href="http://www.rsp.ac.uk/events/supporting-and-enhancing-your-repository/" rel="nofollow">http://www.rsp.ac.uk/events/supporting-and-enhancing-your-repository/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kelly (UK Web Focus)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130205</link>
		<dc:creator><![CDATA[Brian Kelly (UK Web Focus)]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:55:44 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130205</guid>
		<description><![CDATA[@Kara  Yes, when I logged out of ResearchGate I initially thought I had to log in to access the full text. I subsequently realised that this isn&#039;t mandatory, but they imply it is in order to maximise subscriptions.  Slightly spammy but, as you pointed out, the post wasn&#039;t about ResearchGate &lt;em&gt;per se&lt;/em&gt;.

PS  I&#039;ve realised that nesting comments doesn&#039;t really work, which is why this comment is out of sequence.]]></description>
		<content:encoded><![CDATA[<p>@Kara  Yes, when I logged out of ResearchGate I initially thought I had to log in to access the full text. I subsequently realised that this isn&#8217;t mandatory, but they imply it is in order to maximise subscriptions.  Slightly spammy but, as you pointed out, the post wasn&#8217;t about ResearchGate <em>per se</em>.</p>
<p>PS  I&#8217;ve realised that nesting comments doesn&#8217;t really work, which is why this comment is out of sequence.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kelly (UK Web Focus)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130204</link>
		<dc:creator><![CDATA[Brian Kelly (UK Web Focus)]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:51:09 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130204</guid>
		<description><![CDATA[@Nick:  The copy of the PDF you have deposited in your repository contains valuable embedded metadata. Did you create that yourself?  If so, then this does not seem to be a scalable solution.  As Pete Cliff has pointed out, we are talking about addressing workflow issues. This post was initially about whether adding coversheets would lose embedded metadata and how significant a problem this might be across the sector. However the subsequent discussions (here and on Twitter)  have broadened the discussion to include considerations of PDFs which may be taken from one repository and added to another: will this (should this) result in cascading coversheets? Should embedded metadata be preserved during the process?]]></description>
		<content:encoded><![CDATA[<p>@Nick:  The copy of the PDF you have deposited in your repository contains valuable embedded metadata. Did you create that yourself?  If so, then this does not seem to be a scalable solution.  As Pete Cliff has pointed out, we are talking about addressing workflow issues. This post was initially about whether adding coversheets would lose embedded metadata and how significant a problem this might be across the sector. However the subsequent discussions (here and on Twitter)  have broadened the discussion to include considerations of PDFs which may be taken from one repository and added to another: will this (should this) result in cascading coversheets? Should embedded metadata be preserved during the process?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Cliff</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130203</link>
		<dc:creator><![CDATA[Peter Cliff]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:35:52 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130203</guid>
		<description><![CDATA[This just sounds like a workflow problem - if adding the coversheet removes embedded metadata then, assuming you want to keep the embedded metadata, fix the thing that adds the coversheet.

On the wider issue - resources that lose their way home when out in the wild - I think it makes sense that on ingest any content gets the URL to its catalogue record (the IR page in this instance) embedded. While a coverpage could contain that information, it&#039;d be nicer if it were machine readable.

I&#039;m aware of two schools of thought in the preservation community about this - one says embed as much metadata as possible in the file such that it is self-describing. The other says a link to a catalogue record is enough. The latter relies on the persistence of the catalogue - which not everyone can guarantee and as such is perhaps more of a risk.]]></description>
		<content:encoded><![CDATA[<p>This just sounds like a workflow problem &#8211; if adding the coversheet removes embedded metadata then, assuming you want to keep the embedded metadata, fix the thing that adds the coversheet.</p>
<p>On the wider issue &#8211; resources that lose their way home when out in the wild &#8211; I think it makes sense that on ingest any content gets the URL to its catalogue record (the IR page in this instance) embedded. While a coverpage could contain that information, it&#8217;d be nicer if it were machine readable.</p>
<p>I&#8217;m aware of two schools of thought in the preservation community about this &#8211; one says embed as much metadata as possible in the file such that it is self-describing. The other says a link to a catalogue record is enough. The latter relies on the persistence of the catalogue &#8211; which not everyone can guarantee and as such is perhaps more of a risk.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130202</link>
		<dc:creator><![CDATA[Nick]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:33:17 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130202</guid>
		<description><![CDATA[Whether a cover sheet is less than ideal from a user POV was an issue that arose as part of a subsequent discussion on Twitter - https://twitter.com/mrnick/status/287191698319241217 

Not sure I necessarily see that argument myself; it is fairly standard practice from several major publishers as well as repositories, I think, and it strikes me that it&#039;s rather like arguing a book cover is inconvenient as it&#039;s such a faff to open it and get past that page with the ISBN and copyright on it! The effect on PDF metadata and how that is indexed by search engines is a separate issue. 

The example linked above which is originally from our repository at Leeds Met and now has multiple cover sheets is more of a practical issue. Nothing wrong with sourcing papers from other repositories to increase exposure etc; I&#039;ve just uploaded our paper from or2012 in Opus - http://opus.bath.ac.uk/30226/ - to our repository at Leeds Met and replaced the cover sheet with our own - http://repository.leedsmet.ac.uk/main/view_record.php?identifier=7827&amp;SearchGroup=Research 

I&#039;m not sure whether the Portsmouth repo manager felt they should retain the Leeds Met cover sheet to preserve provenance or if it was merely an oversight (I should probably have also removed the existing cover sheet(s) when I originally uploaded!). In any case I agree the final result is somewhat unfortunate with 6 pages before the actual content of the research itself :-/]]></description>
		<content:encoded><![CDATA[<p>Whether a cover sheet is less than ideal from a user POV was an issue that arose as part of a subsequent discussion on Twitter &#8211; <a href="https://twitter.com/mrnick/status/287191698319241217" rel="nofollow">https://twitter.com/mrnick/status/287191698319241217</a> </p>
<p>Not sure I necessarily see that argument myself; it is fairly standard practice from several major publishers as well as repositories, I think, and it strikes me that it&#8217;s rather like arguing a book cover is inconvenient as it&#8217;s such a faff to open it and get past that page with the ISBN and copyright on it! The effect on PDF metadata and how that is indexed by search engines is a separate issue. </p>
<p>The example linked above which is originally from our repository at Leeds Met and now has multiple cover sheets is more of a practical issue. Nothing wrong with sourcing papers from other repositories to increase exposure etc; I&#8217;ve just uploaded our paper from or2012 in Opus &#8211; <a href="http://opus.bath.ac.uk/30226/" rel="nofollow">http://opus.bath.ac.uk/30226/</a> &#8211; to our repository at Leeds Met and replaced the cover sheet with our own &#8211; <a href="http://repository.leedsmet.ac.uk/main/view_record.php?identifier=7827&#038;SearchGroup=Research" rel="nofollow">http://repository.leedsmet.ac.uk/main/view_record.php?identifier=7827&#038;SearchGroup=Research</a> </p>
<p>I&#8217;m not sure whether the Portsmouth repo manager felt they should retain the Leeds Met cover sheet to preserve provenance or if it was merely an oversight (I should probably have also removed the existing cover sheet(s) when I originally uploaded!). In any case I agree the final result is somewhat unfortunate with 6 pages before the actual content of the research itself :-/</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: karajones (@karajones)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130201</link>
		<dc:creator><![CDATA[karajones (@karajones)]]></dc:creator>
		<pubDate>Mon, 07 Jan 2013 09:17:56 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130201</guid>
		<description><![CDATA[Hi Brian,
This was my experience with ResearchGate - not to question the service or the point you are making - maximising visibility is great - but for information.  The first time you click on the link, as a non-registered user this message appears as a pop-up:  &quot;You are trying to access the full-text version of A challenge to web accessibility metrics and guidelines: putting people and processes first. Sign up to ResearchGate and request a full-text of this article! &quot;
Attempting a second time gave me the full text.  So whilst registering is not apparently compulsory, this is not the impression given to first time users.
BW,
Kara]]></description>
		<content:encoded><![CDATA[<p>Hi Brian,<br />
This was my experience with ResearchGate &#8211; not to question the service or the point you are making &#8211; maximising visibility is great &#8211; but for information.  The first time you click on the link, as a non-registered user this message appears as a pop-up:  &#8220;You are trying to access the full-text version of A challenge to web accessibility metrics and guidelines: putting people and processes first. Sign up to ResearchGate and request a full-text of this article! &#8221;<br />
Attempting a second time gave me the full text.  So whilst registering is not apparently compulsory, this is not the impression given to first time users.<br />
BW,<br />
Kara</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kelly (UK Web Focus)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130184</link>
		<dc:creator><![CDATA[Brian Kelly (UK Web Focus)]]></dc:creator>
		<pubDate>Sun, 06 Jan 2013 21:36:20 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130184</guid>
		<description><![CDATA[Also note that if you go to the information on ResearchGate on the paper on &lt;a href=&quot;http://www.researchgate.net/publication/216566723_Approaches_To_Archiving_Professional_Blogs_Hosted_In_The_Cloud&quot; rel=&quot;nofollow&quot;&gt;Approaches To Archiving Professional Blogs Hosted In The Cloud&lt;/a&gt; you will see a thumbnail (including the cover page) of the copy hosted on the University of Bath repository and you can download the PDF.

If you go to the page about the paper on &lt;a href=&quot;http://www.researchgate.net/publication/223999202_A_challenge_to_web_accessibility_metrics_and_guidelines_putting_people_and_processes_first&quot; rel=&quot;nofollow&quot;&gt;A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First&lt;/a&gt; you will be able to view a thumbnail of the PDF (with no cover page) which was uploaded by one of my co-authors.  You can also download the PDF.

In both cases, you do not need to sign in to the service to access the PDFs.]]></description>
		<content:encoded><![CDATA[<p>Also note that if you go to the information on ResearchGate on the paper on <a href="http://www.researchgate.net/publication/216566723_Approaches_To_Archiving_Professional_Blogs_Hosted_In_The_Cloud" rel="nofollow">Approaches To Archiving Professional Blogs Hosted In The Cloud</a> you will see a thumbnail (including the cover page) of the copy hosted on the University of Bath repository and you can download the PDF.</p>
<p>If you go to the page about the paper on <a href="http://www.researchgate.net/publication/223999202_A_challenge_to_web_accessibility_metrics_and_guidelines_putting_people_and_processes_first" rel="nofollow">A Challenge to Web Accessibility Metrics and Guidelines: Putting People and Processes First</a> you will be able to view a thumbnail of the PDF (with no cover page) which was uploaded by one of my co-authors.  You can also download the PDF.</p>
<p>In both cases, you do not need to sign in to the service to access the PDFs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &#38; Outside-In View &#124; Information Science &#124; Scoop.it</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130181</link>
		<dc:creator><![CDATA[Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &#38; Outside-In View &#124; Information Science &#124; Scoop.it]]></dc:creator>
		<pubDate>Sun, 06 Jan 2013 17:24:54 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130181</guid>
		<description><![CDATA[[...] PDF Metadata &#8211; Why Is it So Poor? PDF metadata &#8211; why so poor? asked Ross Mounce in a blog post published on New Year&#8217;s eve. In the post Ross expressed surprise that although &#8221;with publi...&#160; [...]]]></description>
		<content:encoded><![CDATA[<p>[...] PDF Metadata &ndash; Why Is it So Poor? PDF metadata &ndash; why so poor? asked Ross Mounce in a blog post published on New Year&rsquo;s eve. In the post Ross expressed surprise that although &rdquo;with publi&#8230;&nbsp; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kelly (UK Web Focus)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130132</link>
		<dc:creator><![CDATA[Brian Kelly (UK Web Focus)]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 17:09:53 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130132</guid>
		<description><![CDATA[Hi Kara
  As an example of the problems which can be caused by cover sheets see &lt;a href=&quot;http://eprints.port.ac.uk/2278/1/Athletes_Use_of_Reputation_and_Gender_Information_When_Forming_Initial_Expectancies_of_Coaches.pdf&quot; rel=&quot;nofollow&quot;&gt;this example&lt;/a&gt;!
   Note I didn&#039;t say that coversheets were bad &lt;em&gt;per se&lt;/em&gt;. My post was about the need to keep the metadata, as this may be processed by other tools.   
    I would agree with you, however, that it would be useful to investigate how the Eprints plugin could be used to enhance the discoverability of repository items.

Brian]]></description>
		<content:encoded><![CDATA[<p>Hi Kara<br />
  As an example of the problems which can be caused by cover sheets see <a href="http://eprints.port.ac.uk/2278/1/Athletes_Use_of_Reputation_and_Gender_Information_When_Forming_Initial_Expectancies_of_Coaches.pdf" rel="nofollow">this example</a>!<br />
   Note I didn&#8217;t say that coversheets were bad <em>per se</em>. My post was about the need to keep the metadata, as this may be processed by other tools.<br />
    I would agree with you, however, that it would be useful to investigate how the Eprints plugin could be used to enhance the discoverability of repository items.</p>
<p>Brian</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: karajones (@karajones)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130131</link>
		<dc:creator><![CDATA[karajones (@karajones)]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 16:58:46 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130131</guid>
		<description><![CDATA[Hi Brian,

We have had discussions about coversheets in the past, and as you note, have worked to solve some issues.  Can I reiterate please, that the coversheet is a policy decision and that actually, due to the lack of identifying content on many PDFs we receive, it&#039;s not a bad decision.  In light of this, we are not going to drop the coversheet but perhaps there is potential for the Eprints plugin to draw more/better metadata onto the page to aid SEO?

Am I right in thinking you need to subscribe to ResearchGate in order to view full text?

Also, what&#039;s so bad about a coversheet from a user&#039;s POV?  Granted the machine-readable issues but I often find it easier to find identifying metadata about a paper from the coversheet than from the document itself, particularly on post-prints.  Am I alone here?!

Kara]]></description>
		<content:encoded><![CDATA[<p>Hi Brian,</p>
<p>We have had discussions about coversheets in the past, and as you note, have worked to solve some issues.  Can I reiterate please, that the coversheet is a policy decision and that actually, due to the lack of identifying content on many PDFs we receive, it&#8217;s not a bad decision.  In light of this, we are not going to drop the coversheet but perhaps there is potential for the Eprints plugin to draw more/better metadata onto the page to aid SEO?</p>
<p>Am I right in thinking you need to subscribe to ResearchGate in order to view full text?</p>
<p>Also, what&#8217;s so bad about a coversheet from a user&#8217;s POV?  Granted the machine-readable issues but I often find it easier to find identifying metadata about a paper from the coversheet than from the document itself, particularly on post-prints.  Am I alone here?!</p>
<p>Kara</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Stewart (@neilstewart)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130129</link>
		<dc:creator><![CDATA[Neil Stewart (@neilstewart)]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 15:58:40 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130129</guid>
		<description><![CDATA[Thanks Brian, I see the distinction- not every service will use OAI-PMH or web crawling, some might parse the objects themselves. It looks to me like Word docs we turn into PDFs here at City have garbage contained in the original doc&#039;s metadata, we might have to look at this.]]></description>
		<content:encoded><![CDATA[<p>Thanks Brian, I see the distinction- not every service will use OAI-PMH or web crawling, some might parse the objects themselves. It looks to me like Word docs we turn into PDFs here at City have garbage contained in the original doc&#8217;s metadata, we might have to look at this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kelly (UK Web Focus)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130127</link>
		<dc:creator><![CDATA[Brian Kelly (UK Web Focus)]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 15:29:54 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130127</guid>
		<description><![CDATA[Hi Neil

Thanks for the comment.

Your question &quot;&lt;em&gt;does it matter if the rare and patchy instances of author-created metadata gets over-written or otherwise distorted?&lt;/em&gt;&quot;  is a good one.

The example provided by Ingmar Koch (of template metadata which is not updated) illustrates that such embedded metadata is used. In this case, the authors may well have preferred it if the metadata had been lost as part of the creation of a cover sheet!  However this does show how embedded metadata is being processed by Google.  In addition, as I suggested in my post, we could find that other third-party services process the metadata associated with the object, rather than metadata which is decoupled from the resource.

However if there are concerns that the metadata will be poor (as in Ingmar Koch&#039;s example) perhaps there is a need to be honest about this, and explicitly state that embedded metadata will be removed prior to the resource being deposited in the repository.]]></description>
		<content:encoded><![CDATA[<p>Hi Neil</p>
<p>Thanks for the comment.</p>
<p>Your question &#8220;<em>does it matter if the rare and patchy instances of author-created metadata gets over-written or otherwise distorted?</em>&#8221;  is a good one.</p>
<p>The example provided by Ingmar Koch (of template metadata which is not updated) illustrates that such embedded metadata is used. In this case, the authors may well have preferred it if the metadata had been lost as part of the creation of a cover sheet!  However this does show how embedded metadata is being processed by Google.  In addition, as I suggested in my post, we could find that other third-party services process the metadata associated with the object, rather than metadata which is decoupled from the resource.</p>
<p>However if there are concerns that the metadata will be poor (as in Ingmar Koch&#8217;s example) perhaps there is a need to be honest about this, and explicitly state that embedded metadata will be removed prior to the resource being deposited in the repository.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Neil Stewart (@neilstewart)</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130125</link>
		<dc:creator><![CDATA[Neil Stewart (@neilstewart)]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 15:18:45 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130125</guid>
		<description><![CDATA[Hi Brian et al.

As noted on Twitter, we often convert Word docs into PDFs on behalf of our academics here at City to put them into &lt;a href=&quot;http://openaccess.city.ac.uk/&quot; rel=&quot;nofollow&quot;&gt;City Research Online&lt;/a&gt;- it&#039;s rare to get &quot;author final&quot; versions in PDF format. I suspect that even when we do get PDFs, it&#039;s rare to have &quot;good&quot; embedded metadata in that PDF.

On the question of discoverability, I had assumed that the structured metadata provided at EPrints/DSpace/other repository software record level did the job here (as opposed to metadata embedded within the PDF itself). Certainly records in City Research Online are highly ranked in Google and are harvested by Google Scholar, BASE, OAIster etc. If this is the case, does it matter if the rare and patchy instances of author-created metadata gets over-written or otherwise distorted?

Neil.]]></description>
		<content:encoded><![CDATA[<p>Hi Brian et al.</p>
<p>As noted on Twitter, we often convert Word docs into PDFs on behalf of our academics here at City to put them into <a href="http://openaccess.city.ac.uk/" rel="nofollow">City Research Online</a>; it&#8217;s rare to get &#8220;author final&#8221; versions in PDF format. I suspect that even when we do get PDFs, it&#8217;s rare to have &#8220;good&#8221; embedded metadata in that PDF.</p>
<p>On the question of discoverability, I had assumed that the structured metadata provided at EPrints/DSpace/other repository software record level did the job here (as opposed to metadata embedded within the PDF itself). Certainly records in City Research Online are highly ranked in Google and are harvested by Google Scholar, BASE, OAIster etc. If this is the case, does it matter if the rare and patchy instances of author-created metadata gets over-written or otherwise distorted?</p>
<p>Neil.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &#38; Outside-In View &#124; Open Access Corner &#124; Scoop.it</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130121</link>
		<dc:creator><![CDATA[Embedded Metadata in PDFs Hosted in Institutional Repositories: An Inside-Out &#38; Outside-In View &#124; Open Access Corner &#124; Scoop.it]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 12:25:16 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130121</guid>
		<description><![CDATA[[...] PDF Metadata &#8211; Why Is it So Poor? PDF metadata &#8211; why so poor? asked Ross Mounce in a blog post published on New Year&#8217;s eve. In the post Ross expressed surprise that although &#8221;with publi...&#160; [...]]]></description>
		<content:encoded><![CDATA[<p>[...] PDF Metadata &ndash; Why Is it So Poor? PDF metadata &ndash; why so poor? asked Ross Mounce in a blog post published on New Year&rsquo;s eve. In the post Ross expressed surprise that although &rdquo;with publi&#8230;&nbsp; [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130118</link>
		<dc:creator><![CDATA[Nick]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 11:08:53 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130118</guid>
		<description><![CDATA[Hi Brian

Need more than 140! 

Historically, yes, I have added cover pages to PDFs. I&#039;m not sure how typical my workflow is, though: mainly due to software idiosyncrasies (I don&#039;t have EPrints and its nice workflow), I have always offered a fully mediated service, which typically involves me soliciting a suitable (author-produced) version of a paper, usually as a Word doc(x), converting it myself with Acrobat and manually adding a cover page. I believe there is a plug-in for EPrints, though, that automatically adds a cover page? In any case, my manual workflow will just result in a PDF with no metadata unless I add it manually. If these are subsequently picked up by Google I just get &quot;Leeds Metropolitan University Repository&quot; in the search result, i.e. indexed from the text of the PDF itself.

All this has been on my mind recently as we are in the process of implementing Symplectic and integrating it with the repository, such that I should finally be able to implement a self-deposit workflow. However, I anticipate folk are likely to upload Word files rather than convert them to PDF...so I&#039;m not sure how my workflow will develop. I&#039;ll obviously offer guidance, but I&#039;m keen to procure content in any format; whether I intervene in the workflow, convert to PDF, and add a nice coversheet and metadata depends on how onerous that becomes.

Nick]]></description>
		<content:encoded><![CDATA[<p>Hi Brian</p>
<p>Need more than 140! </p>
<p>Historically, yes, I have added cover pages to PDFs. I&#8217;m not sure how typical my workflow is, though: mainly due to software idiosyncrasies (I don&#8217;t have EPrints and its nice workflow), I have always offered a fully mediated service, which typically involves me soliciting a suitable (author-produced) version of a paper, usually as a Word doc(x), converting it myself with Acrobat and manually adding a cover page. I believe there is a plug-in for EPrints, though, that automatically adds a cover page? In any case, my manual workflow will just result in a PDF with no metadata unless I add it manually. If these are subsequently picked up by Google I just get &#8220;Leeds Metropolitan University Repository&#8221; in the search result, i.e. indexed from the text of the PDF itself.</p>
<p>All this has been on my mind recently as we are in the process of implementing Symplectic and integrating it with the repository, such that I should finally be able to implement a self-deposit workflow. However, I anticipate folk are likely to upload Word files rather than convert them to PDF&#8230;so I&#8217;m not sure how my workflow will develop. I&#8217;ll obviously offer guidance, but I&#8217;m keen to procure content in any format; whether I intervene in the workflow, convert to PDF, and add a nice coversheet and metadata depends on how onerous that becomes.</p>
<p>Nick</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ingmar Koch</title>
		<link>http://ukwebfocus.wordpress.com/2013/01/04/embedded-metadata-in-pdfs-hosted-in-institutional-repositories-an-inside-out-outside-in-view/#comment-130114</link>
		<dc:creator><![CDATA[Ingmar Koch]]></dc:creator>
		<pubDate>Fri, 04 Jan 2013 10:22:06 +0000</pubDate>
		<guid isPermaLink="false">http://ukwebfocus.wordpress.com/?p=12930#comment-130114</guid>
		<description><![CDATA[A related problem occurred in the Netherlands a couple of months ago. Apparently the company that designed the document templates for most of de government agencies added a title and author in the template-file. The result is that thousands of online government documents (.pdf and .doc) are titled &quot;at opinio facillime sumitur&quot; and are written bij M. Hes.
I wrote a blog about in, it&#039;s in Dutch, but Google Translate does a reasonable job: http://ingmarbladertenschrijft.blogspot.nl/2012/10/dat-is-maar-een-mening-metadata-in.html]]></description>
		<content:encoded><![CDATA[<p>A related problem occurred in the Netherlands a couple of months ago. Apparently the company that designed the document templates for most of the government agencies added a title and author in the template file. The result is that thousands of online government documents (.pdf and .doc) are titled &#8220;at opinio facillime sumitur&#8221; and are written by M. Hes.<br />
I wrote a blog post about it; it&#8217;s in Dutch, but Google Translate does a reasonable job: <a href="http://ingmarbladertenschrijft.blogspot.nl/2012/10/dat-is-maar-een-mening-metadata-in.html" rel="nofollow">http://ingmarbladertenschrijft.blogspot.nl/2012/10/dat-is-maar-een-mening-metadata-in.html</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
