Category: Administration

SharePoint Administration related posts

Finding Duplicate Items and The Duplicates Keyword

I have had a few questions recently about de-duping files within a SharePoint environment, so I set off to do some research to identify a good solution.  Based on past experience I knew that SharePoint identifies duplicates while indexing content, so I expected search to be part of the solution.

Upon starting my journey, I found a couple of threads on various forums where the question had been asked in the past.  The first was “Good De-Dup tools for SharePoint”, which linked to a blog post by Gary Lapointe offering a PowerShell script that can list every library item in a farm.  At first glance this seemed neat, but not helpful here.

Next I found a blog post with another handy PowerShell script, titled Finding Duplicate Documents in SharePoint using PowerShell.  I found this script interesting, albeit dangerous.  It iterates through all of your site collections, sites, and libraries, hashes each document, and compares the hashes to find duplicates.  However, it only identifies duplicate documents within the same location.  The overhead of running this script is going to be pretty high, and it gets risky when you have larger content stores.  I would be worried about running this against an environment with hundreds of sites or large numbers of documents.

Next I found an old MSDN thread named Find duplicate files which had two interesting answers.  The first was to query the database directly (a very bad idea), and the second was a response by Paul Galvin that pointed to the duplicates keyword property, with a suggestion to execute a series of alpha wildcard searches using the duplicates keyword.  While I have used the duplicates keyword before, I had never thought to use it in this context, so I set out to give it a try.

As I mentioned at the beginning, SharePoint Search does identify duplicate documents.  It does this by generating a hash of each document.  Unlike the PowerShell option above, the search hash appears to exclude the metadata, so even items with unique locations, metadata, and document names can still be identified as identical documents.

When doing some tests, though, I quickly discovered that the duplicates property requires the full document URL.  This means that you would have to execute a recursive search: first get a list of items to work with, then iterate through each of those items and execute the duplicates search with a query such as duplicates:”[full document url]”.

Conceptually there are two paths forward at this point.  The first is to try to obtain a list of all items from SharePoint Search.  Unfortunately you cannot get a full list of everything.  The best you can do is the loose title search that Paul had suggested, something like title:”a*”, which would return all items with an “a” in the title.  You would then have to repeat that for every letter and number.  One extra challenge is that you will be repeatedly processing the same items unless you are using FAST Query Language and have access to the starts-with operator, which lets you do something like title:starts-with(“a”).  In addition, since we are only looking for documents, it’s an extremely good idea to also add isdocument:true to your query to ensure that only documents are returned.  Overall this is a very inefficient process.

An alternative would be to revisit and extend Gary’s original script to execute the duplicates search for each item.  The advantage here is that you would guarantee the duplicates search executes only once per item, which reduces the total processing and the extra output to be parsed.  The other change to Gary’s script would be to write to the log file only the information for items that are identified as duplicates.
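
To make that concrete, here is a minimal sketch of the per-item duplicates query, assuming SharePoint 2010, a placeholder site URL, and a hypothetical C:\temp\AllDocumentUrls.txt holding the full document URLs gathered by an inventory script like Gary’s:

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.Search")

    $site = Get-SPSite "http://intranet"                    # placeholder site URL
    $itemUrls = Get-Content "C:\temp\AllDocumentUrls.txt"   # hypothetical inventory file

    foreach ($url in $itemUrls) {
        $query = New-Object Microsoft.Office.Server.Search.Query.KeywordQuery($site)
        $query.QueryText = "duplicates:`"$url`" isdocument:true"
        $query.ResultTypes = [Microsoft.Office.Server.Search.Query.ResultType]::RelevantResults
        $query.TrimDuplicates = $false   # duplicates are normally collapsed; we want them back
        $query.RowLimit = 100

        $relevant = $query.Execute().Item([Microsoft.Office.Server.Search.Query.ResultType]::RelevantResults)

        # The result set includes the source document itself, so more than
        # one row means a real duplicate exists somewhere in the index.
        if ($relevant.TotalRows -gt 1) {
            $table = New-Object System.Data.DataTable
            $table.Load($relevant, [System.Data.LoadOption]::OverwriteChanges)
            foreach ($row in $table.Rows) {
                Add-Content "C:\temp\Duplicates.log" "$url -> $($row["Path"])"
            }
        }
    }
    $site.Dispose()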

Bulk Updates of User Profile Properties

This past week fellow SharePoint MVP Yaroslav Pentsarskyy posted an excellent script for doing bulk updates of User Profile properties via PowerShell.  His Bulk Update SharePoint 2010 User Profile Properties script makes it extremely easy to populate any new fields that are not set to synchronize.
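
For a sense of the underlying pattern (this is a minimal sketch of my own, not Yaroslav’s script), assuming SharePoint 2010, a placeholder site URL, and a hypothetical CSV with AccountName and Department columns:

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.UserProfiles")

    # Any site served by the User Profile Service Application (placeholder URL)
    $site = Get-SPSite "http://intranet"
    $context = Get-SPServiceContext $site
    $upm = New-Object Microsoft.Office.Server.UserProfiles.UserProfileManager($context)

    Import-Csv "C:\temp\ProfileUpdates.csv" | ForEach-Object {
        if ($upm.UserExists($_.AccountName)) {
            $up = $upm.GetUserProfile($_.AccountName)
            $up["Department"].Value = $_.Department   # property name is an example
            $up.Commit()
        }
        else {
            Write-Warning "Profile not found: $($_.AccountName)"
        }
    }
    $site.Dispose()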

My team has been doing a lot of client work promoting the use of User Profiles within customizations or to drive business processes.  For a quick overview, check out my blog post User Profiles – Driving Business Process or sit in on my Developing Reusable Workflow Features presentation at SharePoint Saturday NY on July 30th or SharePoint Saturday The Conference 2011, August 11-13th.

This is also another great example of the value that PowerShell brings to building and maintaining a high-functioning SharePoint environment.

Quota Management and Storage Reports in SharePoint 2010

A few years ago I wrote an article about how to enable and work with the Quota Management features in SharePoint 2007 (click here for the article), which proved to be a popular post.  Quota Management is a pretty important topic when it comes to SharePoint governance and the overall maintenance of the platform.  While the Quota Management features were carried over to SharePoint 2010, one big feature was left out when SharePoint 2010 first shipped: the “Storage space allocation” page, also known as the StoreMon.aspx page, that was available to Site Collection administrators from the Site Settings page.

New Storage Metrics

With the release of SharePoint 2010 SP1 (download here) the feature returns, in a much different format and vastly improved.  The page has been renamed “Storage Metrics” and is a gold mine of information: it lets administrators navigate through the content locations on a site and shows each item’s Total Size, % of Parent, % of Site Quota, and Last Modified Date.  This makes it easy to identify where content is concentrated, and can also expose exceptionally large lists, libraries, folders, and documents.

There is one aspect of the 2007 version that I found helpful that is no longer supported: the ability to view the number of versions of a given document right from the report.  In many cases I’ve seen versioning turned on without any limits, and some popular documents might have thousands of versions.  The report used to provide a way to find those exceptions so that they could be cleaned up.

Performance Improvements

From what I understand, it was removed because it proved to be extremely resource intensive: the information was gathered in real time, which could cause service stability issues in very large environments.  Its return brings a completely revamped gathering process that relies on timer jobs, titled Storage Metrics Processing, resulting in much faster page loads and no risk of crashing the server just by viewing the report.  These jobs pull data every 5 minutes, but like all timer jobs, the frequency can be adjusted to better fit your needs and environment.  For larger environments, it might be a good idea to reduce that frequency to avoid the extra overhead.
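
If you want to see or retune that schedule, here is a minimal sketch using the standard timer job cmdlets; the wildcard match on the display name is an assumption on my part, so confirm the exact job names in your own farm:

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # List the storage metrics processing jobs and their current schedules
    $jobs = Get-SPTimerJob | Where-Object { $_.DisplayName -like "*Storage Metrics*" }
    $jobs | Select-Object DisplayName, Schedule

    # For a large farm, back the frequency off from every 5 minutes to hourly
    $jobs | ForEach-Object { Set-SPTimerJob -Identity $_ -Schedule "hourly between 0 and 59" }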

Configuring Quotas

As with the 2007 version, this feature is only available if quotas are enabled.  In cases where quotas are not currently being used and limits are not actively managed, the safest bet is to establish a quota so large that it cannot be reached.  This enables the feature without the risk of triggering a warning or locking a site that exceeds the thresholds.  Locking the site is the only risk with quotas; there is no risk of data loss.
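
As a minimal sketch of that safety-first configuration (the site URL is a placeholder; Set-SPSite takes its sizes in bytes, so PowerShell’s TB/GB suffixes are handy):

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # A deliberately unreachable quota: Storage Metrics lights up, but no
    # site should ever hit the warning or lock thresholds.
    Set-SPSite -Identity "http://intranet/sites/teamsite" -MaxSize 1TB -WarningSize 900GB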

Summary

Both Farm and Site Collection Administrators should get familiar with this functionality and fold it into their regular content review and cleanup processes.

Shared Link: SharePoint 2010 Social Networking Diagram

I do not often link to another blog post as the core of one on my site, but this one was too good to pass up.  For those of you interested in SharePoint 2010’s social features, here is a great diagram of how all of the components and services fit together.  It is extremely valuable to understand this information when you embark on an implementation or need to troubleshoot why something is not behaving as expected.

SharePoint Solutions Team Blog

SharePoint 2010 Social Networking Diagram

SharePoint Log Viewer

Anyone who spends much time administering or developing against SharePoint knows what a pain it can be to work with the log files in a text editor.  The SharePoint LogViewer developed by Overroot provides a great interface that makes it easy to work with and filter the log files.

Give it a try here:  http://sharepointlogviewer.codeplex.com/

Keys to Long Term SharePoint Stability and Success

Recently I have been called into a few environments where the customers were having serious performance problems or had features that had stopped working.  It really drove home the point that Capacity Planning should be Capacity Management, as Microsoft now calls it in its Capacity Management and Sizing Overview guidance for SharePoint 2010.  These environments also tend to have other issues with patching and large, unused content databases.

The keys below will help establish long-term success for your SharePoint environment.

Initial Design and Planning

The planning and design work that typically happens prior to an implementation is based heavily on assumptions and the understanding of current requirements.  In any environment where an application like SharePoint takes off, those assumptions change quickly, the needs of the business evolve, and therefore the requirements change with them.  In many cases, though, the SharePoint farm topology is never revisited and can no longer meet those needs.  With the current state of IT, many teams are stretched thin and do not have time to make major changes to the system, but in many cases a few proactive changes would remove much of the ongoing system support and troubleshooting effort.

Continued Monitoring

Every system needs regular monitoring.  The frequency and depth of the reviews depend on how complicated the implementation is, but below are some generic topics that can be reviewed.

Quarterly Review

  • Memory and CPU Utilization
  • Patches – Review new patches and install if appropriate

Semi-Annual Review

  • Review Content Databases – Number of Site Collections per Content DB and the size of each Content Database (see the sketch after this list)
  • Search Index Health – Number of items in the index, length of the crawls
  • Average and Peak Usage Stats – Review the average and peak user stats and add hardware if needed.
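
For the content database portion of that review, a short sketch covers both the site collection counts and the sizes (assuming a SharePoint 2010 farm):

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # Site collection count and approximate on-disk size per content database
    Get-SPContentDatabase | Select-Object Name, CurrentSiteCount,
        @{Name = "SizeGB"; Expression = { [math]::Round($_.DiskSizeRequired / 1GB, 2) }}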

In addition, new features are sometimes enabled or adopted months after the initial implementation.  If, for example, you are going to use SharePoint to host your BI solutions, additional capacity may be needed; a system initially sized for a fairly basic intranet may not be able to keep up once BI capabilities are added.

Patching

Patch management also contributes to keeping your SharePoint installation stable and performing well over time.  Installing Service Packs or the bi-monthly Cumulative Updates can be difficult in environments where maintenance windows are tight, but these patches help keep services running smoothly and bugs at bay.  I worked in one environment where at least three major issues were all resolved by previously released patches; unfortunately, a lot of time had already been spent troubleshooting needlessly.
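
Before troubleshooting, it is worth confirming the farm’s actual patch level.  A one-line sketch; compare the result against a published list of SharePoint 2010 build numbers:

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # The farm build number reflects the highest patch level applied
    (Get-SPFarm).BuildVersion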

Prune the Hedges

Most information stores get bloated over time, and SharePoint is not immune.  IT groups have been fighting this for years with shared drives and mail servers.  It is important to have good retention policies in place to make sure you are keeping the right content, but also getting rid of the stale content.  At the very least, you can implement an archiving solution that moves content to cheaper storage while keeping it accessible.

Summary

Following these recommendations will greatly increase your chances of maintaining a highly capable, well-performing environment.

User Profiles – Creating Custom Properties

The User Profiles in SharePoint Server represent a very robust and flexible way to manage information about the members of your organization.  They can fill the role of a searchable Employee Directory, be used to drive business processes and workflows, and make it easier to find people in the organization based on their expertise and profile attributes, providing social networking functionality.

The default properties created at the time of installation are just a starting point.  In this article I will show you just how easy it is to create new properties that support your organization and its business processes.

Planning The New Property

When defining new fields, here is a selection of things to consider:

  • Name / Display Name
  • Type – Wide range of field types
  • Length – Cannot be modified in some situations
  • Term Store Set – Tie the field to a Managed Metadata term set
  • Policy Setting – Required, Optional, or Disabled
  • Privacy Settings – Field-level privacy
  • Edit Settings – User maintained or administrator/system maintained
  • Display Settings – Show on View/Edit/Newsfeed
  • Search Settings – Support for a user Alias (e.g. Employee ID) and whether it is Indexed
  • Profile Synchronization – You can also configure synchronization with an external system (e.g. CRM, HRIS)

In many cases the options available change based on the values of previous options; for example, selecting a Type of string (Multi Value) or changing the Policy Setting alters which other settings can be configured.

Create Custom Property Walkthrough

Since I work in consulting, much of our content is very client focused.  This is a great example of a property that would be very important to us, but not so important for the average company.  In my case, I want to allow consultants to add one or more customer names that they have worked with.  Since this valuable information could potentially be used for a number of purposes (like tagging) throughout the entire SharePoint environment, I have decided to create a Managed Metadata Term Set for this property so that we can reuse the content.

Here is a quick shot of the Client List I created in the Term Store.

Define A Term Set

To create a new property, browse to the User Profile Service Application (or whatever your Profile Service App is named) and select the Manage User Properties link.

Manage User Properties

A full listing of the User Properties is displayed, with the properties organized into sections.  They can be reordered and placed into sections as needed.  To create a new property, simply click the New Property menu item.

New Property

Complete the main Property Settings.  In many cases these settings cannot be changed later, which means the property would have to be deleted and recreated.  In this case I created my Clients property and set it to a multi-value string separated by semicolons.  I then pointed it to the Client List Term Set configured previously.

Property Definition
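
If you need to repeat this setup across environments, the same definition can be scripted through the object model.  This is a hedged sketch assuming SharePoint 2010 and the names from this walkthrough; the Term Set wiring is omitted for brevity:

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    [void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.Office.Server.UserProfiles")

    $site = Get-SPSite "http://intranet"    # placeholder URL
    $context = Get-SPServiceContext $site
    $upcm = New-Object Microsoft.Office.Server.UserProfiles.UserProfileConfigManager($context)

    # Create the core property: a multi-value string separated by semicolons
    $coreProps = $upcm.ProfilePropertyManager.GetCoreProperties()
    $clients = $coreProps.Create($false)   # $false = a property, not a section
    $clients.Name = "Clients"
    $clients.DisplayName = "Clients"
    $clients.Type = [Microsoft.Office.Server.UserProfiles.PropertyDataType]::StringMultiValue
    $clients.IsMultivalued = $true
    $clients.Separator = [Microsoft.Office.Server.UserProfiles.MultiValueSeparator]::Semicolon
    $clients.Length = 3600
    $coreProps.Add($clients)
    $site.Dispose()

The display, edit, and privacy behavior covered next is layered on separately through the profile type and subtype property managers rather than on the core property itself.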

The next set of fields controls how the property is displayed and whether it can be edited.  In this case, I want to make it an optional property and encourage consultants to maintain the value, so I will enable it in each of the Display Settings.  It is not confidential information, so I will set the Privacy level to Everyone.

Display and Policy Settings

Here is what the current profile looks like when rendered.  You can see that the Clients field is displayed and each value is a link that feeds into People Search.

Profile View

Once the values have been crawled and are available in the search index, you will start to see results in People Search.

People Search

Summary

By extending User Profiles with custom properties, you can leverage this robust platform to support your organization and its unique processes and content.
