Saturday, November 15, 2008

Crawled Properties - Which property do they crawl?

If you've explored Search in SharePoint, you know that you can create a 'Managed Property' that maps to a crawled property. Have you ever wondered how a crawled property maps to the real fields?

I still don't have the answer to that, but the post below shares the painful experience we had for lack of it. Read on.



As you know, whenever you run a full crawl, SharePoint crawls the entire content, identifies all properties, including your custom fields (say 'BirthDate'), and creates a crawled property for each field. All the OOB fields are mapped to crawled properties prefixed with 'ows_', while for custom fields a crawled property with the same name is added.



Our Requirement:
A simple one. Along with the page title, show the page description in the search results. So our plan of action was:

a) Create a Managed Property (mp_description) mapped to ows_description (Description is the field name in the Page content type)

b) Search Results Web Part: configure the Select Columns XML to fetch mp_description.




  • So we went about performing the steps, and the result: Description was not showing up :-(

  • We explored the default XSLT in the Search Results web part and found that it isn't generic enough to pick up newly added select columns. You need to modify it manually whenever you want to bring additional fields into the results display. We corrected this. Yet no success :-(

  • We thought maybe the content indexing wasn't good, so we wiped the current index and reran the full crawl. Again no success.

  • Now we were really frustrated. We picked up the MOSS Query Tool (an amazing tool for any developer customizing Search in MOSS) to see what was happening. All results came up with the description field empty. Another surprise :-(

  • How about Google? Nope, no postings on this.

At this point, it was clear that either the search crawl wasn't indexing the description properly or the Search Results web part wasn't doing its job. We ruled out the latter because, when we experimented with other crawled properties, the results were proper.


Now zeroing in on the search crawl, it struck us: how can we be sure that the crawled property we were referring to is the one really pointing to the page description? Is there a mapping maintained in some config file in the 12 hive? A brute-force search didn't yield anything.


People who have created custom content types know that you can inherit from existing content types, and that for inherited fields you can change the display names in your content type. Going on this thought, we went to the Page content type, clicked on the 'Description' field, and got the GUID of the field from the link. Back in 12 -> Features -> fields and the Publishing fields feature, a brute-force search showed that the internal name of the description field was indeed 'Comments'!!! (The Page content type derives from Item, which has this field.)
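For what it's worth, the display name versus internal name mismatch can also be spotted from the object model instead of searching the 12 hive. A minimal sketch, assuming a console application run on the SharePoint server, a placeholder site URL, and a content type named 'Page'; it simply lists each field's display name, internal name and GUID:

```csharp
using System;
using Microsoft.SharePoint;

class FieldNameDump
{
    static void Main(string[] args)
    {
        // Assumption: pass your own site URL; "http://localhost" is only a placeholder.
        string siteUrl = args.Length > 0 ? args[0] : "http://localhost";

        using (SPSite site = new SPSite(siteUrl))
        using (SPWeb web = site.OpenWeb())
        {
            // Assumption: the content type is called "Page" on this site.
            SPContentType pageType = web.AvailableContentTypes["Page"];
            if (pageType == null)
            {
                Console.WriteLine("Content type 'Page' not found.");
                return;
            }

            // Print display name, internal name and GUID side by side.
            foreach (SPField field in pageType.Fields)
            {
                Console.WriteLine("{0,-30} {1,-30} {2}",
                    field.Title, field.InternalName, field.Id);
            }
        }
    }
}
```

In the scenario described here, the row whose display name is 'Description' would be expected to show 'Comments' as its internal name.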

At last. We remapped our managed property to ows_comments, and guess what, the page description showed up in the results!!!

Great deception by MOSS, costing us quite a number of days to resolve.
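To double-check a mapping like this without touching the web part or its XSLT, the managed property can be queried directly, much as the MOSS Query Tool does. A rough sketch, assuming a console application on the server, a placeholder site URL, the 'All Sites' scope, and that mp_description has been mapped and a full crawl has completed:

```csharp
using System;
using System.Data;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Query;

class ManagedPropertyCheck
{
    static void Main(string[] args)
    {
        // Assumption: pass your own site URL; "http://localhost" is only a placeholder.
        string siteUrl = args.Length > 0 ? args[0] : "http://localhost";

        using (SPSite site = new SPSite(siteUrl))
        using (FullTextSqlQuery query = new FullTextSqlQuery(site))
        {
            // mp_description is the managed property from this post.
            // Assumption: the 'All Sites' scope exists on this farm.
            query.QueryText =
                "SELECT Title, Path, mp_description FROM SCOPE() " +
                "WHERE \"scope\" = 'All Sites'";
            query.ResultTypes = ResultType.RelevantResults;
            query.RowLimit = 20;

            ResultTableCollection results = query.Execute();
            ResultTable relevant = results[ResultType.RelevantResults];

            // ResultTable is an IDataReader, so it can be loaded into a DataTable.
            DataTable table = new DataTable();
            table.Load(relevant, LoadOption.OverwriteChanges);

            foreach (DataRow row in table.Rows)
            {
                Console.WriteLine("{0} | {1}", row["Title"], row["mp_description"]);
            }
        }
    }
}
```

If the second column comes back empty for every row, the managed property is most likely mapped to the wrong crawled property, which is what happened to us with ows_description.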

Suppressing Core.js for Authenticated Sites

Requirement:
Do not download core.js, to reduce page size and load time (sounds like a common requirement across all web technologies!).

Solution:
Well, the starting point was http://msdn.microsoft.com/en-us/library/bb727371.aspx?ppud=4
It is a good technique to put core.js to rest and save 255 KB (which was 25% of my page size, a huge saving). This post extends the MSDN article further.

The tricky part is that ours was an intranet site with NTLM authentication, and the article demonstrates a quick fix for anonymous users but no solution for authenticated users. So we needed a customized solution to sit on top of the MSDN solution.

There were two distinct sets of users on our site: a) Authors, Admins etc. b) Employees

The pages would have SiteActions enabled for the former set based on their security permissions, and it would not be shown to the rest of the users.

Our pages were highly customized, and we were sure we weren't using any SharePoint controls or OOB web parts that need core.js, except SiteActions. (A couple of controls I found to use core.js: Site Actions and the Welcome control. You can see these in the Press Releases site of the Publishing template.)

Having narrowed it down to SiteActions as the only control requiring core.js, the need was to download or suppress core.js based on this control. Some options we considered:

Option 1: In the MSDN solution, see if you can add a few static rules, for example that all users who need to be shown the SiteActions button belong to a particular group.

protected override void OnInit(EventArgs e)
{
    // UserBelongsToGroup is a placeholder for your own group membership check.
    if (HttpContext.Current.Request.IsAuthenticated && UserBelongsToGroup("xyz"))
    {
        Microsoft.SharePoint.WebControls.ScriptLink.RegisterCore(this.Page, true);
    }
    base.OnInit(e);
}
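The group membership check in Option 1 is left as pseudocode. One possible way to fill it in is sketched below. The helper name and class are hypothetical, and it assumes the code runs inside a SharePoint page request and that the current user is allowed to enumerate the site groups:

```csharp
using System;
using Microsoft.SharePoint;

public static class GroupCheck
{
    // Hypothetical helper: true when the current user is a member of the
    // site group with the given name, false if the group does not exist.
    public static bool UserBelongsToGroup(string groupName)
    {
        // This SPWeb is owned by SPContext, so it must not be disposed here.
        SPWeb web = SPContext.Current.Web;

        foreach (SPGroup group in web.SiteGroups)
        {
            if (string.Equals(group.Name, groupName,
                              StringComparison.OrdinalIgnoreCase))
            {
                return web.IsCurrentUserMemberOfGroup(group.ID);
            }
        }
        return false;
    }
}
```

The obvious drawback stays the same: the admin and authoring community is tied to whichever group names are hard-coded here.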

Option 2: A generic approach: check the user's permission set and decide whether to load or suppress core.js. However, you would need to reconstruct the exact permission logic that the SiteActions user control uses to determine its visibility. This is a scalable approach that imposes no restrictions on the security groups used by the admin/authoring community.

We started out with this, only to find that the SiteActions user control is a simple wrapper around the MenuTemplate control, which is a few layers deep. Deep diving for a few hours, followed by googling for a few minutes, didn't help much, and hence this option was shelved.

Option 3: OK, you want to load core.js only for SiteActions. Then why not check whether SiteActions is loaded and, if so, load core.js as well?

This looked to be the best bet, and we wrote the following code to achieve it:

protected override void OnInit(EventArgs e)
{
    base.OnInit(e);
    if (IsSiteActionsMenuVisible())
    {
        ScriptLink.RegisterCore(this.Page, true);
    }
}

bool IsSiteActionsMenuVisible()
{
    // Check if SiteActionsMenu is visible
    try
    {
        Control ctl = FindControlRecursive(this.Page, "SiteActionsMenuMain");
        if (ctl != null) return ctl.Visible;
        return false;
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex);
        return false;
    }
}

// Depth-first search of the control tree for a control with the given ID
static Control FindControlRecursive(Control root, string id)
{
    if (root.ID == id)
        return root;
    foreach (Control ctl in root.Controls)
    {
        System.Diagnostics.Debug.WriteLine(ctl.ID + ":" + ctl.ClientID);
        Control foundCtl = FindControlRecursive(ctl, id);
        if (foundCtl != null)
            return foundCtl;
    }
    return null;
}

And well, it worked. Great to see the page shrink by 25%!!!

An additional option, more of an afterthought, was to extend the SiteActions control and make it register core.js. I like this kind of design, a simple philosophy of You-need-it-You-register-core.js. When I experiment with that, I will publish my findings on the blog.


(BTW, I couldn't dig out any documentation on which controls use core.js (same with init.js). The only method we had was to knock core.js off the page and see if anything broke. Any pointers on this?)

Note: The condition that checks SiteActionsMenu can be extended to check for other controls (e.g. the Welcome control) as well, so that core.js is suppressed only when none of them is shown.
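The extension mentioned in the note could look roughly like the sketch below. It depends only on System.Web.UI. The class name CoreJsHelper and the method AnyControlVisible are hypothetical, and so is any control ID other than "SiteActionsMenuMain"; you would have to look up the real IDs in your own master page.

```csharp
using System.Web.UI;

public static class CoreJsHelper
{
    // Hypothetical helper: true if at least one of the given control IDs
    // is found in the control tree and that control is visible.
    public static bool AnyControlVisible(Control root, params string[] ids)
    {
        foreach (string id in ids)
        {
            Control found = FindControlRecursive(root, id);
            if (found != null && found.Visible)
                return true;
        }
        return false;
    }

    // Same depth-first search as in the post, without the debug output.
    static Control FindControlRecursive(Control root, string id)
    {
        if (root.ID == id)
            return root;
        foreach (Control child in root.Controls)
        {
            Control found = FindControlRecursive(child, id);
            if (found != null)
                return found;
        }
        return null;
    }
}
```

In OnInit, the condition would then become something like CoreJsHelper.AnyControlVisible(this.Page, "SiteActionsMenuMain", "YourWelcomeControlId"), where the second ID is a placeholder.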

Sunday, November 9, 2008

Double hop resolution - the SharePoint way

Is there anything special about the double hop issue in a SharePoint site? The answer is no. It's the same old classic issue of an IIS website with impersonation set to true issuing calls to a different server and losing the caller's identity.

The classic resolution was to use the RevertToSelf API (an unmanaged Win32 API). It turns off impersonation: the thread switches over to the app pool account, does the work, and once completed switches back to the user account. A neat job with no frills.

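RevertToSelf solution

// Needs: using System.Runtime.InteropServices; using System.Security.Principal;
[DllImport("advapi32.dll", SetLastError = true)]
static extern bool RevertToSelf();

// Remember the impersonated end user before dropping impersonation.
WindowsIdentity endUser = WindowsIdentity.GetCurrent();
RevertToSelf();
try
{
    // NOW THE THREAD IS RUNNING AS THE APP POOL ACCOUNT. Do whatever you need.
}
finally
{
    // Always switch back to the end user, even if the work above throws.
    WindowsImpersonationContext objContext = endUser.Impersonate();
}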

The same could be used in your SharePoint code; however, there are a couple of not-so-comfy things about this approach:
- Usage of Win32 APIs in your code (and referring to the DLLs)
- Changing to the thread account and back to the user: anything can happen in between, and you need to be careful with your exception handling to restore the thread identity. Any miss here and the rest of your code is going to run with elevated privileges.

An alternate option? SPSecurity.RunWithElevatedPrivileges. As the name indicates, it changes the SharePoint context to run with higher privileges. Behind the scenes, the API also switches the thread account to the app pool account. We can leverage this fact for our problem.

The revised code would look like this. No unmanaged code, and the revert is ensured whatever happens. A safe and sound approach.

SPSecurity.RunWithElevatedPrivileges(delegate()
{
    // DO YOUR WORK HERE
});




As stated above, however, note that SPSecurity.RunWithElevatedPrivileges does more than just switch the thread context. If the block contains any SharePoint-related code, such as SPSite or SPWeb creation, it would run with higher privileges, which you may not want. Hence restrict the delegate block to only the statements that require impersonation to be turned off temporarily.
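As an illustration of keeping the elevated block small, here is a minimal sketch in which only the call to the remote server sits inside the delegate. The class and method names are hypothetical, and the remote call is assumed to be a plain HTTP GET that accepts the app pool account's Windows credentials:

```csharp
using System.Net;
using Microsoft.SharePoint;

public static class DoubleHopHelper
{
    // Hypothetical helper: fetches a URL from another server while the
    // thread runs as the app pool account, then returns the response body.
    public static string FetchAsAppPool(string url)
    {
        string body = null;

        SPSecurity.RunWithElevatedPrivileges(delegate()
        {
            // Only the remote call is inside the elevated block.
            using (WebClient client = new WebClient())
            {
                // Sends the credentials of the current thread identity,
                // which is the app pool account inside this block.
                client.UseDefaultCredentials = true;
                body = client.DownloadString(url);
            }
        });

        // Back under the end user's identity here. Any SPSite/SPWeb work
        // belongs outside the delegate so it keeps the user's permissions.
        return body;
    }
}
```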